Subsections


Control Flow Graph

A control flow graph is a data structure built on top of the intermediate code representation (RTL instruction chain or trees) abstracting the control flow behavior of compiled function. It is an oriented graph where nodes are basic blocks and edges represent possible control flows from one basic block to another.

Basic Blocks

The basic block (node of control flow graph) is defined as a structure:

typedef struct basic_block_def {

  /* The first and last insns of the block.  */
  rtx head, end;

  /* The first and last trees of the block.  */
  tree head_tree;
  tree end_tree;

  /* The edges into and out of the block.  */
  edge pred, succ;

  /* Liveness info.  */

  /* The registers that are modified within
     this block.  */
  regset local_set;
  /* The registers that are conditionally
     modified within this block.
     In other words, registers that are set only
     as part of a COND_EXEC.  */

  regset cond_local_set;
  /* The registers that are live on entry to this block.  */
  regset global_live_at_start;
  /* The registers that are live on exit from this block.  */
  regset global_live_at_end;

  /* Auxiliary info specific to a pass.  */
  void *aux;

  /* The index of this block.  */
  int index;

  /* Loop info.  */

  /* The loop depth of this block.  */
  int loop_depth;

  /* Outermost loop containing the block.  */
  struct loop *loop_father;

  /* Immediate dominator.  */
  struct basic_block_def *dominator;

  /* Expected number of executions: calculated in profile.c.  */
  gcov_type count;

  /* Expected frequency.
     Normalized to be in range 0 to BB_FREQ_MAX.  */
  int frequency;

  /* Various flags.  See BB_* below.  */
  int flags;
}
Basic block is a straight-line sequence of code that can be entered only at the beginning and leaved only at the end. Basic blocks are represented using basic_block data type that contains pointers to the head (first instruction of the basic block) and the end (last instruction) as well as other information maintained about each block.

In the RTL function representation, the head pointer always points either to NOTE_INSN_BASIC_BLOCK or to CODE_LABEL, if present. Basic block ends by control flow instruction or last instruction before following CODE_LABEL.

Special basic blocks ENTRY_BLOCK_PTR and EXIT_BLOCK_PTR represent possible entry points and exits from the compiled function.

The BASIC_BLOCK array contains all basic blocks in the order they appear in the insn stream.

The RTL instruction stream contains not only the ``real'' instructions, but also notes. Notes may or may not appear inside basic block. Any function that moves or duplicates the basic blocks needs to take care of updating of these notes. Many of notes expect that code consists of linear regions making such updates difficult.

Additionally the jump table vectors are represented as ``instructions'' inside the insn chain. These vectors never appear in the basic block and should be always placed just after table jump instructions referencing them. After removing the table-jump it is difficult to eliminate the code computing address and referencing the vector, so cleaning up the vectors is postponed to liveness analysis and thus the vectors may appear in the insn chain without any purpose. Before any edge is made fall-thru, the existence of such construct in the way needs to be checked by calling can_fallthru function.

Edges

Edges in the control flow graphs are described by structure:

/* Control flow edge information.  */
typedef struct edge_def {
  /* Links through the predecessor and successor lists.  */
  struct edge_def *pred_next, *succ_next;

  /* The two blocks at the ends of the edge.  */
  struct basic_block_def *src, *dest;

  /* Instructions queued on the edge.  */
  rtx insns;

  /* Auxiliary info specific to a pass.  */
  void *aux;

  int flags;			/* see EDGE_* below  */
  int probability;		/* biased by REG_BR_PROB_BASE */
  gcov_type count;		/* Expected number of executions calculated
				   in profile.c  */
} *edge;

Edges represent possible control flow transfers from the end of basic block to the head of another. Single linked lists of edges to predecessors and successors are maintained for each basic block.

In common case edges correspond to branches or ``fall-thru'' edges to the following basic block, but there are several other reasons why edge may be created. The type of edge can be obtained via flags field.

There are following types:

jumps

Edges corresponding to destinations of jump instructions have no flags defined. These edges are used for unconditional or conditional jumps and table jumps and are most convenient to manipulate with as they may be freely redirected.

fall-thru

Fall-thru edges are present in case the basic block may continue execution to the following one without branching and do have EDGE_FALLTHRU flag set.

Unlike other types of edges, these edges must come into following basic block in the insn stream and thus function force_nonfallthru is available to insert jump in the case that redirection is needed. This may require creation of a new basic block.

exception handling

Exception handling edges represent possible control
transfers from trap to the exception handler. The definition of trap varies. In C++ only function calls can throw, but for Java exceptions like division by zero or segmentation fault are defined and thus each instruction possibly throwing this kind of exception needs to be handled as control flow instruction.

The destination of edge is specified by REG_EH_REGION note attached to the insn.

The flags of such notes set EDGE_EH and EDGE_ABNORMAL. In case of the call EDGE_ABNORMAL_CALL flag is set too.

When updating the insn stream it is easy to change possibly trapping instruction to non-trapping. Opposite conversion is difficult and should not happen. Predicate may_trap_p may be used to check whether instruction still may trap or not. The edges can be eliminated via purge_dead_edges call.

sibling calls

Sibling calls terminate the function in a non-standard way and thus an edge to the exit must be present. EDGE_ABNORMAL_CALL and EDGE_ABNORMAL are set in such case.

computed jumps

Computed jumps contain edges to all labels in the function referenced from the code. All those edges have EDGE_ABNORMAL flag set. The edges used to represent computed jumps often cause compile time performance problems, since functions consisting of many taken labels and many computed jumps may have very dense flow graphs, so these edges need to be handled with special care.

nonlocal goto handlers

GCC allows nested functions to return into caller using goto statement referring to label passed to as an argument. The labels passed to nested functions contain special code to cleanup after function call. Such section of code is referred as nonlocal goto receivers. In the case function contains such nonlocal goto receivers, the edge from the call to label is present having EDGE_ABNORMAL and EDGE_ABNORMAL_CALL flags set.

function entry points

By definition, execution of function starts by basic block 0, so there is always an edge from entry block to the first real basic block. The alternate entry points are specified by CODE_LABEL with LABEL_ALTERNATE_NAME defined. This feature is currently used for multiple entry point prologues and is limited to post-reload passes only. In future full support for multiple entry functions defined by Fortran 90 needs to be implemented.

function exits

In the pre-reload representation function terminates by the last instruction in the insn chain and no explicit return instructions are used. This corresponds to the fall-thru edge into exit block. After reload optimal RTL epilogues are used, that use explicit (conditional) return instructions that are represented by edges with no flags set.


The Profile

In many cases compiler must make choice whether to trade speed in one part of code for speed in another, or trade code size for code speed. In such cases it is useful to know information about how often given block executes and that is the purpose for maintaining profile within flow graph.

GCC allows the profile to be either feedback based or statically estimated.

The feedback based profile is produced by compiling the program with instrumentation, executing it on the train run and reading the numbers of executions of basic block edges back to the compiler while re-compiling the program to produce final executable. This method provides very accurate information about where program spends most of time on the train run. Whether it matches the average run depends, of course, on the choice of train data set, but several studies has shown, that the behavior of program usually changes just marginally over different data sets.

When profile feedback is not available, compiler attempts to predict the behavior of each branch in the program using a set of heuristics (see predict.def for details) and compute estimated frequencies of each basic block by propagating the probabilities over the graph.

Each basic block contains two fields -- frequency and count. Frequency is an estimation how often is basic block executed within a function and is represented as integer scaled in the range 0-BB_FREQ_BASE. Most frequent basic block in function is initially set to BB_FREQ_BASE and rest of frequencies are scaled according to that. During optimization, the frequency of most frequent basic block can both decrease (for instance by loop unrolling) or grow (for instance by cross-jumping optimization).

The count contains number of execution measured during training run and is nonzero only when profile feedback is available. This value is represented as 64bit integer. Most optimization passes can use only the frequencies of basic block, while few passes may want to know exact counts. The frequencies should always match the counts after scaling, however during updating of the profile information numerical error may accumulate into quite large errors.

Similarly each edge contains probability field--an integer in range from 0 to REG_BR_PROB_BASE. It represents probability of passing control from the end of source basic block to the destination. The probability that control flow arrives via given edge to the destination basic block is called reverse probability and is not directly represented, but it may be easily computed from frequencies of basic blocks. EDGE_FREQUENCY macro is available to compute how frequently is given edge taken. count field is present for each edge as well, representing same information as for basic block.

The basic block frequencies are not represented in the RTL instruction stream, the edge frequencies are represented only for conditional jump via REG_BR_PROB, since they are used when instructions are output to the assembly file and flow graph is no longer maintained.

Structure of profile output file

The name of the file corresponding to a source file is formed simply by changing file suffix to .da, e.g. for test.c the corresponding file is test.da.

As mentioned before, the file consists of several sections, each represening single run. The structure of each section is shown on figure [*]

Figure: Structure of a section in profile data file
\scalebox{1.0}{
\includegraphics{profile_record.ps}}

The function name in the figure is stored as a $ -1$ (4 bytes), the length (4 bytes), the name itself (padded to 4-byte boundary) followed by a $ -1$ (4 bytes).

The extension block is used for storage of other important data which may be emited by future versions of gcc. This allows use of particular profile with different versions of gcc.

In current version, extension block contains the following information:

  1. number of instrumented arcs in whole program (4-byte number)
  2. sum all of instrumented arcs in whole program (8-byte number)
  3. maximal value of counter in whole program (8-byte number)
  4. number of instrumented arcs in the object file (4-byte number)
  5. sum all of instrumented arcs in the object file (8-byte number)
  6. maximal value of counter in the object file (8-byte number)

The content of the file can be examined by utility gcov, which outputs the corresponding source file together with execution counts for each line (so-called line coverage). There is currently no utility for manipulating the profile output file structure, e.g. removing runs or merging two files together.

Maintaining CFG up to Date

Important task is to keep both control flow graph and profile up-to-date with the instruction stream during optimization passes. Reconstruction of control flow graph after each pass is not an option, as it is too expensive and we lose profile information.

At the moment, the basic block boundaries are maintained transparently during emitting instruction, so rarely there is need to move them manually (such as in case someone wants to output instruction outside basic block explicitly). Each instruction has defined BLOCK_FOR_INSN value that represents pointer to the basic block owning it, so the basic block list may be better viewed as integral part of instruction chain, than structure built on the top of it.

Updating of edges is not transparent and optimization pass is required to do that manually. However only few cases occur in practice. Commonly the optimization pass simplifies the instruction possibly eliminating some edge. This may happen by simplifying the conditional jump into unconditional, but also by simplifying possibly trapping instruction to non-trapping while compiling Java. The pass may call purge_dead_edges on given basic block to remove unneeded edges, if any.

Other common scenario is redirection of branch instructions, but this is best modeled as redirection of edges in the control flow graph and thus use of redirect_edge_and_branch is preferred over more low level functions, such as redirect_jump that operate on RTL chain only.

Last common case is inserting control flow instruction into middle of basic block. The find_sub_basic_blocks may be used to split existing basic block and add necessary edges, or split_block may be used when instruction in middle of basic block wants to become target of branch instruction.

For global optimizer, a common operation is to split edges in the flow graph and insert instruction to them. This can be easily done via function commit_insn_to_edge that emits instruction ``to the edge'' caching it for later commit_edge_insertions call that will care creation of new basic block where needed and flushing the instruction to actual instruction stream.

While debugging the optimization pass, an verify_flow_info function may be useful to find bugs in the control flow graph updating code.

Maintaining Profile up to Date

More delicate task than updating control flow graph is to update profile. Many of the function to modify flow graph, like redirect_edge_and_branch do not have enough information to easily update profile, so updating profile is in majority cases left on the caller. Since it is difficult to discover bugs in the profile updating code, as they manifest themselves only by producing worse code and checking profile consistency is not possible, because of numeric error accumulation, special care needs to be taken into this issue.

It is important to point out, that REG_BR_PROB_BASE and BB_FREQ_BASE are both set low enough to be possible to compute second power of any frequency or probability in the flow graph, it is not possible to even square the count field, as modern CPUs are fast enough to execute $ 2^32$ operations quickly.

Liveness Information

Liveness information is useful to determine whether register X is ``live'' at given point of program, that means that it contains important value. This information is used, for instance, during register allocation pass, as the pseudo registers need to be assigned to unique hard register or stack slot only when they are live. The hard registers and stack slots may be freely reused for other values when they are dead.

The liveness information is stored partly in the RTL instruction chain and partly in the flow graph. RTL chain stores local information: each instruction may contain REG_DEAD note representing that value of given register is no longer needed or REG_UNUSED note representing that the value computed by instruction is never used. The second is useful for instructions computing multiple values at once.

Each basic block contains bitmaps representing liveness of each register at entry and exit of basic block (global_live_at_start and global_live_at_end). flow.c contains function to compute liveness of each register at any given place in the instruction stream using this information.

Liveness is expensive to compute and thus it is desirable to keep it up to date during optimization passes. This can be easily accomplished using flags field of basic block. The functions modifying instruction stream automatically set BB_DIRTY flag of basic block, so the pass may simply use clear_bb_for_blocks before doing any modifications and then ask dataflow modulule via function update_life_info_in_dirty_blocks to get liveness updated.

This scheme works reliably as long as no control flow graph transformations are done. The task of updating liveness after control flow graph changes is more difficult as normal iterative data flow may produce invalid results or get into cycle when the initial solution is not bellow the desired one. Only simple transformations, like splitting basic blocks or emitting to the edge are safe, as functions to implement them already know how to update liveness locally.

Loop Tree

Loop tree describes the structure of loops. Nodes correspond to loops, while edges reflect the subloop/superloop stucture. Root of the tree is a dummy loop containing the whole function (including ENTRY_BLOCK_PTR as latch and EXIT_BLOCK_PTR as header).

Struct loop is defined as follows:

struct loop
{
  /* Index into loops array.  */
  int num;

  /* Basic block of loop header.  */
  basic_block header;

  /* Basic block of loop latch.  */
  basic_block latch;

  /* Number of blocks contained within the loop.  */
  int num_nodes;

  /* The loop nesting depth.  */
  int depth;

  /* Array of superloops of the loop.  */
  struct loop **pred;

  /* The outer (parent) loop or NULL if outermost loop.  */
  struct loop *outer;

  /* The first inner (child) loop or NULL if innermost loop.  */
  struct loop *inner;

  /* Link to the next (sibling) loop.  */
  struct loop *next;

  /* Loop that is copy of this loop
     (must be valid only just after copying the loop).  */
  struct loop *copy;

 ... /* Fields specific for the old loop optimizer
        are omitted.  */
};

The whole loop tree (plus additional information) is stored in struct loops:

struct loops
 {
  /* Number of natural loops in the function.  */
  int num;

  /* Array of pointers to loops.  Subloops should always
     have greater index than their parents.  */
  struct loop **parray;

  /* Pointer to root of loop hierarchy tree.  */
  struct loop *tree_root;

 ... /* Fields specific for old loop optimizer
        are omitted.  */
 };

Field loop_father of basic_block is a pointer to the innermost loop containing the basic block.

Functions for building, modifying and querying the loop structure are provided in cfgloop.c and cfgloopanal.c; some higher level functions for manipulating it may be found in loop-new.c.

Jan Hubicka 2003-05-04