User Visible Changes

We made minimal changes to GCC from the user's point of view, because we reuse the existing command line options for profiling and feedback-based compilation. To profile a program, compile it with the option -fprofile-arcs (with the current GCC snapshots you may need -static as well, see footnote 3.1) and run it on the training inputs. Once the program has been profiled, an optimized binary can be produced by compiling with the option -fbranch-probabilities. All other compiler options must be the same in both passes. Don't forget to remove the profile files (*.da) afterwards, so that they are not merged into future profiles of the program.
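As an illustration of the two-pass workflow, here is a minimal sketch of the kind of code that benefits from it. The file name hot.c, the input file train.input, the -O2 level and the function saturating_sum are assumptions made for this example only; the profiling options themselves are the ones named above.

```c
#include <assert.h>

/*
 * Assumed build steps for a hypothetical file hot.c that contains this
 * function plus a small driver reading the training input:
 *
 *   gcc -O2 -fprofile-arcs hot.c -o hot           (add -static if needed)
 *   ./hot < train.input                           (writes the *.da profile)
 *   gcc -O2 -fbranch-probabilities hot.c -o hot   (same options otherwise)
 *   rm -f *.da                                    (avoid merging later)
 */

/* Sums v[0..n-1], clamping the running total at limit. */
int saturating_sum(const int *v, int n, int limit)
{
    int i;
    int sum = 0;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        sum += v[i];
        if (sum > limit) {
            /* Rarely taken for typical inputs; the profile records this,
               so the second compilation can treat it as the cold path. */
            sum = limit;
        }
    }
    return sum;
}
```

The branch inside the loop is the interesting part: without a profile the compiler has to guess how often it is taken, while with feedback it knows.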

Unlike in previous versions of GCC, -fbranch-probabilities has a positive effect on the generated code (see footnote 3.2). We tested our work on the i386 GNU/Linux platform, but the majority of platforms should be functional in the GCC 3.1 and 3.2 development trees. In the cfg-branch tree we briefly tested PowerPC and SPARC and found them functional.

On the cfg-branch we added the following command line options to control the new optimization passes:

-fweb
Enables the webizer pass. Enabled by default at the -O2 optimization level.

The webizer pass improves register allocation and common subexpression elimination in cases where a single variable is used in multiple independent contexts, such as i used as the counter of several loops.
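The effect of the webizer can be pictured at the source level. The following is only a hand-written sketch: the real pass works on GCC's internal representation, and the function names are invented for this example.

```c
#include <assert.h>

/* Before: one variable i serves two unrelated loops. */
int sum_and_count_shared(const int *a, int n)
{
    int i;
    int sum = 0, positives = 0;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        sum += a[i];
    for (i = 0; i < n; i++)
        if (a[i] > 0)
            positives++;
    return sum + positives;
}

/* After: each independent use (web) of the counter has its own name,
   so each can be given its own register and optimized separately. */
int sum_and_count_split(const int *a, int n)
{
    int i1, i2;
    int sum = 0, positives = 0;

    assert(n >= 0);
    for (i1 = 0; i1 < n; i1++)
        sum += a[i1];
    for (i2 = 0; i2 < n; i2++)
        if (a[i2] > 0)
            positives++;
    return sum + positives;
}
```

Both functions compute the same value; the second merely makes explicit the renaming that the pass performs automatically.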

-ftracer
Enables the tracer pass. Enabled by default at the -O2 optimization level if profile feedback is present.

The tracer performs code duplication in order to help other optimizers. The resulting code is larger, but should run faster unless code cache limits are hit.
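The kind of duplication the tracer performs can be sketched by hand as tail duplication. This is an illustration only, with invented function names; the actual pass selects what to duplicate from the profile and operates on the control flow graph, not on C source.

```c
#include <assert.h>

/* Before: both arms of the conditional join and share a common tail. */
int scale_joined(int x)
{
    int y;

    if (x > 0)
        y = x * 2;
    else
        y = -x;
    return y * 3 + 1;              /* common tail */
}

/* After: the tail is copied into each arm, so every path is a
   straight-line trace that later passes can simplify on its own. */
int scale_duplicated(int x)
{
    if (x > 0) {
        int y = x * 2;
        return y * 3 + 1;          /* copy 1: foldable to x * 6 + 1 */
    } else {
        int y = -x;
        return y * 3 + 1;          /* copy 2 */
    }
}

/* Small self-check: the two versions agree on a range of inputs. */
void check_tail_duplication(void)
{
    int x;

    for (x = -100; x <= 100; x++)
        assert(scale_joined(x) == scale_duplicated(x));
}
```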

-freorder-blocks
Enables the software trace cache pass. Enabled by default at -O2. This option has existed in GCC since version 3.0.0, where it enabled the old branch reordering pass.

Basic block reordering reduces the number of taken conditional jumps in the code, resulting in better instruction decoder performance and a smaller code cache footprint.

-freorder-functions
Enables the software trace cache pass. Enabled by default at -O2.

Function reordering further improves code locality and avoids code cache conflicts. This option has an effect only when profile feedback is available and the target assembler supports named sections.

-fnew-unroll-loops
Enables unrolling of simple loops. Enabled by default at the -O3 optimization level.

Loop unrolling duplicates the loop body several times to improve other optimizations and instruction decoder performance. By default, unrolling is done only for loops where the iteration counter can be identified. --param max-unrolled-insns=n and --param max-unroll-times=n may be used to control the amount of unrolling. The first parameter specifies the number of instructions in the loop body that the unroller attempts to reach, while the second limits the number of copies of the loop body that are made.
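A source-level sketch of what unrolling by a factor of four looks like is given below. The factor and the function names are chosen for this example only; the compiler picks the factor itself, within the limits set by the parameters above.

```c
#include <assert.h>

/* Original loop: one addition and one counter test per element. */
int sum_rolled(const int *a, int n)
{
    int i;
    int sum = 0;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Unrolled by hand: four copies of the body per counter test,
   followed by a remainder loop for the last n % 4 elements. */
int sum_unrolled4(const int *a, int n)
{
    int i = 0;
    int sum = 0;

    assert(n >= 0);
    for (; i + 3 < n; i += 4) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }
    for (; i < n; i++)             /* remainder iterations */
        sum += a[i];
    return sum;
}
```

The iteration counter i is easy to identify here, which is the case the text describes as handled by default.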

-fnew-unroll-all-loops
Enables unrolling of all loops. Enabled by default at -O3 level of optimization.

Same as -fnew-unroll-loops, but all loops are unrolled.

-fpeel-loops
Enables loop peeling. Enabled by default at the -O3 optimization level.

Loop peeling duplicates the loop body in front of the loop itself. For loops with small average iteration counts, it can effectively avoid the loop. The peeled loop body can also be better optimized by other optimization passes and scheduled into the code just before the loop.

Again, the --param max-peeled-insns=n and --param max-peel-times=n options can be used, with meanings analogous to the -fnew-unroll-loops parameters.
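Peeling can be sketched in the same hand-written style. The example below peels two iterations of a search loop; the number of peeled copies and the function names are assumptions made for illustration.

```c
#include <assert.h>

/* Original loop: returns the index of the first match, or -1. */
int find_first_rolled(const int *a, int n, int key)
{
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        if (a[i] == key)
            return i;
    return -1;
}

/* Two iterations peeled off in front of the loop. When the match is
   usually found early, the loop itself is rarely entered. */
int find_first_peeled(const int *a, int n, int key)
{
    int i = 0;

    assert(n >= 0);
    if (i < n) {                   /* peeled copy of iteration 0 */
        if (a[i] == key)
            return i;
        i++;
    }
    if (i < n) {                   /* peeled copy of iteration 1 */
        if (a[i] == key)
            return i;
        i++;
    }
    for (; i < n; i++)             /* remaining iterations */
        if (a[i] == key)
            return i;
    return -1;
}
```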

-funswitch-loops
Enables loop unswitching. Enabled by default at the -O3 optimization level.

Loop unswitching avoids invariant conditionals in the body of a loop by duplicating the loop body and moving the conditional into the header. This usually results in better performance at the cost of larger code size.
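The transformation can be pictured at the source level as follows. This is a hand-written sketch with invented function names, not output of the pass.

```c
#include <assert.h>

/* Original loop: the test of negate is invariant, yet it is
   evaluated on every iteration. */
void transform_rolled(int *a, int n, int negate)
{
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        if (negate)
            a[i] = -a[i];
        else
            a[i] = a[i] * 2;
    }
}

/* Unswitched: the invariant test is evaluated once, and each arm
   gets its own copy of the loop with a branch-free body. */
void transform_unswitched(int *a, int n, int negate)
{
    int i;

    assert(n >= 0);
    if (negate) {
        for (i = 0; i < n; i++)
            a[i] = -a[i];
    } else {
        for (i = 0; i < n; i++)
            a[i] = a[i] * 2;
    }
}
```

The code size cost is visible directly: the loop now exists twice.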

-fmidlevel-rtl
Enables midlevel RTL. Enabled by default for the i386 architecture, disabled otherwise.

Midlevel RTL is an alternate intermediate code representation in GCC that may be used for more aggressive optimizations. For some targets, midlevel RTL is required for loop unswitching and loop unrolling to be effective.

-funsafe-profile-arcs
Disables thread-safe profiling.

-fvar-tracking
Enables generation of accurate debug output. No released version of GDB is able to read the data yet, so it is advisable to disable this option when you use GDB 5.2.x or older.

This option is on by default.

The effect of any of these options can be negated by using the -fno- prefix.

We have also added an attribute, noprofile, for disabling profiling (see [1] for details).

Jan Hubicka 2003-05-04