Experimental Results

We present benchmarks of the majority of the optimizations discussed. Where possible, we also present the same benchmarks performed on IA-32 and Alpha systems to compare the effectiveness of individual optimizations on these architectures. We hope this will help in applying earlier published results on compiler optimization (such as [FDO]) to the new platform and serve as a guide to which optimizations are most important. We also present results at two different optimization levels: the standard optimization (-O2) used by the majority of distributions today, and the aggressive optimization (-O3 -ftracer -funroll-loops -funit-at-a-time with profile feedback) that we found to give the best overall SPEC score.

We used a modified prerelease of GCC 3.3, as used by SuSE Linux 8.2 for AMD64. All runs were performed on SuSE Linux on dedicated machines; however, a significant amount of random noise remains (especially for the benchmarks Mesa, Gzip, Perl and Twolf). Due to time limitations, the benchmarks were performed with one iteration only, except for the benchmarks in Tables [*] and [*], which were computed with 3 iterations. Because the runs were not done on final hardware and because we did not satisfy the conditions for reportable runs in all tests, we present relative numbers only.

Each table is divided into two sections: the first part includes optimizations enabled by default at the given optimization level, while the second part contains optimizations that the user needs to enable by hand, either because they are ineffective, inappropriate for the given settings, or do not obey the language standards. The first line of each table compares two runs with equal settings to give a rough approximation of the noise in the numbers. Both performance and sizes of the stripped binaries are presented. The numbers always represent the relative speedup (or code size increase) from the run with the specified feature disabled to the run with the specified feature enabled. For instance, the -fomit-frame-pointer row in Table [*] compares the performance of -O2 -fno-omit-frame-pointer to -O2 -fomit-frame-pointer. The ``standard optimization'' row compares -O0 to -O2.
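
The relative numbers described above can be sketched as follows. This is a minimal illustration, not the scripts used for the measurements: the function names are hypothetical, and the ratio-based formula is an assumption (it is consistent with table entries above 100%, which a difference normalized by the slower run could not produce). The noise check simply compares magnitudes against the equal-settings row, so it is only a rough guide.

```python
def relative_speedup(time_disabled: float, time_enabled: float) -> float:
    """Percent speedup from the feature-disabled run to the feature-enabled run.

    Assumed definition: (t_disabled / t_enabled - 1) * 100.
    """
    if time_disabled <= 0 or time_enabled <= 0:
        raise ValueError("run times must be positive")
    return (time_disabled / time_enabled - 1.0) * 100.0


def relative_size_increase(size_disabled: int, size_enabled: int) -> float:
    """Percent growth of the stripped binary when the feature is enabled."""
    if size_disabled <= 0:
        raise ValueError("baseline size must be positive")
    return (size_enabled / size_disabled - 1.0) * 100.0


def exceeds_noise(delta_percent: float, noise_percent: float) -> bool:
    """True if a measured change is larger in magnitude than the noise-row entry."""
    return abs(delta_percent) > abs(noise_percent)


if __name__ == "__main__":
    # Hypothetical measurements: 200 s without the feature, 100 s with it.
    print(relative_speedup(200.0, 100.0))
    # Hypothetical sizes: 1000 bytes without the feature, 1250 bytes with it.
    print(relative_size_increase(1000, 1250))
```

For example, in the 64-bit SPECint table with standard optimization, the Twolf change for -fstrict-aliasing (-2.34) exceeds the Twolf entry of the equal-settings row (-0.54), while the Twolf change for -fgcse (0.13) does not.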

The following benchmarks were performed:

aggressive optimization
compare performance of unoptimized code (-O0) to the aggressive optimization settings described above.
all prologue using move
eliminate use of all push and pop operations in the prologues and epilogues, except for cases where a single register is saved. See Section 2.2.
-fasynchronous-unwind-tables
enable production of DWARF2 unwind information. See Section 2.3.
-fbranch-probabilities
enable profile feedback based optimizations. We implemented the majority of the transformations described in [FDO], with the exception of function in-lining and switch statement expansion.
-fgcse
enable global optimizers, including a (limited form of) partial redundancy elimination, load motion, constant propagation and copy propagation. GCC also contains loop invariant hoisting and an extended basic block based value numbering pass, which make the global optimizers partly redundant.
-fguess-branch-probability
enable optimizations driven by static profile estimation. The profile is estimated by methods based on [profile] when profile feedback is not available.
-finline-functions
enable function in-lining.
-fold-unroll-loops
enable the old loop unroller, which actually unrolls some loops on Alpha.
-fomit-frame-pointer
enable elimination of the frame pointer by using the stack pointer instead. See Section 2.2.
-foptimize-sibling-calls
transform a call in tail position (a sibling or tail-recursive call) into a jump.
-fpeel-loops
enable loop peeling.
-fpic
produce position independent code. See Section 2.7.
-freorder-blocks
enable intra-function basic block reordering and duplication based on significantly modified software trace cache algorithm [STC].
-fschedule-insns2
enable post-register allocation local scheduling. See Section 3.3.
-fschedule-insns
enable pre-register allocation region scheduling (not available for IA-32 and AMD64).
-fstrength-reduce
enable strength reduction.
-fstrict-aliasing
enable ANSI C type-based alias analysis.
full sized loads and moves
avoid use of instructions that initialize only a portion of the destination register. See Sections 3.1 and 3.2.
-ftracer
enable super-block formation using an algorithm similar to [FDO]. The super-blocks are unified again after optimization by the cross-jumping pass, so this transformation is not used to improve scheduling as commonly described in the literature. Instead, it aims to improve CSE and other transformations by simplifying the control flow.
-funit-at-a-time
enable optimizations on the whole compilation unit. At the moment GCC performs stronger function in-lining (in-lining of small functions called before they are defined and of static functions called once) and uses register calling conventions for static functions on IA-32. Only effective for the C compiler.
-funroll-all-loops
enable loop unrolling of all small enough loops in the hot spots.
-funroll-loops
enable loop unrolling for loops with a known induction variable. While working on the paper we noticed that our new implementation has an important flaw that prevents loops from being unrolled on the Alpha architecture.
-m64
enable 64-bit code generation (used in comparisons relative to IA-32 code).
-mfpmath=sse
enable use of the SSE(2) instruction set for scalar floating point calculations.
-mcmodel
controls code and data segment size limits. See Section 2.7.
-mred-zone
enable use of the 128 bytes below the stack pointer for local data. See Section 2.2.
partial SSE moves
eliminate use of movlpd for double precision loads and movsd for register to register moves. See Section 3.2.
prologue using move
eliminate use of hot push and pop operations in the prologues and epilogues. See Section 2.2.
standard optimization
compare performance of unoptimized code (-O0) to the standard optimization settings (-O2).
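
To make the two optimization levels and the on/off comparisons concrete, the sketch below assembles the corresponding compiler command lines. It is an illustration under stated assumptions, not the build setup used for the measurements: the helper names, the `gcc` driver name and the file names are hypothetical, and the use of -fprofile-arcs for the instrumented training build is our assumption about how profile feedback is collected before recompiling with -fbranch-probabilities.

```python
STANDARD_FLAGS = ["-O2"]
AGGRESSIVE_FLAGS = ["-O3", "-ftracer", "-funroll-loops", "-funit-at-a-time"]


def standard_build(source, output):
    """Command line for the standard optimization level."""
    return ["gcc", *STANDARD_FLAGS, source, "-o", output]


def aggressive_build_steps(source, output, train_args):
    """Three steps: instrumented build, training run, feedback-directed rebuild.

    Assumption: the instrumented build uses the same optimization flags plus
    -fprofile-arcs, so the collected profile matches the final build.
    """
    instrumented = output + ".instr"  # hypothetical name for the training binary
    return [
        ["gcc", *AGGRESSIVE_FLAGS, "-fprofile-arcs", source, "-o", instrumented],
        ["./" + instrumented, *train_args],
        ["gcc", *AGGRESSIVE_FLAGS, "-fbranch-probabilities", source, "-o", output],
    ]


def feature_pair(base_flags, flag):
    """Flag lists for the feature-disabled and feature-enabled runs of one row.

    Only handles -f options, which GCC negates with the -fno- prefix.
    """
    if not flag.startswith("-f") or flag.startswith("-fno-"):
        raise ValueError("expected a positive -f option")
    disabled = "-fno-" + flag[len("-f"):]
    return [*base_flags, disabled], [*base_flags, flag]


if __name__ == "__main__":
    for step in aggressive_build_steps("bench.c", "bench", ["train.in"]):
        print(" ".join(step))
    print(feature_pair(STANDARD_FLAGS, "-fomit-frame-pointer"))
```

The last call reproduces the comparison used for the -fomit-frame-pointer row: -O2 -fno-omit-frame-pointer against -O2 -fomit-frame-pointer.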

Table 1: Compilation Time Cost (AMD Opteron)
options slowdown
(equal settings, noise) 0.00%
-fstrict-aliasing -1.13%
-fasynchronous-unwind-tables -0.38%
-freorder-blocks 0.00%
-fomit-frame-pointer 0.37%
-mred-zone 0.38%
-mfpmath=sse 0.75%
-maccumulate-outgoing-args 0.75%
-foptimize-sibling-calls 0.76%
-fguess-branch-probability 1.54%
-fschedule-insns2 2.33%
-fgcse 6.88%
-ffast-math -1.88%
-ftracer 0.00%
-frename-registers 0.74%
-funroll-loops 3.38%
-fpic 3.39%
-funroll-all-loops 5.32%
-mcmodel=medium 2.27%
-fbranch-probabilities 142.74%


Table: 64-bit SPECint 2000 with Standard Optimization (AMD Opteron)
Performance (relative speedup in percent):
options gzip vpr gcc mcf crafty parser eon perl gap vortex bzip2 twolf avg
(equal settings, noise) 1.32 0.14 -0.45 -0.45 -0.17 0.19 0.41 0.11 0.60 0.28 0.27 -0.54 0.13
standard optimization 105.37 82.29 90.55 12.06 87.14 58.23 451.70 97.05 101.18 75.30 142.14 55.99 93.40
-fguess-branch-probability 4.40 4.45 2.90 0.00 2.73 0.19 5.58 5.96 7.43 21.60 2.56 -1.46 4.10
-fschedule-insns2 1.62 1.44 2.40 0.22 0.32 0.78 4.90 1.28 -0.45 4.34 0.41 0.93 1.46
-fstrict-aliasing 1.48 4.62 1.93 0.00 3.68 0.58 -2.34 1.75 0.75 4.27 4.79 -2.34 1.19
-mfpmath=sse 1.93 3.98 -0.23 0.00 -0.09 -0.39 2.11 0.00 1.81 3.94 0.27 0.80 1.06
prologue using move -0.74 0.14 0.34 0.00 4.04 0.98 -0.43 1.43 0.30 5.71 -0.28 0.13 0.93
full sized loads and moves -1.76 -0.29 -0.46 0.88 0.96 -0.20 24.90 -1.52 -0.45 -1.04 0.97 -3.61 0.93
-fgcse 1.17 4.28 -1.77 1.35 0.48 1.38 2.33 1.75 -1.48 1.55 1.26 0.13 0.92
-foptimize-sibling-calls 1.62 0.43 -0.12 0.00 3.33 0.00 2.33 -0.35 1.51 2.44 0.27 0.26 0.92
-finline-functions 1.62 0.71 0.22 1.11 0.32 3.08 0.30 -1.04 0.58 -0.99 2.21 0.67 0.65
-fomit-frame-pointer 0.29 1.58 0.56 0.67 5.00 1.57 -3.03 3.07 -0.60 0.47 2.41 -3.48 0.39
-freorder-blocks 3.61 -0.29 -0.57 0.22 2.31 -0.78 0.72 4.06 0.75 3.45 1.84 -5.31 0.39
-maccumulate-outgoing-args 1.92 -0.58 0.78 0.45 0.24 -0.39 1.04 -0.12 -0.60 -1.13 0.13 0.80 0.26
-mred-zone 1.47 0.14 1.35 -0.23 1.30 -0.20 -1.73 0.00 -0.30 -0.29 0.55 -0.67 0.13
partial SSE moves -0.30 5.89 -0.92 0.00 0.07 0.00 -1.17 0.00 0.00 -0.10 -0.14 -3.36 -0.27
aggressive optimization 6.34 4.97 8.81 0.67 1.29 25.43 24.14 12.29 7.51 5.69 5.42 4.65 8.40
-fbranch-probabilities 5.95 1.71 7.13 0.22 -0.65 16.76 2.98 3.90 0.14 6.95 0.27 3.73 4.07
-funroll-all-loops 4.16 0.42 5.60 0.00 -4.28 0.77 16.42 4.02 1.35 0.57 1.82 1.46 2.50
-funroll-loops 3.71 0.28 4.17 0.00 0.08 0.58 15.35 1.61 1.35 -4.78 0.55 3.32 2.23
all prologue using move -0.60 0.56 2.38 -0.23 -0.40 0.58 3.73 3.19 -0.15 -4.29 0.55 4.68 1.05
-ffast-math 1.78 0.28 0.67 0.00 -0.25 -0.20 0.31 -0.81 0.15 2.67 1.12 2.64 0.78
-frename-registers -0.15 0.56 -0.68 0.00 0.08 0.58 1.34 -2.19 -0.76 -1.25 0.97 4.92 0.65
-funit-at-a-time 0.89 2.71 0.79 0.45 0.72 0.38 0.00 -0.47 -0.45 0.68 0.69 -0.93 0.39
-ftracer 3.12 0.14 1.57 0.00 1.13 -0.20 1.76 0.91 -7.81 -3.83 1.40 2.40 0.13
-mcmodel=medium -4.30 -1.00 -0.45 0.00 -10.84 0.00 2.18 -3.57 -5.83 -6.27 -2.23 -0.27 -2.51
-fpic -9.11 -1.72 -1.68 0.89 -18.21 -0.78 -1.36 -16.79 -3.76 -15.16 -6.18 -1.48 -6.20
File size (relative increase in the size of stripped binaries, in percent):
options gzip vpr gcc mcf crafty parser eon perl gap vortex bzip2 twolf total
standard optimization -11.24 -23.04 -23.74 -20.59 -17.13 -13.77 -13.71 -20.00 -36.54 -9.42 -15.83 -39.29 -22.31
-maccumulate-outgoing-args -0.42 -4.02 -3.47 -3.34 -0.35 -3.30 -3.15 -3.29 -4.31 -3.60 5.16 -2.51 -3.25
-fomit-frame-pointer -0.26 1.72 -1.13 -0.20 0.04 -3.76 -1.94 -1.24 -1.07 2.08 -0.08 -0.99 -0.71
-fstrict-aliasing 0.00 -0.68 -0.15 0.00 0.00 0.00 0.22 0.00 -0.34 -0.66 0.00 -5.02 -0.40
-mred-zone 0.00 -0.11 -0.19 0.00 -0.02 0.00 -0.76 0.59 -0.02 0.00 0.00 -0.04 -0.09
-fschedule-insns2 0.00 0.02 -0.15 0.00 0.01 0.00 0.02 0.00 0.00 0.02 0.00 -0.07 -0.05
-fgcse -0.11 0.04 -0.16 0.19 0.03 0.11 0.44 0.68 -0.01 -0.68 0.00 -1.16 -0.05
-foptimize-sibling-calls 0.00 -0.03 0.08 0.00 -0.02 0.00 -0.76 0.48 -0.16 -0.01 -0.23 -0.10 -0.03
partial SSE moves 0.00 0.00 0.00 0.00 0.00 0.00 0.16 0.00 0.00 0.00 0.00 0.01 0.02
full sized loads and moves 0.00 0.00 0.04 0.00 1.21 0.00 0.00 0.00 0.08 -0.01 0.00 0.11 0.08
-mfpmath=sse 0.00 -0.64 -0.15 0.00 0.00 0.00 2.34 -0.01 0.00 0.00 0.00 -1.64 0.13
prologue using move -0.11 1.06 1.01 0.00 1.26 -0.34 0.91 0.84 1.44 2.55 0.00 0.16 1.14
-freorder-blocks 7.06 2.71 4.43 0.00 4.05 3.67 1.07 5.72 3.42 5.60 10.89 4.22 4.19
-finline-functions -0.73 1.15 8.85 -0.20 0.24 28.60 0.12 6.55 3.37 1.99 29.84 0.68 5.49
-fguess-branch-probability 7.00 4.41 5.82 0.00 3.60 3.34 2.64 6.67 5.85 8.74 10.89 3.97 5.66
-fasynchronous-unwind-tables 7.12 10.28 7.38 6.31 3.76 17.16 4.83 9.26 9.04 7.88 18.14 5.34 7.71
-fbranch-probabilities -4.91 -2.07 -2.20 0.82 0.11 0.02 -2.44 -3.92 -3.74 -4.72 -7.30 -1.80 -2.85
-funit-at-a-time -22.64 -4.95 -1.50 0.00 0.00 0.00 0.00 -0.82 -0.08 -0.01 0.00 -0.10 -1.09
-ffast-math 0.00 -0.03 0.00 0.00 0.00 0.00 0.00 -0.68 0.00 -0.02 0.00 0.01 -0.09
-frename-registers 0.00 0.26 0.97 0.00 0.28 0.00 1.99 0.68 0.24 0.04 0.00 1.83 0.78
all prologue using move -0.73 4.14 1.14 -0.96 -0.33 2.18 1.35 0.87 1.60 0.52 -0.77 2.38 1.17
-ftracer 0.00 1.27 1.29 0.00 0.13 0.00 2.50 2.01 2.46 1.31 0.00 1.54 1.56
-funroll-loops 13.30 7.92 3.18 1.34 4.22 7.11 1.26 2.70 12.57 0.02 9.82 8.70 4.21
-funroll-all-loops 13.30 9.53 4.29 24.50 4.71 14.20 1.43 3.38 15.76 0.66 9.82 14.40 5.71
-fpic 12.11 6.53 3.62 1.14 21.40 9.38 1.92 6.48 15.53 9.16 7.06 16.66 7.55
-mcmodel=medium 13.62 8.10 7.10 0.00 17.57 7.44 6.35 8.29 8.35 6.64 9.90 13.33 8.09
aggressive optimization -14.42 4.03 21.89 5.12 6.44 44.45 -0.47 8.80 7.38 0.73 40.05 3.93 11.08


Table: 64-bit SPECfp 2000 with Standard Optimization (AMD Opteron)
Performance (relative speedup in percent):
options wupwise swim mgrid applu mesa art equake ammp sixtrack apsi avg
(equal settings, noise) -0.28 -0.13 0.00 0.00 0.23 -2.07 0.14 0.00 0.00 0.00 -0.16
standard optimization 102.22 54.49 633.14 220.37 79.20 22.69 90.76 111.08 204.34 192.64 142.52
-mfpmath=sse 9.30 0.12 3.31 2.38 11.68 102.55 0.28 8.32 11.53 6.01 12.43
-fguess-branch-probability 7.62 0.00 6.42 2.78 7.48 0.42 -2.23 -1.27 -0.29 4.72 2.75
partial SSE moves 2.86 0.13 2.95 3.21 3.34 -3.26 0.86 3.11 3.86 3.33 2.12
full sized loads and moves 2.13 0.26 1.35 1.98 6.38 0.69 0.00 2.00 1.45 1.55 1.78
-fstrict-aliasing 0.00 0.12 0.00 0.19 2.22 5.22 -2.23 0.90 0.00 5.08 1.44
-fschedule-insns2 2.23 0.00 7.72 0.78 0.34 -1.40 -2.50 0.90 4.50 1.01 1.28
-freorder-blocks 0.97 0.12 0.18 0.19 13.09 2.28 0.28 0.00 -1.42 0.00 1.28
-fomit-frame-pointer 2.51 0.00 4.53 0.38 -0.58 -1.80 -1.13 0.90 -0.29 3.63 0.95
prologue using move -3.24 0.00 0.00 0.00 3.58 0.69 0.00 -0.14 0.00 0.00 0.15
-finline-functions 0.13 0.12 0.00 0.19 1.85 -1.51 1.84 -0.52 0.28 -0.17 0.15
-foptimize-sibling-calls 0.82 0.12 0.18 0.19 -0.46 -0.97 0.00 0.12 0.00 0.00 0.00
-mred-zone 0.00 0.00 0.00 0.38 0.57 0.97 -2.10 -0.26 0.00 0.16 0.00
-maccumulate-outgoing-args 0.55 -0.13 0.18 0.00 0.45 -3.46 0.00 0.00 -0.29 0.33 -0.16
-fgcse 1.37 0.00 -7.19 -5.15 -0.23 0.69 0.42 -0.64 -4.14 -2.13 -1.71
aggressive optimization 5.57 -0.91 6.60 4.26 4.14 -1.93 7.96 3.58 10.63 -2.34 3.15
-funroll-all-loops 2.72 -0.13 1.88 2.32 -1.50 5.58 0.42 3.58 -0.29 1.16 1.58
-funroll-loops 2.72 0.00 1.88 2.51 -0.92 2.67 2.13 3.58 -0.29 1.16 1.57
-ffast-math 0.81 0.00 0.00 2.13 1.26 -3.16 0.99 4.74 0.57 1.50 0.94
all prologue using move 4.18 0.00 -0.39 0.19 0.23 -0.98 1.86 -0.27 1.14 0.34 0.63
-fbranch-probabilities -3.44 0.12 -0.94 0.38 15.14 -1.40 -0.15 -0.65 0.85 -3.35 0.15
-funit-at-a-time 0.13 0.12 -0.19 0.00 3.93 -3.54 0.14 0.12 0.00 -0.17 0.15
-frename-registers -3.54 -0.26 5.66 -0.39 -7.23 -1.11 4.97 3.46 0.86 -0.34 0.15
-ftracer -0.82 0.00 0.00 0.00 -2.87 -2.35 -0.15 0.77 0.86 -0.67 -0.64
-mcmodel=medium 2.73 -0.26 -0.19 -0.39 -3.69 -0.83 -0.72 -1.03 -14.95 -0.17 -1.90
-fpic 0.95 0.00 0.37 -0.97 1.72 -0.29 0.71 -0.13 -20.98 -0.17 -1.90
File size (relative increase in the size of stripped binaries, in percent):
options wupwise swim mgrid applu mesa art equake ammp sixtrack apsi total
standard optimization -25.71 -26.52 -36.03 -60.14 -34.62 -15.82 -33.14 -32.33 -38.32 -30.33 -36.85
-maccumulate-outgoing-args -1.63 -0.71 -1.83 -0.71 -3.40 -2.07 -1.80 -2.77 -1.12 -1.17 -1.89
-fschedule-insns2 0.00 0.00 0.00 0.05 0.00 0.00 0.00 0.02 -0.43 0.00 -0.21
-mred-zone 0.00 0.00 -0.19 -2.31 -0.13 -0.08 -0.14 -0.12 -0.03 -0.12 -0.14
-fgcse 0.00 -8.64 -4.00 -10.19 -0.74 1.91 -0.38 0.00 1.70 -3.61 -0.07
-fstrict-aliasing 0.00 0.00 0.00 0.00 -0.13 0.07 0.00 -0.05 0.00 0.00 -0.04
-foptimize-sibling-calls 0.00 0.00 0.00 0.00 -0.24 0.00 0.00 0.04 -0.02 0.68 -0.02
full sized loads and moves 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.40 0.00 0.75 0.08
-fomit-frame-pointer 0.00 0.47 0.75 -1.97 -0.05 0.39 -0.14 0.37 0.12 5.74 0.43
partial SSE moves 0.00 0.23 0.00 0.71 0.79 0.00 0.00 0.24 0.43 0.89 0.53
prologue using move -0.28 0.00 0.00 0.11 1.78 0.00 0.00 0.26 -0.02 0.70 0.53
-freorder-blocks 0.00 0.47 0.00 0.11 2.44 0.00 0.00 2.62 0.86 1.37 1.38
-mfpmath=sse 0.00 2.16 0.00 6.26 -1.57 0.00 -0.14 3.19 2.65 4.39 1.60
-fguess-branch-probability -0.28 1.43 0.00 -0.36 5.10 12.16 10.56 3.04 0.41 1.19 2.09
-finline-functions 0.00 0.00 0.00 0.00 5.39 19.96 0.13 0.42 1.29 1.50 2.45
-fasynchronous-unwind-tables 9.34 3.15 6.75 1.92 10.46 16.55 13.01 6.21 1.25 3.83 4.67
-fbranch-probabilities 0.64 0.15 0.76 0.19 -5.23 0.70 0.61 -2.11 -0.28 -0.06 -1.58
-ffast-math 0.00 -0.95 0.00 0.58 -0.83 -13.04 -0.27 -5.57 0.86 0.00 -0.35
-funit-at-a-time 0.00 0.00 0.00 0.00 -0.07 0.00 0.00 -0.03 0.00 0.00 -0.03
all prologue using move -0.28 1.40 0.37 1.29 0.78 -1.02 -0.40 2.26 0.61 1.96 0.86
-ftracer 0.00 0.00 0.00 0.00 2.37 0.07 0.00 5.45 0.43 3.35 1.51
-frename-registers 0.00 0.47 0.00 2.65 1.78 0.00 0.00 2.60 2.58 0.86 2.10
-funroll-loops 1.93 24.69 6.32 6.42 7.95 20.05 0.65 11.14 3.02 6.63 5.63
-funroll-all-loops 1.93 24.69 7.25 6.42 8.19 20.05 2.35 11.14 3.02 6.63 5.73
-fpic 0.45 0.23 0.93 2.24 5.92 9.28 7.71 4.91 8.04 3.75 6.51
-mcmodel=medium 0.09 4.93 0.00 7.49 3.53 0.85 1.83 5.45 24.62 6.36 14.32
aggressive optimization 71.81 164.20 125.37 57.30 11.28 97.53 52.54 12.91 26.21 34.10 26.45


Table: 64-bit SPECint 2000 with Aggressive Optimization (AMD Opteron)
Performance (relative speedup in percent):
options gzip vpr gcc mcf crafty parser eon perl gap vortex bzip2 twolf avg
(equal settings, noise) -0.28 -0.41 0.20 -0.45 0.00 -0.16 0.00 -0.11 0.84 0.00 0.13 0.38 0.12
aggressive optimization 112.35 91.73 103.60 14.72 86.01 97.56 589.65 130.46 111.79 74.46 151.98 56.79 106.81
-fbranch-probabilities -fguess-branch... 8.40 2.62 10.71 0.22 3.38 21.72 27.67 27.67 14.24 10.37 4.39 -1.56 9.49
full sized loads and moves 1.00 0.67 -0.53 0.00 0.97 -0.48 56.39 1.79 0.71 0.62 0.13 4.64 4.61
-fbranch-probabilities 2.69 0.00 5.62 -0.45 2.62 19.85 -0.92 11.94 4.06 2.29 1.07 0.51 3.77
-m64 9.90 0.27 3.39 -22.19 42.29 -2.13 45.66 0.30 -1.25 6.29 8.28 -13.33 3.38
-funroll-loops 1.69 0.54 0.41 0.22 0.88 1.41 16.94 7.59 0.56 1.73 0.93 4.62 3.12
-freorder-blocks 4.95 1.22 4.51 0.22 3.89 1.89 2.40 13.06 -0.56 -1.42 0.40 1.15 2.48
-fomit-frame-pointer 0.13 0.00 2.19 0.44 2.03 1.73 2.31 5.38 -0.28 1.08 1.47 5.05 2.10
-fstrict-aliasing -0.56 4.80 0.82 0.44 1.04 1.89 1.61 2.08 1.72 1.64 5.88 1.15 1.85
-finline-functions -0.42 0.54 1.55 2.02 1.86 5.21 1.01 -0.31 0.42 3.62 3.13 2.75 1.85
-ftracer -0.69 -0.27 0.30 0.00 1.12 0.78 5.20 3.93 0.14 0.27 0.53 4.90 1.60
-fschedule-insns2 0.27 2.62 0.41 0.22 4.24 0.46 2.57 1.55 0.99 3.34 1.61 0.64 1.47
-mred-zone -0.42 0.13 0.61 0.66 0.96 0.31 -1.33 1.56 -0.56 7.01 -0.14 3.56 1.22
-fgcse 2.70 4.06 1.14 -0.23 3.47 -0.77 -0.51 -0.82 2.29 1.27 0.93 0.25 1.10
-mfpmath=sse -0.28 2.48 -0.52 0.66 1.95 0.78 9.05 0.72 0.14 -2.80 -0.14 1.42 1.10
-frename-registers -0.42 1.22 -1.13 -0.45 4.24 0.46 -1.90 -0.72 -0.97 1.91 1.47 4.81 0.98
-funit-at-a-time -0.56 3.50 -1.23 0.22 1.12 0.93 0.16 -1.42 2.73 3.43 -0.27 2.64 0.98
prologue using move -0.43 0.54 1.06 0.43 1.06 0.79 -2.75 1.89 3.63 6.29 -0.14 -0.26 0.86
partial SSE moves -0.29 0.81 0.10 -0.44 0.00 0.63 0.00 0.62 0.00 0.26 -0.40 4.78 0.73
-foptimize-sibling-calls 0.00 -0.14 0.61 0.22 0.96 0.78 1.96 0.00 -1.93 -1.86 -0.27 3.15 0.60
-maccumulate-outgoing-args -0.28 0.94 -0.11 -0.23 2.53 0.46 1.18 -0.72 2.43 -0.81 0.13 0.63 0.48
-fstrength-reduce -0.42 0.26 -1.22 0.00 0.64 0.00 -0.59 -1.81 0.42 4.30 -0.14 -0.13 0.00
all prologue using move -1.13 -0.27 -0.32 -0.22 1.28 0.94 6.46 -0.11 1.54 -1.33 0.39 0.50 0.61
-ffast-math -0.28 0.40 -1.24 -0.23 -1.92 0.00 0.08 0.10 0.56 1.34 -0.27 -3.56 -0.73
-fpeel-loops 0.00 0.13 -1.13 0.22 -1.20 -0.62 0.08 -1.34 -1.69 -3.86 -0.40 -0.26 -0.73
-funroll-all-loops 0.00 0.13 0.10 0.00 -0.48 -0.16 -0.84 2.04 -2.12 -5.58 0.26 -7.90 -1.70
-mcmodel=medium -5.12 -1.21 -2.97 0.44 -10.61 -0.78 -1.09 0.00 0.28 -4.85 -0.67 -7.74 -3.28
-fpic -12.73 -1.89 -2.36 -0.89 -13.88 -6.96 -4.36 -12.79 -2.11 -18.23 -10.03 -8.87 -8.12
File size (relative increase in the size of stripped binaries, in percent):
options gzip vpr gcc mcf crafty parser eon perl gap vortex bzip2 twolf total
aggressive optimization -24.01 -19.87 -6.95 -16.43 -11.81 24.89 -14.11 -12.48 -31.74 -8.77 17.87 -36.91 -13.57
-fbranch-probabilities -12.51 -8.07 -5.50 -0.95 -2.64 -2.55 -5.80 -7.77 -14.58 -5.56 -12.11 -10.22 -7.10
-maccumulate-outgoing-args -1.79 -1.55 -2.33 -1.44 -0.87 -2.85 -3.31 -1.77 -4.10 -3.78 3.05 -1.85 -2.58
-fgcse 0.73 -1.16 -1.95 -0.37 -1.92 -1.27 -0.32 -0.59 -0.38 -0.68 -0.06 -3.33 -1.23
-fomit-frame-pointer -1.38 1.02 -0.81 -0.91 -0.27 -1.20 -1.94 -1.43 -1.10 1.41 -0.06 -1.20 -0.72
-fstrict-aliasing 0.12 -1.14 -0.11 -0.73 0.00 0.36 0.36 -0.58 -0.56 -0.66 0.00 -5.14 -0.46
-mred-zone 0.00 -0.06 -0.06 0.00 0.00 0.00 -0.34 -0.04 -0.02 0.12 0.00 -0.05 -0.05
-fschedule-insns2 -0.07 -0.06 -0.07 -0.19 0.01 0.07 0.00 -0.01 -0.02 0.00 0.00 -0.04 -0.03
-foptimize-sibling-calls 0.06 -0.04 0.10 0.00 0.00 -0.04 -0.45 -0.20 0.13 -0.01 -0.06 -0.05 -0.03
-fstrength-reduce 0.24 0.11 -0.01 0.18 0.01 0.03 -0.02 0.00 0.10 0.00 0.00 0.12 0.02
partial SSE moves 0.00 0.27 0.00 0.00 0.00 0.01 0.24 0.00 0.00 0.00 0.00 0.01 0.03
full sized loads and moves 0.18 0.09 0.17 0.00 0.00 0.40 0.01 0.00 0.13 0.00 0.00 0.07 0.10
-mfpmath=sse 0.00 -1.35 -0.05 -0.55 -0.14 -0.08 3.34 -0.58 0.00 0.00 0.00 -1.39 0.15
prologue using move 0.00 0.07 0.14 0.00 -0.05 0.40 -0.02 0.45 0.28 0.37 -0.06 0.06 0.20
-funroll-loops 1.73 0.98 0.34 3.97 1.51 3.22 0.28 0.04 1.00 0.00 0.00 0.77 0.52
-freorder-blocks 0.24 0.11 1.05 -0.55 0.00 -0.04 0.20 0.63 0.36 0.00 0.00 0.21 0.53
-frename-registers 1.35 1.18 1.26 0.00 1.47 0.71 2.27 0.67 0.62 0.66 0.00 2.19 1.16
-ftracer 0.67 1.36 1.57 2.61 2.02 2.29 0.44 1.30 1.61 2.01 0.00 0.58 1.43
-fbranch-probabilities -fguess-branch... 6.09 4.09 5.60 5.44 6.03 9.87 -0.21 3.90 3.58 4.49 7.78 3.27 4.40
-funit-at-a-time -14.10 2.25 12.02 0.00 2.04 5.62 0.00 4.14 6.08 2.66 7.60 1.92 5.94
-m64 16.48 -2.64 8.02 18.47 -19.00 15.52 0.25 11.38 9.65 -5.69 8.64 -3.44 3.90
-finline-functions 8.71 7.94 23.54 2.80 3.51 39.11 -0.09 11.96 9.86 4.17 39.65 2.71 12.98
-ffast-math 0.00 -0.02 0.03 0.00 0.00 0.00 0.00 -0.05 0.00 -0.02 0.00 0.01 0.00
-funroll-all-loops 0.00 0.23 0.04 2.18 0.00 1.26 0.00 0.57 0.09 0.00 0.00 -2.94 0.03
-fpic 16.27 4.69 -6.01 0.18 17.87 -21.91 0.96 1.39 6.50 7.12 -21.77 14.97 0.38
-fpeel-loops 1.57 0.39 0.35 1.63 1.98 5.80 0.00 0.57 0.96 0.00 0.00 1.25 0.66
all prologue using move 2.18 2.85 1.30 1.45 0.26 2.63 2.31 1.71 2.95 2.77 -0.72 2.62 1.91
-mcmodel=medium 14.15 9.85 7.56 19.12 18.58 7.95 5.97 9.93 9.90 7.91 21.15 12.94 9.01


Table: 64-bit SPECfp 2000 with Aggressive Optimization (AMD Opteron)
Performance (relative speedup in percent):
options wupwise swim mgrid applu mesa art equake ammp sixtrack apsi avg
(equal settings, noise) 1.30 0.00 0.89 0.56 -5.34 -0.28 0.00 -0.13 -1.29 1.21 -0.16
aggressive optimization 101.11 53.87 686.79 225.30 101.38 26.80 100.81 123.51 225.00 180.97 149.23
-m64 5.00 -0.27 16.25 9.79 28.55 83.54 -1.31 19.17 28.33 20.86 19.34
-mfpmath=sse 13.97 0.12 2.40 2.33 7.04 100.28 1.79 16.64 22.22 5.67 13.80
-fbranch-probabilities -fguess-branch... -0.83 0.39 10.83 3.96 19.62 2.23 -0.28 6.85 2.24 0.70 3.98
partial SSE moves 1.58 0.13 2.18 1.76 0.70 1.27 -2.51 3.17 6.14 2.54 1.74
-fstrict-aliasing 0.13 0.00 0.00 0.00 -0.90 4.49 1.37 5.49 0.00 4.71 1.73
full sized loads and moves -2.25 0.26 3.31 1.16 4.29 2.40 2.92 0.86 2.25 0.89 1.57
-fschedule-insns2 0.13 0.12 13.06 0.57 -9.93 1.53 -0.68 5.49 3.71 1.58 1.41
-ftracer 0.27 0.00 -0.19 -0.19 -2.85 0.97 1.79 1.10 0.00 0.34 0.15
-mred-zone -0.95 0.00 -0.19 1.15 -2.32 0.13 1.09 0.00 0.00 0.00 -0.16
prologue using move -1.53 0.13 -0.18 -0.20 0.91 -0.84 -0.14 0.00 0.00 -0.18 -0.16
-frename-registers 0.00 0.00 4.52 -0.76 -12.07 1.83 3.21 1.84 1.39 -1.03 -0.31
-fbranch-probabilities -1.61 0.00 -0.37 -0.57 7.36 -0.83 -0.14 -0.49 0.83 -4.16 -0.32
-fomit-frame-pointer -1.08 0.00 0.54 0.95 -11.17 -0.69 0.68 0.85 0.00 1.94 -0.62
-finline-functions 0.00 0.12 -0.19 0.00 -12.12 2.97 1.23 0.36 -0.28 0.00 -0.77
-maccumulate-outgoing-args 3.20 -0.13 0.00 -0.19 -9.94 -0.70 0.40 -0.13 -0.28 0.00 -0.78
-freorder-blocks 1.08 0.00 -0.19 -0.19 -11.27 1.11 0.13 1.72 0.00 0.00 -0.78
-funroll-loops -2.43 -0.13 0.00 1.34 -11.02 0.83 0.54 3.25 0.00 0.34 -0.78
-foptimize-sibling-calls -1.20 0.00 -0.37 0.00 -13.20 0.97 -0.28 -0.49 0.00 0.34 -1.23
-fstrength-reduce -1.85 0.00 -0.37 5.20 -13.15 -0.14 0.95 -0.85 1.39 -2.04 -1.23
-funit-at-a-time -0.96 0.12 -0.19 -0.19 -11.26 0.00 1.09 0.00 0.00 0.00 -1.24
-fgcse -1.46 -0.39 -7.52 -4.36 -12.53 1.26 0.40 -0.13 -1.63 -3.19 -3.02
-ffast-math -2.01 0.00 -0.19 1.13 14.99 -0.70 2.16 1.45 -0.83 2.94 1.86
-fpeel-loops 9.94 0.00 -0.19 0.18 0.00 -0.83 -1.22 0.00 0.00 -0.18 0.62
-funroll-all-loops -0.41 0.12 0.00 -0.19 0.00 0.98 -1.49 -0.13 0.00 0.17 -0.16
-fpic 5.42 -0.13 0.00 -0.95 14.84 0.55 -1.76 0.00 -20.67 -0.18 -0.63
all prologue using move -5.90 0.00 -0.89 -0.39 0.20 -0.28 0.54 -0.62 0.00 0.17 -0.78
-mcmodel=medium -0.54 -0.13 -0.55 -1.71 9.68 -3.19 -1.76 -3.88 -16.53 -1.22 -2.01
File size (relative increase in the size of stripped binaries, in percent):
options wupwise swim mgrid applu mesa art equake ammp sixtrack apsi total
aggressive optimization -16.48 -15.91 -34.31 -57.92 -33.11 8.36 -29.40 -26.61 -36.44 -25.42 -34.22
-fbranch-probabilities 0.55 -8.26 -2.73 -3.79 -12.90 -10.98 -9.59 -7.97 -4.00 -7.95 -7.22
-maccumulate-outgoing-args -1.93 -0.62 -1.78 -0.78 -3.49 -0.97 -0.99 -1.92 -0.80 -1.19 -1.67
-mred-zone 0.00 -0.21 -0.37 -2.03 -0.77 -0.13 -0.13 -0.03 -0.01 -0.30 -0.30
-fstrict-aliasing 0.00 0.00 0.00 0.00 -0.75 6.80 -10.04 0.00 0.00 -0.18 -0.27
-fgcse 0.00 -8.64 -4.00 -10.19 -0.74 1.91 -0.38 0.00 1.70 -3.61 -0.07
-fschedule-insns2 0.00 0.00 0.00 0.00 -0.10 0.00 0.00 0.00 0.00 0.00 -0.03
prologue using move -0.09 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.06 0.00
-foptimize-sibling-calls 0.00 0.00 -0.37 0.00 -0.18 0.00 0.00 0.00 0.36 0.10 0.13
full sized loads and moves 0.00 0.00 0.00 0.00 0.00 0.00 0.12 0.00 0.34 0.08 0.17
-freorder-blocks 0.00 0.00 0.00 0.00 0.03 0.24 0.49 0.00 0.42 -0.09 0.21
-funit-at-a-time 0.00 0.00 0.00 0.00 0.11 0.12 4.67 1.85 0.00 0.00 0.23
-fomit-frame-pointer 8.70 0.82 0.91 -1.92 -0.51 -0.73 -0.38 0.51 0.40 5.13 0.57
-fstrength-reduce 0.00 0.00 0.18 -0.51 0.03 0.00 0.12 0.00 1.20 0.12 0.59
partial SSE moves 0.00 0.20 0.18 0.39 0.77 0.60 0.00 0.00 0.82 0.23 0.65
-ftracer 11.68 0.41 -1.26 0.00 0.03 0.36 0.87 5.54 0.00 0.92 0.70
-funroll-loops 10.37 14.33 2.03 2.81 0.03 6.59 3.06 2.39 0.35 2.96 1.09
-fbranch-probabilities -fguess-branch... 12.12 15.26 2.69 3.25 0.02 19.33 5.59 8.65 0.43 4.67 1.92
-frename-registers 8.99 0.82 0.54 2.99 2.38 1.85 1.76 2.69 2.57 1.58 2.53
-finline-functions 0.00 0.00 0.00 0.00 5.92 18.22 4.94 2.41 1.27 1.84 2.75
-mfpmath=sse 8.70 2.96 2.03 8.08 -0.75 6.59 3.99 5.54 5.28 5.13 3.72
-m64 45.40 201.01 156.05 26.51 17.41 39.81 27.06 23.41 28.79 38.22 28.68
-ffast-math 0.00 -0.83 0.00 0.94 -0.85 -6.44 -4.84 -8.23 0.40 -0.18 -0.81
-funroll-all-loops 0.00 0.00 0.00 0.00 0.00 0.24 0.61 0.00 0.00 0.00 0.01
-fpeel-loops 0.00 0.00 0.00 1.39 0.00 0.36 1.36 0.00 0.00 0.12 0.07
all prologue using move -0.49 8.82 1.79 1.28 2.22 8.41 0.99 2.15 0.36 4.23 1.49
-fpic 0.65 -6.38 2.35 1.11 5.32 -3.71 13.13 2.23 6.58 3.47 5.21
-mcmodel=medium 0.00 9.45 2.17 7.98 5.43 10.44 11.27 5.24 23.48 6.72 14.49


Table: 32-bit SPECint 2000 with Aggressive Optimization (AMD Opteron)
Performance (relative speedup in percent):
options gzip vpr gcc mcf crafty parser eon perl gap vortex bzip2 twolf avg
(equal settings, noise) 1.06 -0.14 0.42 0.69 0.11 0.00 -0.13 0.20 -0.28 0.85 0.71 3.52 0.75
aggressive optimization 96.74 76.81 73.11 14.74 56.38 83.61 349.45 111.06 98.34 71.82 122.25 67.09 89.12
-march=i386 to k8 5.23 8.41 3.45 0.17 9.02 6.80 82.00 -0.52 0.41 14.78 2.45 8.52 10.08
-fbranch-probabilities -fguess-branch... 8.34 2.37 12.33 1.40 4.25 7.49 17.57 14.35 8.99 12.75 6.47 0.87 7.37
-fbranch-probabilities 2.94 0.41 10.33 0.17 2.91 5.43 0.61 8.82 2.41 8.26 6.45 0.77 3.89
-fomit-frame-pointer 8.64 1.36 0.84 0.17 2.26 6.51 0.73 0.41 4.58 2.66 6.25 3.78 3.26
-fgcse 1.99 1.52 -2.27 -0.69 0.57 -4.36 5.14 8.00 2.67 2.93 1.86 2.98 1.77
-finline-functions 0.90 1.96 0.00 2.84 2.91 6.62 1.86 0.82 1.41 3.34 1.87 1.78 2.17
-ftracer 0.15 1.94 4.58 -0.52 -0.34 -2.23 3.94 9.70 0.13 1.74 3.05 0.77 1.78
-fschedule-insns2 2.30 2.22 2.47 -0.35 2.32 0.15 0.12 1.87 -0.69 2.04 1.73 2.70 1.52
-funit-at-a-time -0.60 8.91 3.47 -0.18 2.55 -1.50 0.12 7.50 -1.10 1.83 0.28 -0.67 1.39
-freorder-blocks 1.99 0.68 7.88 -0.87 3.52 0.76 -0.37 1.24 -0.83 2.23 2.01 -1.00 1.26
-funroll-loops -0.31 -0.55 0.00 0.34 0.22 -1.79 6.77 0.72 0.69 2.71 1.14 3.53 1.25
-march=ppro to k8 5.91 -1.89 2.37 0.34 0.45 -4.22 2.63 0.30 1.11 0.38 2.75 2.60 1.13
-maccumulate-outgoing-args 0.60 -0.28 0.53 0.00 2.67 -2.08 5.95 2.62 0.27 4.06 1.00 -2.15 0.88
-frename-registers -0.30 1.65 -0.94 -1.04 0.68 -2.67 -1.57 0.00 -0.14 2.74 0.85 5.49 0.75
-foptimize-sibling-calls -0.16 0.27 2.24 0.34 -0.34 -1.93 -1.21 -0.11 0.69 1.93 0.56 0.11 0.25
-fstrict-aliasing 1.07 -1.37 0.21 1.39 -0.12 0.00 0.12 0.10 0.55 0.09 0.71 0.55 0.25
-fstrength-reduce -0.16 0.54 -0.53 -1.04 0.57 -2.51 0.12 0.00 -1.10 -1.14 0.28 1.10 -0.25
-funroll-all-loops 3.10 -0.28 0.31 -0.87 0.11 2.73 0.49 2.98 0.68 1.14 -0.15 1.98 1.00
-mfpmath=sse 1.83 2.32 1.28 -1.38 0.11 0.45 0.36 0.51 1.39 0.94 0.85 0.32 0.75
-ffast-math -0.31 1.09 0.63 0.34 -0.46 0.15 0.12 0.72 0.55 0.86 0.42 0.44 0.50
-fpeel-loops 2.29 0.00 -0.32 -0.52 0.90 3.17 0.00 0.10 -3.43 -0.29 0.70 -1.20 0.00
-fpic -20.49 -5.64 -17.55 -3.28 -29.60 -28.19 -10.27 -29.75 -23.00 -35.03 -25.65 -17.66 -20.81
File size (relative increase in the size of stripped binaries, in percent):
options gzip vpr gcc mcf crafty parser eon perl gap vortex bzip2 twolf total
aggressive optimization -18.85 -6.25 3.51 -21.10 2.34 33.46 -4.21 -6.83 -22.83 -2.91 33.80 -22.33 -4.05
-fbranch-probabilities -14.82 -8.93 -5.82 0.67 -1.96 -3.46 -5.89 -7.95 -14.56 -3.10 -11.81 -10.11 -6.87
-fgcse 1.21 -1.15 -1.23 0.00 2.31 -0.93 0.20 0.52 0.21 0.51 -1.59 -1.60 -0.28
-foptimize-sibling-calls 0.07 0.11 0.09 0.00 0.07 0.00 -1.44 0.05 0.01 -0.03 -1.18 -0.02 -0.14
-fstrict-aliasing 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
-fstrength-reduce 0.21 0.09 0.02 0.00 0.05 -0.29 0.03 0.09 0.12 0.00 0.00 -0.19 0.02
-fschedule-insns2 -0.15 0.21 -0.07 0.00 -0.07 0.00 1.63 -0.02 -0.04 -0.01 0.00 0.03 0.15
-march=ppro to k8 -2.15 1.33 -0.40 0.00 -0.36 0.00 5.56 -0.29 -0.49 0.10 -1.18 0.31 0.40
-funroll-loops 3.06 0.81 0.32 0.00 1.16 2.91 0.08 0.21 0.88 0.08 2.31 0.31 0.48
-frename-registers 0.49 0.48 0.52 0.00 0.51 0.00 1.42 0.81 0.22 0.10 1.02 0.31 0.55
-freorder-blocks -0.08 -0.06 1.22 0.00 0.50 -0.03 0.17 0.82 0.29 0.10 0.53 0.22 0.62
-fomit-frame-pointer -1.77 2.89 0.39 0.00 -0.14 0.77 4.52 -0.79 0.17 2.38 -2.80 -0.11 0.95
-ftracer 0.00 1.33 1.78 0.00 4.56 2.91 0.31 2.07 1.71 2.56 0.29 0.31 1.80
-fbranch-probabilities -fguess-branch... 6.98 3.72 6.73 0.67 9.29 9.37 -0.26 4.48 3.81 4.67 6.41 2.35 4.93
-maccumulate-outgoing-args 1.29 6.40 6.00 0.00 1.95 2.47 0.38 2.07 4.64 19.88 3.13 4.36 5.87
-funit-at-a-time -11.69 6.01 13.64 0.00 2.27 6.00 0.00 4.45 7.07 2.65 6.53 1.86 6.58
-march=i386 to k8 1.43 9.46 9.78 0.00 3.65 6.00 8.00 4.13 6.70 21.21 4.02 8.63 9.24
-finline-functions 10.90 8.91 28.84 0.00 3.79 39.55 0.16 13.26 10.95 4.65 50.44 2.30 14.46
-ffast-math 0.00 -0.79 0.01 0.00 -0.02 0.00 0.00 -0.13 0.00 -1.23 0.00 -0.06 -0.21
-funroll-all-loops 0.00 0.25 0.05 0.00 0.07 2.83 0.00 0.03 0.07 0.03 1.19 0.21 0.15
-fpeel-loops 2.19 1.15 0.39 0.00 2.81 6.13 0.00 0.21 0.88 0.02 1.25 1.61 0.72
-fpic 12.59 6.19 -4.89 0.00 14.80 -27.60 10.58 4.43 1.15 1.35 -21.21 9.83 0.84
-mfpmath=sse -0.08 1.15 -0.03 0.00 -0.06 0.00 10.10 0.17 0.00 0.00 1.19 -1.80 1.13


Table: 32-bit SPECfp 2000 with Aggressive Optimization (AMD Opteron)
Performance (relative speedup in percent):
options wupwise swim mgrid applu mesa art equake ammp sixtrack apsi avg
(equal settings, noise) 0.13 0.00 0.00 -0.21 0.28 2.57 -0.14 0.00 6.00 0.00 0.72
aggressive optimization 77.83 27.22 445.45 148.97 56.22 -30.46 92.25 101.18 122.37 156.08 98.56
-march=i386 to k8 6.02 0.00 2.53 3.17 13.31 1.54 -0.65 1.49 -3.05 2.11 2.41
-fbranch-probabilities -fguess-branch... 3.49 0.39 4.74 4.28 0.72 1.81 -1.42 7.93 -2.16 0.20 1.66
-fomit-frame-pointer -0.14 0.12 3.49 2.25 9.32 1.02 0.38 0.29 0.00 1.03 1.63
-march=ppro to k8 8.34 0.00 0.00 -0.82 10.41 -1.50 0.26 -0.59 -0.94 -0.62 1.10
-fstrength-reduce 10.13 -0.26 1.46 1.03 -8.02 -1.54 0.13 0.89 -0.32 3.64 0.91
-funroll-loops 3.93 0.00 0.00 0.61 -7.65 1.81 0.52 4.62 0.95 -0.21 0.36
-fstrict-aliasing 0.00 0.00 0.00 0.00 0.00 -1.27 -0.13 0.14 0.00 0.00 0.00
-frename-registers 0.81 0.12 -0.62 0.00 -5.69 -0.52 1.98 -0.15 0.63 0.62 -0.19
-funit-at-a-time 0.13 0.00 0.00 0.00 -5.75 0.25 2.25 0.29 0.00 0.00 -0.19
-ftracer 1.65 0.00 0.00 0.00 -6.54 0.51 0.39 2.26 -0.32 -0.82 -0.37
-finline-functions 0.00 0.00 0.00 0.00 -7.14 3.70 1.85 -0.15 0.00 -0.21 -0.37
-maccumulate-outgoing-args 2.20 0.00 0.20 0.20 -6.37 -0.76 -0.40 0.00 0.00 0.41 -0.37
-foptimize-sibling-calls -0.27 0.00 0.00 0.00 -6.44 2.84 0.00 0.14 -0.32 0.00 -0.37
-fschedule-insns2 -0.54 0.13 1.04 2.72 -6.49 -0.26 -1.67 1.34 -6.48 1.04 -0.72
-freorder-blocks 0.68 -0.13 0.20 0.00 -4.78 -1.52 -0.13 1.04 -1.55 -0.62 -0.73
-fbranch-probabilities 1.78 0.00 -0.21 -2.80 0.00 -2.53 0.26 -1.17 -0.63 -2.23 -0.91
-fgcse 2.21 -0.39 0.20 -2.40 -3.99 2.02 -0.13 -0.59 -10.68 0.20 -1.43
-mfpmath=sse 2.43 0.25 3.29 -0.21 12.53 97.20 -0.14 1.47 13.20 3.30 10.14
-ffast-math 1.21 0.25 0.00 2.04 3.13 -0.26 3.89 0.58 -0.95 3.09 1.44
-fpeel-loops 3.78 0.00 0.00 2.25 0.00 0.51 -0.26 0.00 0.00 0.00 0.54
-funroll-all-loops 0.00 0.12 0.00 0.00 0.00 -2.54 -0.26 0.14 0.00 0.00 -0.19
-fpic -5.15 0.25 -3.72 3.46 -0.43 -1.31 -10.15 -2.36 -11.64 -1.45 -3.10
File size (relative increase of the size of stripped binaries in percents):
options wupwise swim mgrid applu mesa art equake ammp sixtrack apsi total
aggressive optimization -3.88 -1.94 -20.88 -25.85 -23.54 14.89 -16.01 -17.99 -17.79 -11.69 -18.60
-fbranch-probabilities 0.24 -2.71 0.69 -4.31 -14.27 -7.93 -4.87 -11.72 -4.35 -7.09 -7.78
-fstrict-aliasing 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
-march=ppro to -march=k8 0.00 0.53 0.00 -4.45 1.81 0.77 0.10 0.41 -0.60 0.00 0.04
-funit-at-a-time 0.00 0.00 0.00 0.00 0.24 0.00 3.26 0.43 0.00 0.00 0.14
-freorder-blocks 0.00 0.00 0.00 0.04 0.15 -0.12 0.43 0.31 0.28 0.00 0.20
-foptimize-sibling-calls 0.00 0.00 0.00 0.00 0.26 0.00 0.00 0.02 0.23 1.56 0.30
-frename-registers 0.00 0.26 0.00 0.04 0.25 0.33 0.65 0.02 0.61 0.00 0.38
-ftracer 7.98 0.00 0.00 0.00 0.07 0.44 0.76 5.35 0.00 1.44 0.66
-funroll-loops 5.15 6.26 0.00 1.15 0.06 7.76 1.21 0.43 0.07 2.48 0.57
-fgcse -1.84 3.89 0.00 -4.45 -0.79 0.11 0.54 0.09 2.85 -3.17 0.76
-fbranch-probabilities -fguess-branch... 10.49 6.75 0.69 1.60 -0.55 11.19 2.58 7.22 0.63 3.16 1.38
-fschedule-insns2 0.00 0.81 0.00 1.44 0.63 0.66 1.32 3.68 2.53 2.07 1.90
-fomit-frame-pointer 2.10 1.60 0.00 4.64 2.24 0.00 0.54 4.52 1.22 9.28 2.41
-fstrength-reduce 0.00 -1.33 0.00 31.50 -0.04 -2.69 -0.22 -0.54 4.37 3.07 3.14
-finline-functions 0.00 0.00 0.00 0.00 6.28 13.54 6.74 1.95 1.85 3.97 3.23
-march=i386 to -march=k8 7.17 -4.61 0.00 1.44 6.05 0.55 0.87 -0.68 4.19 8.23 4.35
-maccumulate-outgoing-args 7.52 1.91 0.00 0.71 3.53 1.23 1.77 0.43 6.49 9.36 5.03
-ffast-math 0.00 -0.81 0.00 0.23 -1.37 -31.41 -31.50 -6.71 -0.07 -0.78 -1.89
-funroll-all-loops 0.00 0.00 0.00 0.00 0.00 0.11 0.65 0.00 0.00 0.00 0.01
-fpeel-loops 0.77 0.00 0.00 0.42 0.00 0.22 1.19 0.00 0.06 0.00 0.08
-fpic 4.90 -6.17 0.00 -25.58 9.63 -3.10 2.72 5.98 7.44 -0.10 5.74
-mfpmath=sse 4.04 7.23 0.00 10.72 2.53 7.29 8.28 15.12 8.83 6.81 7.33


Performance (relative speedups in percents):
Table: 64-bit SPECint 2000 with Aggressive Optimization (DEC Alpha EV56/600MHz)
options gzip vpr gcc mcf crafty parser eon perl gap vortex bzip2 twolf avg
  0.00 -0.66 0.71 0.00 1.63 0.00 0.60 0.00 8.02 5.84 -0.55 4.72 1.96
aggressive optimization 143.98 77.03 73.26 16.94 105.84 141.75 505.83 119.81 128.84 94.27 180.89 71.33 115.27
-fschedule-insns2 -fschedule-insns 16.23 10.00 1.51 2.20 11.18 2.75 20.56 8.84 2.87 3.78 15.38 5.63 8.08
-funit-at-a-time 1.42 2.63 3.73 3.67 -2.83 28.33 18.57 16.66 0.00 16.42 3.52 5.51 7.63
-finline-functions 5.18 2.63 2.18 1.47 14.63 31.62 1.19 8.33 0.00 22.13 4.73 2.70 6.84
-fbranch-probabilities -fguess-branch... 1.45 7.58 6.06 2.22 15.47 27.50 5.03 14.00 2.08 -3.48 0.00 0.65 6.16
-fbranch-probabilities 9.30 2.66 6.81 5.97 -4.33 29.66 -1.18 9.93 4.22 17.51 2.31 -0.66 5.44
-fschedule-insns2 7.87 6.16 3.75 0.72 7.69 -0.89 7.84 7.38 1.42 3.14 7.89 5.63 5.03
-fomit-frame-pointer 0.00 0.00 2.94 0.00 5.34 2.01 5.76 7.69 3.52 3.18 5.26 1.33 2.63
-freorder-blocks 0.71 0.00 2.15 0.00 14.10 1.31 -6.94 5.62 3.52 4.48 2.22 2.64 2.63
-fgcse 5.42 0.00 1.41 0.72 -1.02 -0.65 14.86 1.19 2.09 2.54 2.82 -0.66 1.94
-fif-conversion 2.96 5.47 0.00 2.20 4.97 0.65 13.15 0.00 2.08 3.20 -0.56 1.31 2.61
-fstrength-reduce -3.53 -1.28 1.44 2.18 -3.30 -0.65 22.30 -2.96 2.08 -1.87 2.27 4.08 1.97
-funroll-loops -1.42 0.00 2.18 0.00 22.29 0.00 3.65 -0.60 -1.37 1.87 0.00 -3.88 1.30
-fstrict-aliasing -2.88 4.08 -0.71 0.73 2.13 8.45 -16.97 4.34 4.25 3.16 4.59 0.65 0.65
-frename-registers 0.71 0.64 0.71 -0.72 5.40 0.66 5.73 2.40 0.68 -12.50 -1.11 3.44 0.65
-foptimize-sibling-calls -2.12 0.00 0.71 -1.42 -14.80 -0.65 2.42 -2.49 0.68 1.86 1.11 -0.65 -1.28
-ftracer 0.00 -4.55 0.00 -2.16 -12.07 -0.65 3.06 -2.95 0.00 1.25 1.11 -7.10 -2.59
-ffast-math -1.44 -3.73 -2.12 2.18 7.65 0.00 -1.78 -0.59 1.37 8.60 -0.55 1.32 1.29
-funroll-all-loops 0.70 -0.65 -2.78 0.72 2.59 1.30 5.16 4.40 0.00 -3.04 0.55 -3.25 0.00
-fpeel-loops 0.00 3.28 -0.71 -1.43 4.44 1.30 -3.51 -2.36 0.00 0.61 0.00 -1.30 0.00
-fold-unroll-loops 0.00 0.64 0.00 0.72 -4.62 1.31 10.71 3.03 -1.37 -2.54 -6.63 0.00 0.00
-fpic 0.00 -2.64 0.00 0.73 -13.23 3.63 -4.10 -0.65 -1.40 -3.71 5.48 -2.65 -2.05
File size (relative increase of the size of stripped binaries in percents):
options gzip vpr gcc mcf crafty parser eon perl gap vortex bzip2 twolf total
aggressive optimization -38.22 -29.20 -9.28 -42.75 -28.90 5.66 -49.91 -12.38 -36.23 -17.64 -3.00 -39.40 -22.85
-fbranch-probabilities -10.66 -1.50 -2.43 0.79 -0.71 2.11 -4.12 -6.17 0.00 -3.29 -9.80 -5.73 -3.09
-fomit-frame-pointer -10.98 -3.61 -1.53 0.00 -1.19 -3.23 -7.01 -2.35 -2.88 -2.10 -1.09 -3.01 -2.64
-fgcse -0.25 -1.53 -1.07 0.00 -0.87 -1.56 -1.29 -0.48 0.08 0.01 -10.13 0.00 -0.84
-fstrict-aliasing 0.03 -1.22 0.00 0.00 -0.07 -0.28 0.26 -0.20 -0.53 -0.26 0.00 -3.01 -0.28
-freorder-blocks -0.04 0.01 0.31 0.00 -0.43 0.01 -1.35 -0.23 -0.27 0.00 0.00 0.00 -0.09
-foptimize-sibling-calls 0.06 -0.01 0.23 0.00 0.00 0.01 -1.26 -0.04 0.10 0.00 0.00 0.00 -0.02
-frename-registers 0.06 -0.09 0.00 0.00 -0.10 0.02 0.08 -0.09 0.01 -0.03 0.00 0.00 -0.02
-fif-conversion -0.10 -0.19 0.28 0.00 0.15 -0.21 -1.31 0.05 0.04 0.00 0.00 0.00 -0.01
-fstrength-reduce 0.06 0.33 0.00 0.00 0.23 -0.48 0.05 0.06 0.20 0.01 0.01 0.00 0.04
-funroll-loops 0.06 0.00 0.27 0.00 0.12 0.34 0.00 0.05 0.00 0.00 0.00 0.00 0.12
-ftracer 0.04 0.63 2.22 0.00 3.66 5.31 0.11 2.09 -0.11 3.25 0.00 3.19 1.99
-funit-at-a-time -20.22 0.29 9.22 0.83 1.09 6.54 -4.12 4.22 0.00 -1.08 0.30 -2.99 3.12
-fbranch-probabilities -fguess-branch... 0.46 4.61 5.48 0.79 5.40 6.52 0.20 4.37 0.06 4.34 0.42 3.34 3.90
-fschedule-insns2 0.00 4.24 4.73 0.00 3.87 0.00 5.63 3.53 4.29 3.47 0.00 3.41 4.06
-fschedule-insns2 -fschedule-insns 0.00 4.42 5.01 0.00 3.87 0.00 7.14 4.76 5.25 4.69 0.00 3.41 4.76
-finline-functions 0.47 8.20 23.93 0.79 3.89 43.62 -4.17 14.35 0.00 2.22 52.11 -2.89 11.68
-ffast-math -0.31 -0.09 -0.01 -0.40 -0.06 -0.12 -0.04 -0.01 -0.07 -0.04 -0.12 -0.03 -0.04
-funroll-all-loops 0.99 0.31 0.00 0.00 0.43 2.29 0.00 0.11 0.00 0.02 0.00 0.00 0.13
-fpeel-loops 12.32 0.57 0.03 0.00 2.11 6.22 0.00 0.18 0.00 0.04 0.22 0.00 0.49
-fpic -1.53 1.09 0.12 0.39 1.78 5.18 2.52 1.25 1.35 0.21 1.28 0.80 0.92
-fold-unroll-loops 12.39 8.85 -1.48 0.00 5.54 5.61 2.90 2.75 13.59 0.00 11.26 9.30 2.83


Performance (relative speedups in percents):
Table: 64-bit SPECfp 2000 with Aggressive Optimization (DEC Alpha EV56/600MHz)
options wupwise swim mgrid applu mesa art equake ammp apsi avg
  0.00 -0.75 -0.21 0.00 0.93 0.00 0.83 -0.84 -1.76 0.00
-fschedule-insns2 -fschedule-insns 14.49 10.74 50.22 17.06 28.57 7.60 17.08 24.61 25.41 21.69
-fschedule-insns2 1.93 0.00 0.92 3.25 34.50 7.73 4.67 5.26 0.00 5.78
-fstrength-reduce 9.27 0.75 2.71 4.88 2.85 1.19 2.56 0.84 1.81 3.17
-fbranch-probabilities -fguess-branch... 3.12 0.00 1.44 1.33 14.21 7.10 3.41 -0.83 0.90 3.14
-ftracer 1.85 0.00 1.02 0.20 8.54 1.14 0.82 0.84 6.79 2.36
-fbranch-probabilities 1.85 -0.75 -1.83 0.40 5.85 1.65 8.03 1.69 -1.74 1.56
-funit-at-a-time 2.48 0.74 -0.21 0.40 7.25 -1.68 10.00 2.56 -3.45 1.56
-fstrict-aliasing 0.00 0.00 -0.21 0.00 2.35 -6.56 9.00 0.84 0.90 0.77
-fomit-frame-pointer 2.48 -0.75 -0.41 0.00 4.34 -0.58 6.19 0.00 -0.88 0.77
-fgcse 0.60 0.00 0.00 -2.18 3.33 6.50 6.14 0.84 -2.61 0.76
-finline-functions 1.85 -0.75 -0.31 0.20 7.42 -9.40 2.56 0.84 -2.59 0.00
-freorder-blocks 0.00 -0.75 0.00 0.10 4.32 -5.24 6.14 -1.64 0.00 0.00
-frename-registers 0.60 -1.49 0.40 0.40 5.85 -1.66 0.82 0.84 -1.79 0.00
-foptimize-sibling-calls -0.61 0.00 -1.42 0.10 2.35 -4.66 5.21 0.00 -1.76 0.00
-fif-conversion 0.00 0.00 0.20 0.20 0.94 1.10 4.31 -0.84 -0.90 0.00
-funroll-loops 0.60 -2.99 -1.01 0.10 1.87 -3.98 0.83 0.00 -0.88 -0.77
-fold-unroll-loops 6.66 -0.75 0.20 2.43 -36.75 3.48 -4.96 1.66 3.63 -3.08
-ffast-math -0.60 0.00 0.10 0.30 -0.47 2.90 -5.47 -0.83 -2.59 -0.76
-fpic 0.63 0.00 -0.21 0.20 -2.04 2.95 0.00 0.00 0.87 0.00
-funroll-all-loops 0.00 -0.75 0.71 0.70 0.00 7.55 -0.82 -4.17 -1.79 0.00
-fpeel-loops 3.63 0.00 0.20 6.06 0.00 4.06 -4.14 0.00 0.00 0.76
File size (relative increase of the size of stripped binaries in percents):
options wupwise swim mgrid applu mesa art equake ammp apsi total
-fbranch-probabilities 0.37 -0.11 0.20 0.15 -7.43 -6.42 -0.06 -0.92 -2.47 -4.77
-funit-at-a-time 0.37 -0.11 0.20 0.15 -7.37 -6.42 0.57 0.03 -2.47 -4.61
-fomit-frame-pointer 0.00 -0.53 -1.53 -0.35 -3.45 -7.19 -2.12 -4.38 -1.30 -2.96
-fgcse 0.00 -26.92 0.57 -8.87 -1.06 -7.19 0.25 -0.02 -0.74 -1.93
-fstrict-aliasing 0.00 0.00 0.00 0.00 -0.31 -7.19 -2.17 -0.10 -0.15 -0.44
-fif-conversion 0.00 -0.21 -0.09 -0.08 -0.22 -0.73 0.05 0.31 -0.03 -0.11
-foptimize-sibling-calls 0.00 0.00 0.00 0.00 -0.04 0.00 -0.06 -0.01 0.00 -0.02
-freorder-blocks 0.00 0.10 0.00 0.02 0.28 -0.19 0.11 -0.43 -0.20 0.07
-funroll-loops 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.02 0.00
-frename-registers 0.00 0.10 0.16 -0.08 0.13 0.18 0.22 0.11 -0.32 0.05
-finline-functions 0.37 0.20 0.28 0.13 -0.74 19.29 8.46 1.21 -1.87 0.09
-ftracer 8.22 0.00 0.08 0.02 0.22 0.55 0.22 1.01 1.72 0.79
-fbranch-probabilities -fguess-branch... 9.36 0.84 1.36 -1.20 0.00 2.79 0.97 4.33 1.80 1.17
-fstrength-reduce 7.50 2.68 3.28 7.17 -0.17 0.00 0.22 0.37 7.24 1.70
-fschedule-insns2 3.87 2.47 3.73 6.90 5.47 3.11 4.91 5.97 6.53 5.64
-fschedule-insns2 -fschedule-insns 3.78 2.47 4.26 11.04 5.55 3.11 5.66 5.97 6.97 6.25
-fpic -1.98 -0.42 0.24 -2.57 -0.09 2.76 1.10 -1.06 -0.08 -3.83
-ffast-math -0.21 -2.00 -1.05 -0.97 0.27 0.30 -0.60 -0.76 -0.61 -0.17
-fpeel-loops 0.00 0.00 0.00 0.90 0.00 7.73 2.60 0.00 0.00 0.30
-funroll-all-loops 0.00 0.00 0.81 0.29 0.00 7.73 1.13 0.37 0.27 0.34
-fold-unroll-loops 2.71 36.40 15.56 5.15 4.52 23.73 6.75 15.75 8.91 7.79
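
Each table opens with an unlabeled row that compares two runs with equal settings, which gives a rough idea of the noise. One possible way to read the other rows against it is sketched below in Python, using two performance rows copied from the Alpha SPECfp table above. Taking the largest absolute value of the noise row as a single floor is an assumption made for illustration only; it is not a rule used to build the tables, and the helper names are hypothetical.

```python
# Benchmarks and two performance rows copied from the Alpha SPECfp table
# above (the trailing "avg" column is left out).
BENCHMARKS = ["wupwise", "swim", "mgrid", "applu", "mesa",
              "art", "equake", "ammp", "apsi"]
NOISE_ROW = [0.00, -0.75, -0.21, 0.00, 0.93, 0.00, 0.83, -0.84, -1.76]
FPIC_ROW = [0.63, 0.00, -0.21, 0.20, -2.04, 2.95, 0.00, 0.00, 0.87]


def noise_floor(noise_row):
    """Largest absolute change seen between two runs with equal settings."""
    return max(abs(value) for value in noise_row)


def above_noise(row, noise_row):
    """Return the benchmarks whose change exceeds the noise floor.

    Assumption: one floor per table is an acceptable filter. With a single
    iteration per run this is only a rough guide, not a significance test.
    """
    floor = noise_floor(noise_row)
    return {name: value
            for name, value in zip(BENCHMARKS, row)
            if abs(value) > floor}


if __name__ == "__main__":
    print(noise_floor(NOISE_ROW))
    print(above_noise(FPIC_ROW, NOISE_ROW))
```

Under this rule only the mesa and art entries of the -fpic row stand out from the noise; the remaining entries are smaller than the difference observed between two identical runs.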


Real World Performance

One of our main goals has been to develop a system ready for both enterprise and desktop (workstation) use. While the need for a 64-bit address space in the enterprise is well understood, the effect on desktop performance is often debated. The main drawback of a 64-bit system, as discussed in section 2.1, is the increased memory footprint of programs and the resulting slowdown of program startup times, which are critical for today's desktop systems.

In this section we present a few simple benchmarks of this phenomenon on SuSE Linux 8.2. Both the 32-bit and the 64-bit versions of the system were installed on equally sized ReiserFS partitions in the default configuration. The tests were performed in the same order on both systems, with reboots in between. Additional packages were installed as needed. We hope this procedure minimizes the amount of noise in the numbers.


Table: Desktop Performance Relative to 32-bit System
test speedup
bootup time -0.9%
KDE startup from disk 18.1%
KDE startup from cache 14.6%

Table [*] compares startup times. As can be seen, the 64-bit system is, perhaps surprisingly, significantly faster in two of the tests (KDE startup from disk and from cache) and comparable in bootup time. Table [*] compares the time needed to compile the gimp package.

Table: Gimp Compilation Times Relative to 32-bit System
    speedup  
test real user system
tar xjf 17.7% 9.8% 4%
./configure -4.3% 0.7% -31%
make 12.9% 19.8% -39%
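
As an illustration of how real, user and system times like those above can be collected and turned into a relative speedup, here is a minimal Python sketch. The formula (baseline time divided by new time, minus one) and the helper names `time_command` and `relative_speedup` are assumptions for illustration; the exact tooling and formula used for the table are not stated in the text.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return (real, user, system) seconds, similar to `time`."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    real = time.perf_counter() - start
    after = os.times()
    # CPU time consumed by child processes that have been waited for.
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return real, user, system


def relative_speedup(baseline_seconds, new_seconds):
    """Speedup in percent of the new run over the baseline (assumed formula)."""
    return (baseline_seconds / new_seconds - 1.0) * 100.0


if __name__ == "__main__":
    # Placeholder command; a real measurement would time e.g. the make step.
    real, user, system = time_command([sys.executable, "-c", "pass"])
    print(f"real {real:.3f}s user {user:.3f}s system {system:.3f}s")
    # Hypothetical timings: 100 s on the baseline system, 80 s on the new one.
    print(f"{relative_speedup(100.0, 80.0):.1f}%")
```

A positive value means the second system finished sooner; a negative value, as in the system column for ./configure and make, means it took longer.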

As can be seen in Table [*], memory consumption grows by about one quarter, as expected, but thanks to the relative compactness of the CISC AMD64 instruction set, the increase is much smaller than the one seen after switching to RISC or VLIW systems.

Table: Memory Resources Consumption
test 32-bit 64-bit increase
konqueror 14 M 18 M 28%
gimp 8.6 M 9.9 M 15%
mozilla 22 M 27 M 22%

In fact, Tables [*] and [*] show a decrease in the size of the code (.text) sections.

Table: Size of Common Binaries in /usr/bin
section 32-bit 64-bit increase
.text 56216 K 53419 K -5%
.bss 18169 K 21098 K 16%
.data 10239 K 14076 K 37%
.rodata 17543 K 19734 K 12%
.eh_frame 546 K 8269 K 1414%
.rela.plt 358 K 1076 K 200%
.rela.dyn 40 K 126 K 215%
total 80435 K 91141 K 13%
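
The increase column can be recomputed from the two size columns. The Python sketch below does this for the /usr/bin table above; rounding down to a whole percent is an assumption inferred from the printed figures (for example, the .text change of about -4.98% appears as -5%), and the function name is hypothetical.

```python
# (section, 32-bit size in K, 64-bit size in K), copied from the table above.
SECTIONS = [
    (".text", 56216, 53419),
    (".bss", 18169, 21098),
    (".data", 10239, 14076),
    (".rodata", 17543, 19734),
    (".eh_frame", 546, 8269),
    (".rela.plt", 358, 1076),
    (".rela.dyn", 40, 126),
    ("total", 80435, 91141),
]


def increase_percent(size_32, size_64):
    """Relative size increase in whole percent, rounded down (assumption).

    Integer floor division avoids floating-point rounding surprises.
    """
    return (size_64 - size_32) * 100 // size_32


if __name__ == "__main__":
    for name, size_32, size_64 in SECTIONS:
        print(f"{name:10s} {increase_percent(size_32, size_64):5d}%")
```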


Table: Size of Common Shared Libraries
section 32-bit 64-bit increase
.text 71967 K 67526 K -7%
.bss 33463 K 11557 K -72%
.dynstr 13608 K 13587 K -1%
.rodata 12119 K 12217 K 0%
.dynsym 11424 K 7611 K 66%
.eh_frame 6367 K 12730 K 99%
.data 6018 K 9695 K 61%
.rela.dyn 4382 K 12844 K 193%
.plt 3898 K 6499 K 66%
.rela.plt 1293 K 3888 K 200%
.got 823 K 1654 K 100%
total 171812 K 198111 K 15%

The largest growth is in the .eh_frame section, which is usually not loaded into memory, and in the sections related to dynamic relocations. According to our benchmarks these are not critical, since the dynamic loader is still slightly faster in the 64-bit version than in the 32-bit one.

Overall, we can recommend the use of a 64-bit system instead of a 32-bit one on AMD64 machines intended for desktop use, as long as a 25% increase in memory consumption is not a major limitation (which is hardly the case for computers sold today).

Jan Hubicka 2003-05-04