Subsections

Branch Predictors

Methodology

To verify validity of branch prediction heuristics (see chapter [*] for description of individual algorithms) we compile program with profile feedback, dump our predictions and real program behavior to the debug output and use our analyze_branches tool to produce summary.

We tested our results on SPECint2000 suite, a representative collection of programs run on nowadays computers. Since we used the same set of programs for specifying weights of individual heuristics, we decided to test SPECfp2000 suite to cross check our predictor on different code.

Programs contained in SPECfp differ dramatically from the ones in SPECint by both purpose and language. Many SPECfp programs solve numerical problems and are written in Fortran, while SPECint programs are written in C and C++.

We tested only subset of SPECfp tests, because some of them are written in f90 Fortran dialect that is not supported by GCC yet.

Results


Table: Experimental results on subset of SPECint2000 benchmark suite.
Algorithm Branches (rel) Hitrate Max Coverage (rel)
DS theory 53018 60.0% 70.61% 91.03% 11476302572 33.2%
call 26573 30.1% 70.48% 94.17% 5516113528 15.9%
combined 88411 100.0% 77.00% 91.82% 34591922166 100.0%
const return 637 0.7% 95.59% 98.81% 86727364 0.2%
continue 728 0.8% 56.00% 86.38% 1112290143 3.2%
early return 1006 1.1% 66.56% 77.62% 100847138 0.3%
first match 13529 15.3% 90.53% 94.06% 15070692296 43.6%
goto 1826 2.1% 70.57% 90.46% 28111884 0.1%
loop branch 6646 7.5% 89.44% 93.25% 7940058090 22.9%
loop exit 5437 6.1% 91.55% 95.49% 5846544398 16.9%
loop header 7581 8.6% 64.66% 90.14% 1771208607 5.1%
loop iterations 713 0.8% 91.44% 91.44% 1111820785 3.2%
negative return 272 0.3% 96.45% 97.43% 64283908 0.2%
NULL return 356 0.4% 90.52% 92.99% 58741977 0.2%
no prediction 21864 24.7% 60.35% 88.74% 8044927298 23.2%
noreturn call 976 1.1% 99.99% 99.99% 207850824 0.6%
values nonequal 23039 26.1% 69.88% 89.38% 4076339804 11.8%
values positive 2440 2.8% 78.58% 84.57% 1179093128 3.4%
pointer 6037 6.8% 82.83% 94.25% 968014026 2.8%



Table: Experimental results on subset of SPECfp2000 benchmark suite.
Algorithm Branches (rel) Hitrate Max Coverage (rel)
DS theory 10261 51.1% 70.09% 96.69% 3972049376 18.0%
call 4775 23.8% 33.04% 95.45% 787258096 3.6%
combined 20079 100.0% 81.82% 91.76% 22061567792 100.0%
const return 40 0.2% 9.22% 90.77% 177957066 0.8%
continue 10 0.0% 1.58% 98.41% 7958 0.0%
early return 523 2.6% 88.38% 99.81% 7560387 0.0%
first match 4545 22.6% 90.44% 90.53% 14951251972 67.8%
goto 171 0.9% 99.93% 99.99% 5477020 0.0%
loop branch 2804 14.0% 90.60% 90.69% 12182517835 55.2%
loop exit 2151 10.7% 96.07% 96.40% 1945083921 8.8%
loop header 2710 13.5% 99.34% 99.43% 1190273758 5.4%
loop iterations 608 3.0% 74.65% 74.65% 833904864 3.8%
negative return 4 0.0% 100.00% 100.00% 4 0.0%
NULL return 69 0.3% 99.99% 99.99% 12579497 0.1%
no prediction 5273 26.3% 55.63% 91.41% 3138266444 14.2%
noreturn call 134 0.7% 99.99% 100.00% 1646663 0.0%
pointer 481 2.4% 97.89% 99.95% 68871540 0.3%
values nonequal 3298 16.4% 67.14% 94.90% 1813253392 8.2%
values positive 2165 10.8% 97.43% 98.37% 1283740708 5.8%


We present the results in tables [*] and [*]. The ``algorithm'' column specifies name of prediction heuristic as described in chapter [*]. ``first-match'' heuristic refers to result of combining all strong prediction heuristics using first-match principle, while ``DS theory'' refers to result of combining the weak heuristics using Dempster Shaffer theory. Again see chapter [*] for details. ``no prediction'' refers to branches we failed to predict at all and finally ``combined'' refers to combination of all heuristics used by GCC.

The column, called ``branches'' specifies number of conditional jumps in all compiled programs predicted by given heuristic in both absolute and relative form.

One can see that SPECint contains 88411 conditionals and we predicted $ 15\%$ of them using strong heuristics and $ 60\%$ using weak heuristics. We failed to predict $ 24\%$ of branches. SPECfp contains 20079 conditionals and we predicted $ 22\%$ of them using the strong heuristics and $ 51\%$ using weak heuristics. We failed to predict $ 26\%$ of branches.

More important information than static instructions counts are the dynamic instruction counts--counts weighted by actual number of executions of each instruction in the train run. The values are presented in last column again in both absolute and relative form.

In SPECint we predicted $ 43.6\%$ of executed instructions by strong heuristics and $ 33.2\%$ by weak heuristics, that is in sharp contrast to static instruction counts. We failed to predict $ 23.2\%$ of instructions. In SPECfp the numbers are even more biased in same direction--$ 67.8\%$ for strong heuristics, $ 18\%$ for weak heuristics, and $ 14.2\%$ unpredicted.

The hitrate column presents probably the most important value--the probability that conditional jump will go in the direction predicted by heuristics. We present only the dynamic numbers. For SPECint we predicted $ 77\%$ of branches correctly and in SPECfp even almost $ 82\%$10.1. This compares well to $ 72\%$ hitrate we measured by exact re-implementation of Ball and Larus [3] heuristics10.2 that our work is based on and to roughly $ 80\%$ hitrates reported by more expensive branch prediction methods based on intra-procedural weighted value range propagation [5] not yet cheap enough for production compilers.

Strong heuristics were very successful--$ 90\%$ in both cases, while DS theory reached about $ 70\%$ success.

Last column represents values theoretically reachable by perfect static predictor that always predict branch in its more probable direction. So upper bound of hitrate for static branch prediction methods is about $ 91\%$, but it is unrealistic to expect this to be reached without profile feedback.

We would like also to point out that many of our new heuristics are very accurate--for instance the return value based ones. Also splitting single opcode heuristic to multiple ones helped us to increase the hitrate of ``value positive'' heuristic.

Figure: Comparison of perfect and static branch prediction on SPECint2000 tests.
\includegraphics[width=129mm]{results3.ps}

Finally figure [*] shows comparisons of hitrates of our combined heuristics and perfect predictor on each of SPECint2000 benchmarks. As can be easily seen, the sucesfulness of our method varies from program to program, but in all cases it has at least slightly successful.

We found that some of optimizers needs to be modified for properties of estimated profiles. For instance estimated profile almost always predict loop to have relatively few iterations making loop peeling too active, but overall almost all optimizations implemented for profile feedback derive some benefit from estimated profile too as shown in next chapter.

Jan Hubicka 2003-05-04