Chapter: Multicore Application Programming For Windows, Linux, and Oracle Solaris : Coding for Performance

Using Profile Feedback

Most compilers support profile feedback, which is a mechanism that enables the compiler to gather information about the runtime behavior of the application. Consider the snippet of code shown in Listing 2.43.

Listing 2.43 Code Where the Runtime Behavior of the Code Is Uncertain

if ( a != 0 )
{
    d++;
}
else
{
    d--;
}

In this situation, the compiler has no idea whether the general case is to increment or decrement the variable d. The usual solution is for the compiler to either guess one is more likely than the other or produce code that favors neither assumption. However, if the code is in the frequently executed part of the application, the appropriate choice may lead to an observable improvement in performance.
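Without a profile, some compilers let the developer state the expected direction of a branch by hand. The following is a minimal sketch, assuming GCC or Clang: the `__builtin_expect` extension is not mentioned in the original text, and the function name `adjust`, the macro `LIKELY`, and the assumption that `a` is usually nonzero are all hypothetical. Profile feedback supplies the same kind of information automatically, from measured counts rather than from a guess.

```c
/* Assumption: GCC or Clang. __builtin_expect is a compiler extension,
   not standard C, so fall back to no hint on other compilers. */
#if defined(__GNUC__)
#define LIKELY(x) __builtin_expect(!!(x), 1)
#else
#define LIKELY(x) (x)
#endif

/* Hypothetical helper with the same branch as Listing 2.43.
   The hint tells the compiler to favor the increment path. */
int adjust(int a, int d)
{
    if (LIKELY(a != 0)) {
        d++;
    } else {
        d--;
    }
    return d;
}
```

The hint changes only how the compiler is likely to lay out the code, not the result: if the guess is wrong for the real workload, the code is still correct but may run slower.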

Another case where knowledge of the runtime behavior of the application is useful is in determining which routines to inline. As discussed in the previous section, picking the correct routine to inline can lead to significant performance benefits. However, every time a called routine gets inlined, it increases the number of instructions in the calling routine. This code size increase is likely to cause the instruction cache to be less efficiently utilized, leading to a drop in performance. Hence, it can be quite important to inline routines that will benefit performance and avoid inlining those that will only increase the instruction cache footprint.

Profile feedback, or feedback-directed optimization, gives the compiler the opportunity to gather runtime information on the behavior of the application. It is a three-step process. The first step is to build an instrumented version of the application to collect runtime metrics. The next step is to run this instrumented application on a data set that is “typical” of the one that the application will really run on but whose runtime is much shorter. The final step is to recompile the application using this profile information. Listing 2.44 shows the steps using the Solaris Studio compiler.

Listing 2.44 Steps for Using Profile Feedback with the Solaris Studio Compiler

$ cc -O -xprofile=collect:./profile -o a.out prog.c
$ ./a.out
$ cc -O -xprofile=use:./profile -o a.out prog.c

The benefit of profile feedback depends on the application. Some applications will see no benefit, while some may see a significant gain. As outlined earlier, the gains typically come from either getting the compiler to lay out a performance-critical section of code in an optimal way or inlining a performance-critical routine.

It is interesting to observe that profile feedback tends to give the greatest benefit to codes where there are lots of branches or calls rather than codes where there are a lot of loops. The compiler can predict that loops will be iterated many times but has a harder job correctly guessing for codes where there are plenty of control flow instructions. Codes that have significant control flow instructions also tend to have few instructions between control flow, so there are not many opportunities for the compiler to extract performance in other ways. Hence, profile feedback can be the most effective way of improving performance in a class of codes that is otherwise hard to optimize.
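The contrast between the two kinds of code can be sketched as follows. Both functions are hypothetical illustrations, not taken from the original text: `sum` spends its time in a loop whose behavior the compiler can reasonably predict, whereas in `classify` the hot path depends entirely on the values seen at runtime.

```c
#include <stddef.h>

/* Loop-dominated: the compiler can assume the loop body runs many
   times and optimize it without needing a profile. */
long sum(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}

/* Branch-dominated: which return is the common case depends on the
   input data, so the compiler has to guess unless it has a profile. */
int classify(int x)
{
    if (x < 0) {
        return -1;
    }
    if (x == 0) {
        return 0;
    }
    if (x < 100) {
        return 1;
    }
    return 2;
}
```

In `classify`, there are only a few instructions between branches, which leaves the compiler little else to optimize; knowing which branch is usually taken is the main information it lacks.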

There are two concerns with using profile feedback. The first is that using profile feedback complicates the build process and increases its duration. This can be controlled by using profile feedback only on the release builds and not as part of the regular developer builds. It can also be managed by ensuring that the build process is as efficient as possible. For instance, the build process can be parallelized so that it takes advantage of multiple cores.

The other concern is that using profile feedback optimizes the application for one particular scenario at the expense of the performance in other scenarios. This is the zero-sum view of performance: a gain on one workload has to be compensated by a loss of performance in another. In general, this concern is misplaced. Profile feedback helps the compiler make decisions about the frequently executed paths and frequently called functions. In most instances, the behavior of the application is only weakly dependent on the input data set. For example, the same routines get called (although with a different frequency), the same branches get taken, and so forth. This does not mean that every control transfer instruction has the same profile, but the majority of the control transfers in the code have the same direction.

The exception is an application that has different “modes”: explicit modes where the application is requested to perform different tasks or implicit modes where some characteristic of the input data causes the application to behave in a particular way.

An explicit mode might appear in the code as a switch/case statement that calls entirely different code sections depending on an input condition. An implicit mode might be an application that has multiple ways of solving a problem, and the problem-solving approach used at each stage in the solution depends on the results of the previous steps.
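An explicit mode of this kind might look like the sketch below. The mode names and the two worker routines are hypothetical, chosen only to show the shape of the code; the original text does not describe a specific application.

```c
/* Hypothetical modes requested by the user, for illustration only. */
enum mode { MODE_COMPRESS, MODE_EXPAND };

/* Stand-ins for two entirely different code sections. */
static int compress_step(int x) { return x / 2; }
static int expand_step(int x)   { return x * 2; }

/* The switch selects a code section based on an input condition.
   A training run that only ever passes MODE_COMPRESS collects no
   profile data at all for expand_step. */
int process(enum mode m, int x)
{
    switch (m) {
    case MODE_COMPRESS:
        return compress_step(x);
    case MODE_EXPAND:
        return expand_step(x);
    }
    return x; /* unknown mode: leave the value unchanged */
}
```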

If the application has modes of operation, then it is necessary to provide training inputs that capture all the different modes of operation. The profile of the application and the code coverage data for the particular training data used provide the best indication of whether the application has these modes. Input data sets that do not cover significant parts of the code base are a strong indicator for the existence of these modes and definitely indicate that more input data sets should be used in providing training data for the application’s build.
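Code coverage tools report this information directly, but the idea can be shown with a hand-rolled stand-in. The sketch below is an assumption made for illustration, not a real tool or the author's method: the names `NUM_MODES`, `record_mode`, and `untrained_modes` are hypothetical, and the counters simply record which modes a training run actually entered.

```c
/* Hypothetical number of modes in the application. */
enum { NUM_MODES = 3 };

/* One counter per mode, zero-initialized at program start. */
static unsigned long mode_hits[NUM_MODES];

/* Call at the entry of each mode's code path during a training run. */
void record_mode(int mode)
{
    if (mode >= 0 && mode < NUM_MODES) {
        mode_hits[mode]++;
    }
}

/* Returns how many modes the training runs so far never exercised.
   A nonzero result suggests that more training inputs are needed. */
int untrained_modes(void)
{
    int missing = 0;
    for (int i = 0; i < NUM_MODES; i++) {
        if (mode_hits[i] == 0) {
            missing++;
        }
    }
    return missing;
}
```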

The performance benefit from compiling with profile feedback is variable. Codes where the time is spent in loops tend to benefit less from profile feedback, whereas codes containing high numbers of control transfer instructions tend to see a much greater benefit. The typical gain is probably around 5% to 10%, but gains can be much greater if the profile feedback happens to lead to other opportunities for further performance gains. The decision to use profile feedback should therefore rest on whether it delivers a measurable performance gain for the application in question.
