Chapter: Multicore Application Programming For Windows, Linux, and Oracle Solaris : Coding for Performance

How Cross-File Optimization Can Be Used to Improve Performance

We have already discussed how the source structure of an application can affect its performance. In Figure 2.8, function A() calls function B(), but function B() is defined in the file b.c, while function A() is defined in the file a.c. When the compiler compiles a.c, it cannot see the body of B(), so it has to generate a real call to it.

There are a number of costs to making this call:

• There will be a branch instruction to make the call and a return instruction to come back from it.

• Registers might be stored to memory before the call and reloaded from memory after it, because the called routine might read or modify the variables that they currently hold.

• Registers might be spilled to memory to free up registers for the called routine to use.

• Both routine A() and routine B() might perform computations that could be identified as unnecessary if the source of the two routines were evaluated together.

One way to overcome these limitations is cross-file optimization. This is typically a final step, performed after the compiler has produced object files for all the source files in an application. At this step, the compiler reads all the object files and looks for optimizations it can perform using knowledge of the entire application. For inlining, the compiler determines that there is a call from A() to B() and rewrites routine A() as a new version that combines the code of A() with the code of B(). This new version is the one that appears in the final executable.
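The following is a minimal sketch of what combining A() and B() means, assuming a hypothetical B() that scales its argument. The two "files" are shown in one block so it compiles on its own; in a real build they would be separate, and the combined version would be produced by the compiler rather than written by hand (with GCC, for example, through -flto; the Oracle Solaris Studio compilers use -xipo). A_inlined() is a hypothetical name for that combined version, and the assert() inside B() stands in for the kind of per-call work that becomes unnecessary once the constant arguments are visible.

```c
#include <assert.h>

/* ---- b.c (hypothetical routine) ---- */
int B( int x, int scale )
{
    assert( scale != 0 );   /* checked at run time on every call */
    return x * scale + 1;
}

/* ---- a.c ---- */
int B( int x, int scale );  /* a.c normally sees only this declaration */

int A( int x )
{
    /* Two calls across the file boundary: each pays the call/return
       cost and may force registers to be saved and restored. */
    return B( x, 2 ) + B( x, 3 );
}

/* Roughly what a cross-file optimizer can turn A() into once it sees
   both files: the calls disappear, and because scale is the constant
   2 or 3, the assert() checks are provably true and can be dropped. */
int A_inlined( int x )
{
    return ( x * 2 + 1 ) + ( x * 3 + 1 );   /* equals 5 * x + 2 */
}
```

Both versions return the same value for every input; only the second avoids the call overhead.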

Inlining is a very good optimization to enable because it should have no impact on the correctness of the application (the executed code should be equivalent to the original code), yet it reduces execution costs and also creates further opportunities for optimization. Listing 2.39 shows code with an opportunity for an inlining optimization.

Listing 2.39 Code with an Inlining Opportunity

/* Computes p to the power q; q must be at least 1 */
int B( int p, int q )
{
    if ( q == 1 )
    {
        return p;
    }
    else
    {
        return p * B( p, q-1 );
    }
}

int A( int p )
{
    return B( p, 1 );
}

In this example, function B() is an inefficient, recursive way of calculating p^q (for q of at least 1). However, routine A() calls it with q set to the constant 1, so the return value will always be the value of the variable p. If the compiler inlines function B() into function A(), it can see that q is always 1 for this call and eliminate both the conditional test and the untaken recursive branch. In fact, the whole of routine A() collapses down to a statement that returns the value of the variable p, as shown in Listing 2.40.

Listing 2.40 Code After Inlining Optimization

int A( int p )
{
    return p;
}
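The path from Listing 2.39 to Listing 2.40 can be sketched in two steps. This is an illustration written by hand, not actual compiler output: A_step1() and A_step2() are hypothetical names, and the assert() in B() is an added precondition check (not part of Listing 2.39) that shows one more test the constant argument makes removable.

```c
#include <assert.h>

/* Listing 2.39's B(), with an added precondition check */
int B( int p, int q )
{
    assert( q >= 1 );      /* q < 1 would recurse without end */
    if ( q == 1 )
    {
        return p;
    }
    else
    {
        return p * B( p, q - 1 );
    }
}

int A( int p )
{
    return B( p, 1 );
}

/* Step 1: B()'s body copied into A(), with the argument q bound to 1 */
int A_step1( int p )
{
    int q = 1;
    assert( q >= 1 );      /* always true: can be removed */
    if ( q == 1 )          /* always true: the else branch is dead code */
    {
        return p;
    }
    else
    {
        return p * B( p, q - 1 );
    }
}

/* Step 2: after constant propagation and dead-code elimination,
   only the return of p remains, matching Listing 2.40 */
int A_step2( int p )
{
    return p;
}
```

All three versions of A() return p, while B() itself still computes powers, for example B(3, 3) is 27.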

This new version of routine A() is itself a very good candidate for inlining, since it only returns the value passed into it. Although this might appear to be an unlikely example, a closely related pattern occurs very commonly, as shown in Listing 2.41.

Listing 2.41 Accessor Pattern

static int count;

int get_count()
{
    return count;
}

It is very common to have routines that exist only to get and set the values of variables. These routines are very strong candidates for inlining because they contribute only one useful instruction (the load or store of the variable) and at least two overhead instructions (the call and the return).
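A small sketch of the effect, assuming a hypothetical set_count() to pair with Listing 2.41's get_count(). The first loop is the code as written; the second is, roughly, what the same loop becomes once both accessors are inlined, with each call reduced to a single store or load of count. The function names and the precondition check are illustrative only.

```c
#include <assert.h>

static int count;

int get_count( void )
{
    return count;
}

/* Hypothetical companion setter */
void set_count( int value )
{
    count = value;
}

/* As written: every access costs a call and a return */
int sum_counts_with_calls( int n )
{
    assert( n >= 0 );          /* negative counts are a caller error */
    int total = 0;
    for ( int i = 0; i < n; i++ )
    {
        set_count( i );
        total += get_count();
    }
    return total;
}

/* After inlining: each accessor is one store or one load. Because count
   is not volatile, the compiler may go further and keep it in a register,
   storing it to memory once after the loop. */
int sum_counts_inlined( int n )
{
    assert( n >= 0 );
    int total = 0;
    for ( int i = 0; i < n; i++ )
    {
        count = i;
        total += count;
    }
    return total;
}
```

Both versions leave count holding the last value assigned and return the same total.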

Another situation where inlining improves performance is where it can eliminate loads and stores of variables to memory. Listing 2.42 shows code where inlining will reduce the number of memory operations.

Listing 2.42 Code with Potential for Optimization by Function Inlining

int number_of_elements;
int max;

/* Stores the largest of the first number_of_elements values in max */
void calculate_max(int* elements)
{
    max=elements[0];
    for (int i=1; i<number_of_elements; i++)
    {
        if ( elements[i] > max )
        {
            max=elements[i];
        }
    }
}

void doWork()
{
    ….
    number_of_elements = ….;
    calculate_max(elements);
    ….
}

The routine calculate_max() needs the variable number_of_elements to be updated before it is called. In the general case, the compiler must store to memory any variables that the routine could access (such as global variables) before calling it, in case the routine reads any of them. It must also reload those variables after the call, in case the routine has modified any of them. After inlining, the compiler no longer needs these loads and stores, because it can hold the necessary values in registers and perform only the loads and stores that are actually required.
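The following sketch contrasts the call with a hand-inlined equivalent of Listing 2.42. Because the listing elides where elements comes from, this version passes the array and its length as parameters. Both those parameters and the name doWork_inlined() are hypothetical. The inlined form assumes elements does not point at the globals themselves; only under that assumption can the running maximum stay in a local (register) and be written to max once.

```c
#include <assert.h>

int number_of_elements;
int max;

/* Listing 2.42's routine: reads and writes globals through memory */
void calculate_max( int* elements )
{
    max = elements[0];
    for ( int i = 1; i < number_of_elements; i++ )
    {
        if ( elements[i] > max )
        {
            max = elements[i];
        }
    }
}

/* Caller with a call: number_of_elements must be in memory before the
   call, and any globals cached in registers must be reloaded after it */
void doWork( int* elements, int n )
{
    assert( n >= 1 );
    number_of_elements = n;
    calculate_max( elements );
}

/* Roughly what doWork() can become with calculate_max() inlined,
   assuming elements does not alias max or number_of_elements */
void doWork_inlined( int* elements, int n )
{
    assert( n >= 1 );
    number_of_elements = n;   /* still stored: the global stays visible */
    int m = elements[0];      /* running maximum held in a register */
    for ( int i = 1; i < n; i++ )
    {
        if ( elements[i] > m )
        {
            m = elements[i];
        }
    }
    max = m;                  /* a single store of the result */
}
```

Both functions leave the same values in number_of_elements and max; the inlined form simply avoids the repeated stores to max inside the loop.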

Cross-file optimization has the benefit of enabling the compiler to generate the same well-optimized code regardless of how the source code is distributed between source files. The only limitation involves routines that live in static or dynamic libraries, in which case the compiler may not be able to perform the necessary cross-file inlining.
