How Cross-File Optimization Can Be Used to Improve Performance
We have already discussed how the source structure of an application can
impact its performance. In Figure 2.8, function A() calls function B(), but
function B() is defined in the file b.c, and function A() is defined in the
file a.c.
There are a number of costs to making this call:
- There will be a branch and return instruction to make the call.
- Registers might be stored to memory before the call and restored from
  memory after the call because the called routine might use or modify the
  variables that they currently hold.
- Registers might be spilled to memory to provide empty registers for the
  called routine to use.
- Both routine A() and routine B() might perform computations that could be
  identified as unnecessary if the source for the combination of the two
  routines were evaluated.
One way to overcome these limitations is by using cross-file optimization.
This is typically a final step after the compiler has produced object files
for all the source files in an application. At this step, the compiler reads
all the object files and looks for optimizations it can perform using full
knowledge of the entire application. For inlining, the compiler will determine
that there is a call from A() to B() and rewrite routine A() with a new version
that combines the code from A() with the code from B(). This new version is the
one that appears in the final executable.
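As a rough illustration of the arrangement in Figure 2.8, the sketch below
shows the two files side by side. The function bodies are invented for the
example (the figure does not specify them), and the build commands in the
comment assume GCC, where link-time optimization is requested with -flto;
other compilers use different flags.

```c
/* Both "files" are shown in one block; in a real project they would be
 * separate source files. The function bodies are hypothetical.
 *
 * Assumed build with GCC (flags differ for other compilers):
 *   gcc -O2 -flto -c a.c
 *   gcc -O2 -flto -c b.c
 *   gcc -O2 -flto a.o b.o -o app
 */

/* ---- b.c ---- */
int B(int x)
{
    return x + 1;
}

/* ---- a.c ---- */
int B(int x);   /* declaration only; the body lives in b.c */

int A(int x)
{
    /* Without cross-file optimization this remains a real call,
     * because the compiler cannot see the body of B() from a.c. */
    return 2 * B(x);
}
```

With the optimization enabled, the compiler is free to replace the call in
A() with the body of B(); whether it actually does so depends on its own
heuristics.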
Inlining is a very good optimization to enable because it should have no
impact on the correctness of the application (the executed code should be
equivalent to the original code), but it reduces the execution costs and also
introduces further opportunities for optimization. Listing 2.39 shows code with
an opportunity for an inlining optimization.
Listing 2.39 Code with an Inlining Opportunity

int B( int p, int q )
{
    if ( q == 1 )
    {
        return p;
    }
    else
    {
        return p * B( p, q-1 );
    }
}

int A( int p )
{
    return B( p, 1 );
}

In this example, the function B() is an inefficient way of calculating p^q.
However, it is called by routine A() with the value of q as a constant 1, so
the return value of the function will always be the value of the variable p.
The compiler can choose to inline function B() into function A(); it will then
discover that q is always 1 for this call and can eliminate both the
conditional test and the untaken recursive branch. In fact, the whole of
routine A() will collapse down to a statement that returns the value of the
variable p, as shown in Listing 2.40.
Listing 2.40 Code After Inlining Optimization

int A( int p )
{
    return p;
}
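
The collapse from Listing 2.39 to Listing 2.40 can be traced in two steps. The
following is only a hand-written sketch of the reasoning, not the output of any
particular compiler; the names A_after_substitution() and A_after_folding() are
hypothetical and exist only to keep the stages apart.

```c
/* Original routines from Listing 2.39. */
int B(int p, int q)
{
    if (q == 1)
    {
        return p;
    }
    else
    {
        return p * B(p, q - 1);
    }
}

int A(int p)
{
    return B(p, 1);
}

/* Step 1 (hypothetical name): the body of B() is substituted into A(),
 * with the parameter q replaced by the constant 1. */
int A_after_substitution(int p)
{
    if (1 == 1)
    {
        return p;
    }
    else
    {
        return p * B(p, 1 - 1);   /* never reached */
    }
}

/* Step 2 (hypothetical name): the condition is always true, so the test
 * and the else branch are removed, leaving the code of Listing 2.40. */
int A_after_folding(int p)
{
    return p;
}
```

All three versions of A() return the same value for any argument, which is why
the transformation does not affect correctness.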

The version of A() in Listing 2.40 is itself a very good candidate for
inlining, since it only returns the value of the variable passed into it.
Although this might appear to be an unlikely example, the same situation
arises in a much more common code pattern, shown in Listing 2.41.
Listing 2.41 Accessor Pattern

static int count;

int get_count()
{
    return count;
}

It is very common to have routines that exist only to get and set the value of
variables. These routines are very strong candidates for inlining since they
contribute only one useful instruction (the load of the variable) and at least
two overhead instructions (the call and return).
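
To make the overhead concrete, the sketch below pairs the accessor from Listing
2.41 with a setter and a caller. The setter set_count() and both loop routines
are hypothetical additions for illustration; the second loop shows roughly what
the first one reduces to if both accessors are inlined.

```c
static int count;

int get_count(void)
{
    return count;
}

/* Hypothetical setter; it does not appear in Listing 2.41. */
void set_count(int value)
{
    count = value;
}

/* Caller written against the accessors: two calls per iteration. */
void add_n_with_calls(int n)
{
    for (int i = 0; i < n; i++)
    {
        set_count(get_count() + 1);
    }
}

/* Roughly what the loop becomes once both accessors are inlined:
 * a load, an add, and a store, with no call or return. */
void add_n_inlined(int n)
{
    for (int i = 0; i < n; i++)
    {
        count = count + 1;
    }
}
```

Once the calls are gone, the compiler may simplify the loop further, for
example by keeping count in a register for the duration of the loop.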
Another situation where inlining improves
performance is where it can eliminate loads and stores of variables to memory.
Listing 2.42 shows code where inlining will reduce the number of memory
operations.
Listing 2.42 Code with Potential for Optimization by Function Inlining

int number_of_elements;
int max;

void calculate_max(int* elements)
{
    max=elements[0];
    for (int i=1; i<number_of_elements; i++)
    {
        if ( elements[i] > max )
        {
            max=elements[i];
        }
    }
}

void doWork()
{
    ….
    number_of_elements = ….;
    calculate_max(elements);
    ….
}

The routine calculate_max() needs
the variable number_of_elements to be
updated before it is called. In the general case, the compiler needs to store
all visible variables to memory before calling the routine. This is necessary
in case the routine reads any of the variables. The variables need to be
reloaded after the call in case the routine has modified any of them. After
inlining, the compiler does not need to include these loads and stores because
it can hold the necessary values in registers and execute only the loads and
stores that are necessary.
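
The effect can be sketched by inlining the call by hand. In the code below,
do_work_with_call() and do_work_inlined() are hypothetical stand-ins for the
elided body of doWork() in Listing 2.42, and the parameters elements and n are
assumptions made for the example. The inlined version also assumes that
elements does not point at either global variable and that n is at least 1.

```c
int number_of_elements;
int max;

/* As in Listing 2.42. */
void calculate_max(int *elements)
{
    max = elements[0];
    for (int i = 1; i < number_of_elements; i++)
    {
        if (elements[i] > max)
        {
            max = elements[i];
        }
    }
}

/* Hypothetical caller: the count must be stored to the global before
 * the call, because calculate_max() reads it from memory. */
void do_work_with_call(int *elements, int n)
{
    number_of_elements = n;
    calculate_max(elements);
}

/* Hand-inlined sketch: the count and the running maximum are held in
 * local variables, and each global is written only once. */
void do_work_inlined(int *elements, int n)
{
    int local_max = elements[0];
    for (int i = 1; i < n; i++)
    {
        if (elements[i] > local_max)
        {
            local_max = elements[i];
        }
    }
    number_of_elements = n;
    max = local_max;
}
```

Both versions leave the same values in max and number_of_elements; the
difference lies only in how often those globals are read and written along the
way.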
Cross-file optimization has a benefit in that it enables the compiler to
generate optimal code regardless of how the source code is distributed between
source files. The only limitation involves static or dynamic libraries, in
which case the compiler may not be able to perform the necessary cross-file
inlining.