Parallelization of Codes Containing Calls
Chapter 2, “Coding for Performance,” discussed how calls to other routines
affect performance. The basic problem with calling another function is that
the compiler has no idea what that routine might do; it could change global
data or perhaps never return. For this reason, a loop that contains function
calls cannot, in general, be automatically parallelized. This restriction
rules out a large number of loops that could otherwise be safely parallelized.
The most obvious place where this is a problem is in calls to mathematical
functions.
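To make the concern concrete, the following minimal sketch shows a routine with a hidden side effect on global data. The names scaled(), call_count, sum_forward(), and sum_backward() are hypothetical and exist only for this illustration. Because each call changes global state, the result depends on the order in which the iterations run, and that order is exactly what parallelization does not preserve.

```c
/* Hypothetical routine with a hidden side effect: every call advances a
   global counter, so the value returned depends on how many calls came
   before it. */
static int call_count = 0;

double scaled( double x )
{
  call_count++;
  return x * call_count;
}

/* Sums scaled( v[i] ), visiting the elements from first to last. */
double sum_forward( const double *v, int n )
{
  double total = 0.0;
  call_count = 0;
  for ( int i = 0; i < n; i++ )
  {
    total += scaled( v[i] );
  }
  return total;
}

/* Performs the same calls, but visits the elements from last to first. */
double sum_backward( const double *v, int n )
{
  double total = 0.0;
  call_count = 0;
  for ( int i = n - 1; i >= 0; i-- )
  {
    total += scaled( v[i] );
  }
  return total;
}
```

For the input { 1.0, 2.0, 3.0 }, the two traversals give different totals even though they make the same set of calls. A compiler that cannot see inside scaled() has to assume that any call might behave this way.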
This limitation can be demonstrated using the modified version of the
matrix-vector code from Listing 7.9. Listing 7.13 shows this modified code.
Listing 7.13 Modified Matrix-Vector Code That Makes a Function Call
#include <math.h>

void matVec( double **mat, double *vec, double * restrict out,
             int row, int col )
{
  int i,j;
  for ( i=0; i<row; i++ )                   // Line 7
  {
    for ( j=0; j<col; j++ )                 // Line 10
    {
      out[i] += sin( mat[i][j] * vec[j] );
    }
  }
}
When compiled, the call to sin() causes automatic parallelization to fail, as shown in Listing 7.14.
Listing 7.14 Automatic Parallelization Failing in the Presence of a Function Call
$ cc -g -xautopar -xloopinfo -O -c fploops.c
"fploops.c", line 7: not parallelized, call may be unsafe
"fploops.c", line 10: not parallelized, call may be unsafe
The Solaris Studio compiler considers sin() to be a “built-in” function, but
because a developer might provide an alternative implementation or perhaps
interpose on the function calls, it does not recognize these calls unless
specifically told to do so. The flag to enable recognition of built-in
functions is -xbuiltin. Listing 7.15 shows the output from the compiler when
this flag is provided.
Listing 7.15 Automatic Parallelization Recognizing Call to sin() as Safe
$ cc -g -xbuiltin -xautopar -xloopinfo -O -c fploops.c
"fploops.c", line 7: PARALLELIZED, and serial version generated
line 10: not parallelized, unsafe dependence
However, calls to mathematical functions represent a small proportion of the
calls that might be encountered in loops. There is no standard way to denote
that a call to a particular function can be safely made in parallel, although
individual compilers might implement mechanisms that could be used. The best
way to enable a loop containing a function call to be parallelized
automatically is by inlining the function. Inlining replaces a call to a
function with the actual code for the called function. Function inlining can
be enabled with a general compiler flag or a flag enabling a specific routine
to be inlined. Listing 7.16 shows a variant of the code where part of the
calculation is performed by a routine.
Listing 7.16 Code Where Part of the Calculation Is Performed by Another Function
double calc( double a, double b )
{
  return a * b;
}

void matVec( double **mat, double *vec, double * restrict out,
             int row, int col )
{
  int i,j;
  for ( i=0; i<row; i++ )                   // Line 12
  {
    for ( j=0; j<col; j++ )                 // Line 15
    {
      out[i] += calc( mat[i][j], vec[j] );
    }
  }
}
When this code is compiled, the compiler fails to
automatically parallelize the loops because they contain a call that may be
unsafe. However, when the code is compiled at an optimization level of -xO4 or higher, the compiler automatically performs inlining optimizations,
which eliminates the call and allows the loop to be parallelized. This is shown
in Listing 7.17.
Listing 7.17 Inlining Enables the Compiler to Automatically Parallelize Loop
cc -g -xautopar -xloopinfo -xO4 -c
line 12: PARALLELIZED, and serial version generated
line 15: not parallelized, unsafe dependence
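As an illustration of what inlining accomplishes, the sketch below places the loop as written next to a hand-inlined equivalent. The names matVec_call() and matVec_inlined() are hypothetical, and the double return type of calc() is assumed. This is only an approximation of the transformation; the code the compiler actually generates at -xO4 may differ.

```c
/* Small helper in the style of Listing 7.16 (return type assumed). */
static double calc( double a, double b )
{
  return a * b;
}

/* The loop as written: every inner iteration makes a function call, which
   the compiler must treat as potentially unsafe. */
void matVec_call( double **mat, double *vec, double * restrict out,
                  int row, int col )
{
  for ( int i = 0; i < row; i++ )
  {
    for ( int j = 0; j < col; j++ )
    {
      out[i] += calc( mat[i][j], vec[j] );
    }
  }
}

/* Roughly what remains after inlining: the call is replaced by the body of
   calc(), so the loop no longer contains any call to analyze. */
void matVec_inlined( double **mat, double *vec, double * restrict out,
                     int row, int col )
{
  for ( int i = 0; i < row; i++ )
  {
    for ( int j = 0; j < col; j++ )
    {
      out[i] += mat[i][j] * vec[j];
    }
  }
}
```

Both versions compute the same result. Once the call is gone, the outer loop has the same form as the plain matrix-vector loop, which is why the compiler can reason about it and report it as parallelized.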