
Chapter: Multicore Application Programming For Windows, Linux, and Oracle Solaris : Using Automatic Parallelization and OpenMP

Automatic Parallelization of Codes Containing Calls

We discussed the impact that calls to other routines have on performance in Chapter 2, "Coding for Performance." The basic problem with calling another function is that the compiler has no idea what that routine might do: it could change global data, or it might never return. For this reason, a loop that contains function calls cannot, in general, be automatically parallelized.
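To see concretely why the compiler must be so conservative, consider the following sketch (hypothetical code, not one of the numbered listings in this chapter): the called routine updates a global counter as a side effect, so two iterations of the loop running in parallel could race on that counter. Unless the compiler can see the body of scale(), it has to assume that any call might behave this way.

int calls = 0;                      /* global state shared by all iterations */

double scale( double x )
{
    calls++;                        /* hidden side effect of the call */
    return x * 2.0;
}

void applyAll( double *data, int n )
{
    int i;
    for ( i = 0; i < n; i++ )
    {
        data[i] = scale( data[i] ); /* parallel iterations would race on 'calls' */
    }
}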

 

Obviously, this restriction would rule out a large number of loops that could otherwise be safely parallelized. The most obvious place where this is a problem is in calls to mathematical functions. The limitation can be demonstrated using a modified version of the matrix-vector code from Listing 7.9; Listing 7.13 shows the modified code.

 

Listing 7.13   Modified Matrix-Vector Code That Makes a Function Call

#include <math.h>

void matVec( double **mat, double *vec, double * restrict out, int row, int col )
{
    int i, j;

    for ( i = 0; i < row; i++ )            // Line 7
    {
        out[i] = 0;
        for ( j = 0; j < col; j++ )        // Line 10
        {
            out[i] += sin( mat[i][j] * vec[j] );
        }
    }
}

When compiled, the call to sin() causes automatic parallelization to fail, as shown in Listing 7.14.

 

Listing 7.14  Automatic Parallelization Failing in the Presence of a Function Call

$ cc -g -xautopar -xloopinfo -O -c fploops.c
"fploops.c", line 7: not parallelized, call may be unsafe
"fploops.c", line 10: not parallelized, call may be unsafe

The Solaris Studio compiler considers sin() to be a "built-in" function, but because a developer might provide an alternative implementation, or might interpose on calls to the function, the compiler does not recognize these calls as safe unless it is specifically told to do so. The flag that enables recognition of built-in functions is -xbuiltin. When this flag is provided, the compiler produces the output shown in Listing 7.15.

 

Listing 7.15   Automatic Parallelization Recognizing Call to sin() as Safe

$ cc -g -xbuiltin -xautopar -xloopinfo -O -c fploops.c
"fploops.c", line 7: PARALLELIZED, and serial version generated
"fploops.c", line 10: not parallelized, unsafe dependence

However, calls to mathematical functions represent only a small proportion of the calls that might be encountered in loops. There is no standard way to indicate that a particular function can safely be called in parallel, although individual compilers might implement mechanisms for this (one such mechanism is sketched after Listing 7.17). The best way to enable a loop containing a function call to be parallelized automatically is to inline the function; inlining replaces the call with the actual code of the called function. Inlining can be enabled either with a general compiler flag or with a flag that requests inlining of a specific routine. Listing 7.16 shows a variant of the code in which part of the calculation is performed by a separate routine.

 

Listing 7.16   Code Where Part of the Calculation Is Performed by Another Function

#include <math.h>

double calc( double a, double b )
{
    return a * b;
}

void matVec( double **mat, double *vec, double * restrict out, int row, int col )
{
    int i, j;

    for ( i = 0; i < row; i++ )            // Line 12
    {
        out[i] = 0;
        for ( j = 0; j < col; j++ )        // Line 15
        {
            out[i] += calc( mat[i][j], vec[j] );
        }
    }
}

When this code is compiled, the compiler fails to automatically parallelize the loops because they contain a call that may be unsafe. However, when the code is compiled at an optimization level of -xO4 or higher, the compiler automatically performs inlining optimizations, which eliminates the call and allows the loop to be parallelized. This is shown in Listing 7.17.

 

Listing 7.17   Inlining Enables the Compiler to Automatically Parallelize Loop

$ cc -g -xautopar -xloopinfo -xO4 -c fploops.c
"fploops.c", line 12: PARALLELIZED, and serial version generated
"fploops.c", line 15: not parallelized, unsafe dependence

