Assisting the Compiler in Automatically Parallelizing Code
The code shown in Listing 7.18 has a potential aliasing issue. Changes to the elements in the array myarray might also change the value pointed to by length if it happens to be a member of myarray.
Listing 7.18 Incrementing All the Values Held in an Array of Numbers
void sum( double *myarray, int *length )
{
    int i;
    for ( i = 0; i < *length; i++ )
    {
        myarray[i] += 1;
    }
}
It is possible to modify the code so that a compiler can automatically parallelize it. One way to resolve this would be to specify that length is a restrict-qualified pointer, so that the compiler knows that stores to myarray will not alter the value pointed to by length. Another approach would be to place the value pointed to by length into a temporary variable. This second approach has the advantage that it does not rely on support for the restrict keyword.
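The second approach can be sketched as follows; this is a minimal rewrite of Listing 7.18 in which the trip count is read once into a local variable:

```c
/* Variant of Listing 7.18: the loop bound is copied into a local
   variable, so stores to myarray can no longer be assumed to change
   the number of iterations, and the loop becomes parallelizable. */
void sum( double *myarray, int *length )
{
    int i;
    int len = *length;   /* local copy removes the aliasing ambiguity */
    for ( i = 0; i < len; i++ )
    {
        myarray[i] += 1;
    }
}
```

Because len is a local variable whose address is never taken, the compiler can prove it is loop invariant without reasoning about aliasing at all.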
In many situations, the compiler will be able to parallelize loops if some of the potential aliasing issues are resolved using temporary variables or by casting to restrict-qualified pointers. The code shown in Listing 7.19 exhibits a number of potential aliasing issues.
Listing 7.19 Code That Passes Data Using Structures
typedef struct s
{
    int length;
    double *array1, *array2;
} S;

void calc( S *data )
{
    int i;
    for ( i = 0; i < data->length; i++ )      // Line 10
    {
        data->array1[i] += data->array2[i];   // Line 12
    }
}
The first issue is that the compiler fails to recognize the loop at line 10 as one that can be parallelized, because changes to data->array1 might change the value of data->length. The compiler cannot know how many iterations of the loop will be performed, so it cannot divide those iterations between multiple threads. This issue can be resolved by taking a local copy of data->length and using that as the loop iteration limit.
This converts the loop into one that the compiler can recognize, but the compiler is still unable to parallelize it because of potential aliasing between reads from data->array2 and writes to data->array1. This issue can be resolved by making local restrict-qualified pointers that point to the two arrays. Listing 7.20 shows the modified source.
Listing 7.20 Modified Code That Passes Data Using Structures
typedef struct s
{
    int length;
    double *array1, *array2;
} S;

void calc( S *data )
{
    int i;
    int length = data->length;
    double * restrict array1 = data->array1;
    double * restrict array2 = data->array2;
    for ( i = 0; i < length; i++ )
    {
        array1[i] += array2[i];
    }
}
In some instances, the compiler may be able to use versioning of the loop to automatically parallelize code similar to that in Listing 7.19. The compiler produces multiple versions of the loop, and the appropriate version is selected at runtime. A serial version of the loop is used when there is aliasing between stores to memory in the loop and variables used by the loop. In the code in Listing 7.19, the stores to data->array1 might alias with data->array2, data->length, or the structure pointed to by data. A parallel version is generated for use when there is no such aliasing.
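Conceptually, the versioned code resembles the following sketch. This is illustrative source, not actual compiler output; the overlap test shown is a simplified runtime disambiguation check, and real compilers also guard against aliasing with the length field and the structure itself:

```c
typedef struct s
{
    int length;
    double *array1, *array2;
} S;

/* Illustrative sketch of loop versioning for Listing 7.19: a runtime
   check selects between a parallelizable version and a serial
   fallback. The two loop bodies are identical in source form; the
   point is that only the first may be divided among threads. */
void calc_versioned( S *data )
{
    int i;
    int length = data->length;
    double *a1 = data->array1;
    double *a2 = data->array2;

    /* Runtime disambiguation: do the two array regions overlap? */
    int overlap = ( a1 < a2 + length ) && ( a2 < a1 + length );

    if ( !overlap )
    {
        /* Parallel version: iterations are provably independent. */
        for ( i = 0; i < length; i++ )
        {
            a1[i] += a2[i];
        }
    }
    else
    {
        /* Serial version: used when aliasing is possible. */
        for ( i = 0; i < length; i++ )
        {
            a1[i] += a2[i];
        }
    }
}
```

The cost of versioning is a small runtime test plus larger generated code, which is why compilers apply it only when the check is cheap relative to the loop.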
The techniques to improve the chance that a compiler can automatically parallelize an application can be summarized as follows:

- By default, most compilers will assume that all pointers may alias. This can be resolved by making local copies of invariant data, by specifying a stronger aliasing assumption, or by declaring pointers with the restrict keyword.
- The compiler may require additional flags to produce parallel versions of all loops. This may be a flag giving it permission to parallelize reductions, such as the -xreduction flag needed by the Solaris Studio compiler. Alternatively, it may be a flag that alters the threshold at which the compiler considers a loop profitable to parallelize; for example, the Intel compiler has the -par-threshold0 flag. Finally, there may be additional flags for the compiler to recognize loops containing calls to intrinsic functions as safe to parallelize; the Solaris Studio compiler requires the -xbuiltin flag for this purpose.
- Compilers cannot parallelize loops containing calls to functions unless they are certain that the function calls are without side effects. In some cases, there may be compiler directives that can be placed into the source code of the application to provide this assertion. In other cases, it may be possible to force the compiler to inline the function, which would then enable it to parallelize the resulting loop.
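The last point can be illustrated with a small sketch; scale and apply_scale are hypothetical names chosen for this example:

```c
/* A call to an opaque function normally blocks parallelization of
   the enclosing loop. Defining the function as static inline in the
   same translation unit lets the compiler see that it touches no
   global state, so the loop iterations are independent. */
static inline double scale( double x )
{
    return x * 2.0;   /* pure: no side effects */
}

void apply_scale( double *array, int n )
{
    int i;
    int len = n;      /* local trip count, as in the earlier examples */
    for ( i = 0; i < len; i++ )
    {
        array[i] = scale( array[i] );
    }
}
```

Once scale is inlined, the loop body reduces to straight-line arithmetic on array[i], which is exactly the form the automatic parallelizer can handle.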
From this section, it should be apparent that compilers are able to automatically extract some parallelism from a subset of applications. The size of the subset can be increased using the feedback provided by the compiler and some of the techniques described here. However, the ability of current compilers to perform automatic parallelization is limited, and some of the source code changes proposed here may reduce the clarity of the source code.
Alternatively, the OpenMP API provides a way to expose the parallelism in a code by making minimal changes to the source code. With most compilers, it can be used in addition to automatic parallelization so that more of the application can be parallelized.
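As a sketch, the loop from Listing 7.20 can be parallelized explicitly with a single OpenMP directive (the code must be compiled with the compiler's OpenMP flag, such as -fopenmp on gcc; without it the pragma is simply ignored and the loop runs serially):

```c
typedef struct s
{
    int length;
    double *array1, *array2;
} S;

/* Listing 7.20 rewritten with OpenMP: the parallel for directive
   asserts that the iterations are independent, so the compiler no
   longer needs to prove the absence of aliasing for itself. */
void calc_omp( S *data )
{
    int i;
    int length = data->length;
    double * restrict array1 = data->array1;
    double * restrict array2 = data->array2;

    #pragma omp parallel for
    for ( i = 0; i < length; i++ )
    {
        array1[i] += array2[i];
    }
}
```

Note that the directive shifts responsibility to the programmer: if the arrays did overlap, the parallel loop would produce wrong answers, whereas the automatic parallelizer would have conservatively kept the loop serial.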