Assisting the Compiler in Automatically Parallelizing Code
The code shown in Listing 7.18 has a potential aliasing issue. Changes to the elements in the array myarray might also change the value pointed to by length if it happens to be a member of myarray.
Listing 7.18 Incrementing All the Values Held in an Array of Numbers
void sum( double *myarray, int *length )
{
    int i;
    for ( i = 0; i < *length; i++ )
    {
        myarray[i] += 1;
    }
}
It is possible to modify the code so that a compiler can automatically parallelize it. One way to resolve this would be to specify that length is a restrict-qualified pointer, so that the compiler knows that stores to myarray will not alter the value pointed to by length. Another approach would be to place the value pointed to by length into a temporary variable. This second approach has the advantage that it does not rely on support for the restrict keyword.
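The second approach can be sketched as follows; this is a minimal rewrite of Listing 7.18 in which the trip count is read once into a local variable:

```c
/* Variant of Listing 7.18: the loop bound is copied into a local
   variable, so stores to myarray can no longer be assumed to change
   the number of iterations, and the loop becomes parallelizable. */
void sum( double *myarray, int *length )
{
    int i;
    int len = *length;   /* local copy removes the aliasing ambiguity */
    for ( i = 0; i < len; i++ )
    {
        myarray[i] += 1;
    }
}
```

Because len is a local variable whose address is never taken, the compiler can prove it is loop invariant without reasoning about aliasing at all.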
In many situations, the compiler will be able to parallelize loops if some of the potential aliasing issues are resolved using temporary variables or by casting to restrict-qualified pointers. The code shown in Listing 7.19 exhibits a number of potential aliasing issues.
Listing 7.19 Code That Passes Data Using Structures
typedef struct s
{
    int length;
    double *array1, *array2;
} S;

void calc( S *data )
{
    int i;
    for ( i = 0; i < data->length; i++ )      // Line 10
    {
        data->array1[i] += data->array2[i];   // Line 12
    }
}
The first issue is that the compiler fails to recognize the loop at line 10 as one that can be parallelized, because changes to data->array1 might change the value of data->length. The compiler cannot know how many iterations of the loop will be performed, so it cannot divide those iterations between multiple threads. This issue can be resolved by taking a local copy of data->length and using that as the loop iteration limit.
This converts the loop into one that the compiler can recognize, but the compiler is still unable to parallelize it because of potential aliasing between reads from data->array2 and writes to data->array1. This issue can be resolved by making local restrict-qualified pointers that point to the two arrays. Listing 7.20 shows the modified source.
Listing 7.20 Modified Code That Passes Data Using Structures
typedef struct s
{
    int length;
    double *array1, *array2;
} S;

void calc( S *data )
{
    int i;
    int length = data->length;
    double * restrict array1 = data->array1;
    double * restrict array2 = data->array2;
    for ( i = 0; i < length; i++ )
    {
        array1[i] += array2[i];
    }
}
In some instances, the compiler may be able to use versioning of the loop to automatically parallelize code similar to that in Listing 7.19. The compiler produces multiple versions of the loop, and the appropriate version is selected at runtime. A serial version of the loop is used when there is aliasing between stores to memory in the loop and variables used by the loop. In the code in Listing 7.19, the stores to data->array1 might alias with data->array2, data->length, or the structure pointed to by data. A parallel version is generated for use when there is no such aliasing.
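Conceptually, the versioned code resembles the following sketch. This is illustrative source, not actual compiler output; the overlap test shown is a simplified runtime disambiguation check, and real compilers also guard against aliasing with the length field and the structure itself:

```c
typedef struct s
{
    int length;
    double *array1, *array2;
} S;

/* Illustrative sketch of loop versioning for Listing 7.19: a runtime
   check selects between a parallelizable version and a serial
   fallback. The two loop bodies are identical in source form; the
   point is that only the first may be divided among threads. */
void calc_versioned( S *data )
{
    int i;
    int length = data->length;
    double *a1 = data->array1;
    double *a2 = data->array2;

    /* Runtime disambiguation: do the two array regions overlap? */
    int overlap = ( a1 < a2 + length ) && ( a2 < a1 + length );

    if ( !overlap )
    {
        /* Parallel version: iterations are provably independent. */
        for ( i = 0; i < length; i++ )
        {
            a1[i] += a2[i];
        }
    }
    else
    {
        /* Serial version: used when aliasing is possible. */
        for ( i = 0; i < length; i++ )
        {
            a1[i] += a2[i];
        }
    }
}
```

The cost of versioning is a small runtime test plus larger generated code, which is why compilers apply it only when the check is cheap relative to the loop.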
The techniques to improve the chance that a compiler can automatically parallelize an application can be summarized as follows:

- By default, most compilers will assume that all pointers may alias. This can be resolved by making local copies of invariant data, by specifying a stronger aliasing assumption, or by declaring pointers with the restrict keyword.
- The compiler may require additional flags to produce parallel versions of all loops. This may be a flag giving it permission to parallelize reductions, such as the -xreduction flag needed by the Solaris Studio compiler. Alternatively, it may be a flag that alters the threshold at which the compiler considers a loop profitable to parallelize; for example, the Intel compiler has the -par-threshold0 flag. Finally, there may be additional flags for the compiler to recognize loops containing calls to intrinsic functions as safe to parallelize; the Solaris Studio compiler requires the -xbuiltin flag for this purpose.
- Compilers cannot parallelize loops containing calls to functions unless they are certain that the function calls are without side effects. In some cases, there may be compiler directives that can be placed into the source code of the application to provide this assertion. In other cases, it may be possible to force the compiler to inline the function, which would then enable it to parallelize the resulting loop.
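The last point can be illustrated with a small sketch; scale and apply_scale are hypothetical names chosen for this example:

```c
/* A call to an opaque function normally blocks parallelization of
   the enclosing loop. Defining the function as static inline in the
   same translation unit lets the compiler see that it touches no
   global state, so the loop iterations are independent. */
static inline double scale( double x )
{
    return x * 2.0;   /* pure: no side effects */
}

void apply_scale( double *array, int n )
{
    int i;
    int len = n;      /* local trip count, as in the earlier examples */
    for ( i = 0; i < len; i++ )
    {
        array[i] = scale( array[i] );
    }
}
```

Once scale is inlined, the loop body reduces to straight-line arithmetic on array[i], which is exactly the form the automatic parallelizer can handle.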
From this section, it should be apparent that compilers are able to automatically extract some parallelism from a subset of applications. The size of the subset can be increased using the feedback provided by the compiler and some of the techniques described here. However, the ability of current compilers to perform automatic parallelization is limited, and some of the source code changes proposed here may reduce the clarity of the source code.
Alternatively, the OpenMP API provides a way to expose the parallelism in a code by making minimal changes to the source code. With most compilers, it can be used in addition to automatic parallelization so that more of the application can be parallelized.
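As a sketch, the loop from Listing 7.20 can be parallelized explicitly with a single OpenMP directive (the code must be compiled with the compiler's OpenMP flag, such as -fopenmp on gcc; without it the pragma is simply ignored and the loop runs serially):

```c
typedef struct s
{
    int length;
    double *array1, *array2;
} S;

/* Listing 7.20 rewritten with OpenMP: the parallel for directive
   asserts that the iterations are independent, so the compiler no
   longer needs to prove the absence of aliasing for itself. */
void calc_omp( S *data )
{
    int i;
    int length = data->length;
    double * restrict array1 = data->array1;
    double * restrict array2 = data->array2;

    #pragma omp parallel for
    for ( i = 0; i < length; i++ )
    {
        array1[i] += array2[i];
    }
}
```

Note that the directive shifts responsibility to the programmer: if the arrays did overlap, the parallel loop would produce wrong answers, whereas the automatic parallelizer would have conservatively kept the loop serial.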