Chapter: Multicore Application Programming For Windows, Linux, and Oracle Solaris - Using Automatic Parallelization and OpenMP


Restricting the Threads That Execute a Region of Code

There are situations where it is necessary to restrict the number of threads that can execute a block of code. For example, there might be algorithmic reasons why only a single thread should execute a region of code. Alternatively, for correctness, it may be necessary to ensure that only a single thread at a time executes a region of code. It may also be necessary to restrict the number of threads executing a parallel region if there is insufficient work to occupy more threads. This section describes multiple ways that the number of threads can be restricted.

 

Executing a Region of Code Using a Single Thread

 

We met the single directive in the section “Using OpenMP for Dynamically Defined Parallel Tasks.” The single directive specifies that only one thread will execute the code in the region. All the other threads will wait for this code to be executed before continuing. The nowait clause can be used if the other threads should continue execution before the single thread completes execution. For an example of the single directive, see Listing 7.40.
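The combination of single and nowait can be sketched as follows. The function fill_with_header() and the idea of a header element are illustrative, not from the original text: one thread writes the header while the remaining threads proceed directly to the worksharing loop.

```c
#include <omp.h>

/* Hypothetical example: exactly one thread (whichever reaches the single
   region first) writes the header element, while the nowait clause lets
   the other threads proceed to the worksharing loop without waiting. */
void fill_with_header( double *array, int length )
{
    #pragma omp parallel
    {
        #pragma omp single nowait
        {
            array[0] = -1.0;        /* executed by exactly one thread */
        }
        #pragma omp for
        for ( int i = 1; i < length; i++ )
        {
            array[i] = (double)i;   /* work shared among all threads */
        }
    }
}
```

Because the single block and the loop touch disjoint elements of the array, the nowait clause removes an unnecessary barrier without introducing a data race.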

 

Allowing Only the Master Thread to Execute a Region of Code

 

The master directive is similar to the single directive in that it specifies that only one thread should execute the enclosed region. There are two differences between the directives. The first is that the master directive identifies the master thread as the one that will do the work; the single directive does not specify which thread will perform the work. The second difference is that the master directive does not cause the other threads to wait for the work in the region to be completed before they continue.

 

The master directive is useful in situations where only one thread needs to complete the work. It ensures that the same thread always executes the region of code, so any thread-local variables will carry over from previous executions. This can be useful for broadcasting and sharing the value of variables between threads. An example of the master directive can be seen in Listing 7.50.
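This broadcast pattern can be sketched as follows; the function name broadcast_from_master() and the particular computation are illustrative. The master thread writes a shared variable, and an explicit barrier ensures every thread sees the value before reading it. The barrier is needed precisely because master, unlike single, has no implied barrier at the end of the region.

```c
#include <omp.h>

/* Hypothetical sketch of broadcasting a value from the master thread.
   Only the master thread writes shared_value; the explicit barrier
   guarantees the write is complete before any thread reads it. */
double broadcast_from_master( double seed )
{
    double shared_value = 0.0;
    #pragma omp parallel
    {
        #pragma omp master
        {
            shared_value = seed * 2.0;   /* only the master thread writes */
        }
        #pragma omp barrier              /* all threads wait for the value */
        /* all threads can now safely read shared_value */
    }
    return shared_value;
}
```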

 

Restricting Execution of a Region of Code to a Single Thread

 

For correctness, it is sometimes necessary to restrict a region of code so that it is executed only by a single thread at a time. This can be achieved using the critical directive. Listing 7.53 shows a very inefficient way of performing a reduction using a critical directive to ensure that only one thread changes the reduction variable at any time.

 

Listing 7.53   Reduction Operation Implemented Using a critical Directive

double calc( double* array, int length )
{
    double total = 0.0;
    #pragma omp parallel for
    for ( int i=0; i<length; i++ )
    {
        #pragma omp critical
        {
            total += array[i];
        }
    }
    return total;
}

 

The critical directive takes an optional name. This enables the same critical section to protect multiple regions of code. For example, all accesses to the variable total could be protected by a critical section of the name total_critical_section, as shown in Listing 7.54.

 

Listing 7.54   Named Critical Section

#pragma omp critical( total_critical_section )
{
    total += array[i];
}
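To illustrate why the name matters, the following sketch (the function sum_and_scale() is hypothetical) uses the same named critical section in two different loops. Because the two regions share a name, an update to total in one loop excludes updates in the other, not just updates in the same region.

```c
#include <omp.h>

/* Hypothetical example: two separate code regions both update the shared
   variable total. Giving both critical sections the same name makes them
   mutually exclusive with each other, not merely with themselves. */
double sum_and_scale( double *array, int length )
{
    double total = 0.0;
    #pragma omp parallel for
    for ( int i = 0; i < length; i++ )
    {
        #pragma omp critical( total_critical_section )
        {
            total += array[i];          /* first protected region */
        }
    }
    #pragma omp parallel for
    for ( int i = 0; i < length; i++ )
    {
        #pragma omp critical( total_critical_section )
        {
            total += 2.0 * array[i];    /* second region, same name */
        }
    }
    return total;
}
```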

 

Performing Operations Atomically

 

Sometimes, all that is necessary is the atomic modification of a variable. OpenMP supports this through the atomic directive, which applies only to the modification of a variable that immediately follows it. Listing 7.55 shows how the reduction could be coded using an atomic directive. The atomic directive ensures correct behavior but may not be any faster than using a critical section.

 

Listing 7.55   Reduction Implemented Using an Atomic Directive

double calc( double* array, int length )
{
    double total = 0.0;
    #pragma omp parallel for
    for ( int i=0; i<length; i++ )
    {
        #pragma omp atomic
        total += array[i];
    }
    return total;
}

 

 

Using Mutex Locks

 

OpenMP also supports the flexibility offered by mutex locks, which are provided through OpenMP locks. A lock is declared to be of the type omp_lock_t and initialized through a call to omp_init_lock(). The lock is destroyed with a call to omp_destroy_lock(). To acquire the lock, the code calls omp_set_lock(), and to release the lock, the code calls omp_unset_lock(). The code can test whether the lock is available by calling omp_test_lock(). It is possible to rewrite the reduction code to use OpenMP locks, as shown in Listing 7.56.

 

Listing 7.56   Reduction Implemented Using an OpenMP Lock

 

#include <omp.h>

omp_lock_t lock;

double calc( double* array, int length )
{
    double total = 0.0;
    #pragma omp parallel for
    for ( int i=0; i<length; i++ )
    {
        omp_set_lock( &lock );
        total += array[i];
        omp_unset_lock( &lock );
    }
    return total;
}

int main()
{
    double array[1024];
    omp_init_lock( &lock );
    calc( array, 1024 );
    omp_destroy_lock( &lock );
}

 

Conditional Serial Execution of Parallel Regions

 

In some instances, it can be useful to identify conditions under which a parallel region should be executed by a single thread. This saves having to place both a serial version and a parallel version of the block of code in the source of the application.

 

The most obvious occasion for doing this would be when there is insufficient work to justify using more than one thread. The if() clause can be applied to a parallel directive to determine the conditions when the region should be executed in parallel. Listing 7.57 shows an example of using this directive. The code will execute the region using multiple threads only if the variable length has a value greater than 1,000.

 

Listing 7.57   Conditional Parallel Execution Using the if Clause

 

double calc( double * array, int length )
{
    double total = 0.0;
    #pragma omp parallel for reduction( +: total ) if( length > 1000 )
    for ( int i=0; i<length; i++ )
    {
        total += array[i];
    }
    return total;
}

Another use for the if() clause would be in situations where using multiple threads to execute a region of code would cause correctness issues. For example, if a loop calculates some function of two vectors, the code is sometimes called with vectors that alias.

 

The if() clause can be used to check whether the vectors alias and execute the code in parallel only if no aliasing is present.
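A sketch of such an aliasing check might look like the following; the helper no_overlap() and the function vector_add() are hypothetical names introduced for illustration, not part of OpenMP:

```c
#include <omp.h>

/* Hypothetical aliasing test: the vectors do not overlap if one array
   ends before the other begins. */
static int no_overlap( double *a, double *b, int length )
{
    return ( a + length <= b ) || ( b + length <= a );
}

/* The loop runs in parallel only when the if() clause confirms that the
   two vectors occupy disjoint memory; otherwise it runs serially. */
void vector_add( double *out, double *in, int length )
{
    #pragma omp parallel for if( no_overlap( out, in, length ) )
    for ( int i = 0; i < length; i++ )
    {
        out[i] += in[i];
    }
}
```

Serial execution under aliasing preserves the sequential semantics the caller expects, while non-aliased calls still benefit from the parallel loop.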

