Chapter: An Introduction to Parallel Programming : Parallel Program Development

Parallelizing the tree-search programs using OpenMP

The issues involved in implementing the static and dynamic parallel tree-search programs using OpenMP are the same as the issues involved in implementing the programs using Pthreads.

Parallelizing the tree-search programs using OpenMP

The issues involved in implementing the static and dynamic parallel tree-search programs using OpenMP are the same as the issues involved in implementing the programs using Pthreads.

There are almost no substantive differences between a static implementation that uses OpenMP and one that uses Pthreads. However, a couple of points should be mentioned:

1. When a single thread executes some code in the Pthreads version, the test

if (my_rank == whatever)

can be replaced by the OpenMP directive

# pragma omp single

This will insure that the following structured block of code will be executed by one thread in the team, and the other threads in the team will wait in an implicit barrier at the end of the block until the executing thread is finished.

When whatever is 0 (as it is in each test in the Pthreads program), the test can also be replaced by the OpenMP directive

# pragma omp master

This will insure that thread 0 executes the following structured block of code. However, the master directive doesn’t put an implicit barrier at the end of the block, so it may be necessary to also add a barrier directive after a structured block that has been modified by a master directive.

2. The Pthreads mutex that protects the best tour can be replaced by a single critical directive placed either inside the Update best tour function or imme-diately before the call to Update best tour. This is the only potential source of a race condition after the distribution of the initial tours, so the simple critical directive won’t cause a thread to block unnecessarily.

The dynamically load-balanced Pthreads implementation depends heavily on Pthreads condition variables, and OpenMP doesn’t provide a comparable object. The rest of the Pthreads code can be easily converted to OpenMP. In fact, OpenMP even provides a nonblocking version of omp_set_lock. Recall that OpenMP provides a lock object omp_lock_t and the following functions for acquiring and relinquishing the lock, respectively:

void omp_set_lock(omp_lock_t* lock p /* in/out */);

void omp_unset_lock(omp_lock_t* lock p /* in/out */);

It also provides the function

int omp_test_lock(omp_lock_t* lock p /* in/out */);

which is analogous to pthread mutex trylock; it attempts to acquire the lock lock p, and if it succeeds it returns true (or nonzero). If the lock is being used by some other thread, it returns immediately with return value false (or zero).

If we examine the pseudocode for the Pthreads Terminated function in Program 6.8, we see that in order to adapt the Pthreads version to OpenMP, we need to emulate the functionality of the Pthreads function calls

Pthread_cond_signal(&term_cond_var);

pthread_cond_broadcast(&term_cond_var);

pthread_cond_wait(&term_cond_var, &term_mutex);

in Lines 6, 17, and 22, respectively.

Recall that a thread that has entered the condition wait by calling

pthread_cond_wait(&term_cond_var, &term_mutex);

is waiting for either of two events:

.. Another thread has split its stack and created work for the waiting thread. All of the threads have run out of work.

Perhaps the simplest solution to emulating a condition wait in OpenMP is to use busy-waiting. Since there are two conditions a waiting thread should test for, we can use two different variables in the busy-wait loop:

/* Global variables */

int awakened thread = 1;

int work remains = 1; /* true */

. . .

while (awakened thread != my rank && work remains);

Initialization of the two variables is crucial: If awakened thread has the value of some thread’s rank, that thread will exit immediately from the while, but there may be no work available. Similarly, if work remains is initialized to 0, all the threads will exit the while loop immediately and quit.

Now recall that when a thread enters a Pthreads condition wait, it relinquishes the mutex associated with the condition variable so that another thread can also enter the condition wait or signal the waiting thread. Thus, we should relinquish the lock used in the Terminated function before starting the while loop.

Also recall that when a thread returns from a Pthreads condition wait, it reacquires the mutex associated with the condition variable. This is especially important in this setting since if the awakened thread has received work, it will need to access the shared data structures storing the new stack. Thus, our complete emulated condition wait should look something like this:

/* Global vars */

int awakened_thread = 1;

work_remains = 1; /* true */

omp_unset_lock(&term_lock);

while (awakened_thread != my_rank && work_remains);

omp_set_lock(&term_lock);

If you recall the discussion of busy-waiting in Section 4.5 and Exercise 4.3 of Chapter 4, you may be concerned about the possibility that the compiler might reorder the code around the busy-wait loop. The compiler should not reorder across calls to omp set lock or omp unset lock. However, the updates to the variables could be reordered, so if we’re going to be using compiler optimization, we should declare both with the volatile keyword.

Emulating the condition broadcast is straightforward: When a thread determines that there’s no work left (Line 14 in Program 6.8), then the condition broadcast (Line 17) can be replaced with the assignment

work_remains = 0; /* Assign false to work remains */

The “awakened” threads can check if they were awakened by some thread’s setting work remains to false, and, if they were, return from Terminated with the value true.

Emulating the condition signal requires a little more work. The thread that has split its stack needs to choose one of the sleeping threads and set the variable awakened thread to the chosen thread’s rank. Thus, at a minimum, we need to keep a list of the ranks of the sleeping threads. A simple way to do this is to use a shared queue of thread ranks. When a thread runs out of work, it enqueues its rank before entering the busy-wait loop. When a thread splits its stack, it can choose the thread to awaken by dequeuing the queue of waiting threads:

got_lock = omp test lock(&term lock); if (got lock != 0) {

if (waiting threads > 0 && new stack == NULL) {

Split my stack creating new stack;

awakened thread = Dequeue(term queue);

}

omp unset lock(&term lock);

}

The awakened thread needs to reset awakened thread to - 1 before it returns from its call to the Terminated function.

Note that there is no danger that some other thread will be awakened before the awakened thread reacquires the lock. As long as new stack is not NULL, no thread will attempt to split its stack, and hence no thread will try to awaken another thread. So if several threads call Terminated before the awakened thread reacquires the lock, they’ll either return if their stacks are nonempty, or they’ll enter the wait if their stacks are empty.

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

An Introduction to Parallel Programming : Parallel Program Development : Parallelizing the tree-search programs using OpenMP |