Parallelizing the tree-search programs using OpenMP
The issues involved in implementing the static
and dynamic parallel tree-search programs using OpenMP are the same as the
issues involved in implementing the programs using Pthreads.
There are almost no substantive differences
between a static implementation that uses OpenMP and one that uses Pthreads.
However, a couple of points should be mentioned:
1. When a single thread executes some code in the Pthreads version,
the test
if (my_rank == whatever)
can be replaced by the OpenMP directive
# pragma omp single
This will insure that the following structured
block of code will be executed by one thread in the team, and the other threads
in the team will wait in an implicit barrier at the end of the block until the
executing thread is finished.
When whatever is 0 (as it is in each test in the Pthreads
program), the test can also be replaced by the OpenMP directive
# pragma omp master
This will insure that thread 0 executes the
following structured block of code. However, the master directive doesn’t put an implicit barrier at
the end of the block, so it may be necessary to also add a barrier directive after a structured block that has been modified by a master directive.
2. The
Pthreads mutex that protects the best tour can be replaced by a single critical directive placed either inside the
Update best tour function
or imme-diately before the call to Update
best tour. This is the only potential
source of a race condition after the distribution of the initial tours, so the
simple critical directive won’t cause a thread to block unnecessarily.
The dynamically load-balanced Pthreads
implementation depends heavily on Pthreads condition variables, and OpenMP
doesn’t provide a comparable object. The rest of the Pthreads code can be
easily converted to OpenMP. In fact, OpenMP even provides a nonblocking version
of omp_set_lock. Recall that OpenMP provides a lock object omp_lock_t and the following functions for acquiring and relinquishing the
lock, respectively:
void omp_set_lock(omp_lock_t* lock p /* in/out */);
void omp_unset_lock(omp_lock_t* lock p /* in/out */);
It also provides the function
int omp_test_lock(omp_lock_t* lock p /* in/out */);
which is analogous to pthread mutex trylock; it
attempts to acquire the lock lock p, and if it succeeds it returns true (or
nonzero). If the lock is being used by some other thread, it returns
immediately with return value false (or zero).
If we examine the pseudocode for the Pthreads
Terminated function in Program 6.8, we see that in order to adapt the Pthreads
version to OpenMP, we need to emulate the functionality of the Pthreads
function calls
Pthread_cond_signal(&term_cond_var);
pthread_cond_broadcast(&term_cond_var);
pthread_cond_wait(&term_cond_var, &term_mutex);
in Lines 6, 17, and 22, respectively.
Recall that a thread that has entered the
condition wait by calling
pthread_cond_wait(&term_cond_var, &term_mutex);
is waiting for either of two events:
.. Another thread has split its stack and
created work for the waiting thread. All of the threads have run out of work.
Perhaps the simplest solution to emulating a
condition wait in OpenMP is to use busy-waiting. Since there are two conditions
a waiting thread should test for, we can use two different variables in the
busy-wait loop:
/* Global
variables */
int awakened thread = 1;
int work remains = 1; /* true */
. . .
while (awakened thread != my rank &&
work remains);
Initialization of the two variables is crucial:
If awakened thread has the value of some thread’s rank, that thread will exit
immediately from the while, but there may be no work available. Similarly, if
work remains is initialized to 0, all the threads will exit the while loop immediately
and quit.
Now recall that when a thread enters a Pthreads
condition wait, it relinquishes the mutex associated with the condition
variable so that another thread can also enter the condition wait or signal the
waiting thread. Thus, we should relinquish the lock used in the Terminated
function before starting the while loop.
Also recall that when a thread returns from a
Pthreads condition wait, it reacquires the mutex associated with the condition
variable. This is especially important in this setting since if the awakened
thread has received work, it will need to access the shared data structures
storing the new stack. Thus, our complete emulated condition wait should look
something like this:
/* Global
vars */
int awakened_thread = 1;
work_remains = 1; /* true */
omp_unset_lock(&term_lock);
while (awakened_thread != my_rank &&
work_remains);
omp_set_lock(&term_lock);
If you recall the discussion of busy-waiting in
Section 4.5 and Exercise 4.3 of Chapter 4, you may be concerned about the possibility
that the compiler might reorder the code around the busy-wait loop. The
compiler should not reorder across calls to omp set lock or omp unset lock.
However, the updates to the variables could be reordered, so if we’re going to
be using compiler optimization, we should declare both with the volatile
keyword.
Emulating the condition broadcast is
straightforward: When a thread determines that there’s no work left (Line 14 in
Program 6.8), then the condition broadcast (Line 17) can be replaced with the
assignment
work_remains = 0; /* Assign false
to work remains */
The “awakened” threads can check if they were
awakened by some thread’s setting work remains to false, and, if they were,
return from Terminated with the value true.
Emulating the condition signal requires a
little more work. The thread that has split its stack needs to choose one of
the sleeping threads and set the variable awakened thread to the chosen
thread’s rank. Thus, at a minimum, we need to keep a list of the ranks of the
sleeping threads. A simple way to do this is to use a shared queue of thread
ranks. When a thread runs out of work, it enqueues its rank before entering the
busy-wait loop. When a thread splits its stack, it can choose the thread to
awaken by dequeuing the queue of waiting threads:
got_lock = omp test lock(&term lock); if
(got lock != 0) {
if (waiting threads > 0 && new stack
== NULL) {
Split my stack creating new stack;
awakened thread = Dequeue(term queue);
}
omp unset lock(&term lock);
}
The awakened thread needs to reset awakened
thread to - 1 before it returns from its call to the Terminated function.
Note that there is no danger that some other
thread will be awakened before the awakened thread reacquires the lock. As long
as new stack is not NULL, no thread will attempt to split its stack, and hence
no thread will try to awaken another thread. So if several threads call
Terminated before the awakened thread reacquires the lock, they’ll either
return if their stacks are nonempty, or they’ll enter the wait if their stacks
are empty.
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.