Chapter: An Introduction to Parallel Programming - Shared-Memory Programming with Pthreads


Barriers and Condition Variables

Let’s take a look at another problem in shared-memory programming: synchronizing the threads by making sure that they all are at the same point in a program. Such a point of synchronization is called a barrier because no thread can proceed beyond the barrier until all the threads have reached it.


Barriers have numerous applications. As we discussed in Chapter 2, if we’re timing some part of a multithreaded program, we’d like for all the threads to start the timed code at the same instant, and then report the time taken by the last thread to finish, that is, the “slowest” thread. We’d therefore like to do something like this:

/* Shared */
double elapsed_time;
. . .
/* Private */
double my_start, my_finish, my_elapsed;
. . .

Synchronize threads;
Store current time in my_start;
/* Execute timed code */
. . .
Store current time in my_finish;
my_elapsed = my_finish - my_start;

elapsed_time = Maximum of my_elapsed values;


Using this approach, we’re sure that all of the threads will record my_start at approximately the same time.
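The pattern above can be fleshed out in C. The following sketch is ours, not code from the text: the function names, the use of gettimeofday, and the mutex-protected maximum are all assumptions made for illustration.

```c
#include <pthread.h>
#include <sys/time.h>

/* Shared: the time taken by the "slowest" thread, and a mutex to
   protect it. (These names are ours, chosen to match the pseudocode.) */
double elapsed_time = 0.0;
pthread_mutex_t time_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Current wall-clock time in seconds */
double now(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1000000.0;
}

/* Each thread calls this with its own start and finish times; the
   shared elapsed_time ends up holding the maximum of the threads'
   my_elapsed values. */
void record_max_elapsed(double my_start, double my_finish) {
    double my_elapsed = my_finish - my_start;
    pthread_mutex_lock(&time_mutex);
    if (my_elapsed > elapsed_time) elapsed_time = my_elapsed;
    pthread_mutex_unlock(&time_mutex);
}
```

Each thread would call now() immediately after the barrier, run the timed code, call now() again, and pass both values to record_max_elapsed.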

Another very important use of barriers is in debugging. As you’ve probably already seen, it can be very difficult to determine where an error is occurring in a parallel program. We can, of course, have each thread print a message indicating which point it’s reached in the program, but it doesn’t take long for the volume of the output to become overwhelming. Barriers provide an alternative:


point in program we want to reach;
barrier;
if (my_rank == 0) {
   printf("All threads reached this point\n");
   fflush(stdout);
}

Many implementations of Pthreads don’t provide barriers, so if our code is to be portable, we need to develop our own implementation. There are a number of options; we’ll look at three. The first two only use constructs that we’ve already studied. The third uses a new type of Pthreads object: a condition variable.

1. Busy-waiting and a mutex


Implementing a barrier using busy-waiting and a mutex is straightforward: we use a shared counter protected by the mutex. When the counter indicates that every thread has entered the critical section, threads can leave a busy-wait loop.


/* Shared and initialized by the main thread */
int counter;      /* Initialize to 0 */
int thread_count;
pthread_mutex_t barrier_mutex;
. . .

void* Thread_work(. . .) {
   . . .

   /* Barrier */
   pthread_mutex_lock(&barrier_mutex);
   counter++;
   pthread_mutex_unlock(&barrier_mutex);
   while (counter < thread_count);
   . . .
}

Of course, this implementation will have the same problems that our other busy-wait codes had: we’ll waste CPU cycles when threads are in the busy-wait loop, and, if we run the program with more threads than cores, we may find that the performance of the program seriously degrades.


Another issue is the shared variable counter. What happens if we want to implement a second barrier and we try to reuse the counter? When the first barrier is completed, counter will have the value thread_count. Unless we can somehow reset counter, the condition we used for our first barrier, counter < thread_count, will be false, and the barrier won’t cause the threads to block. Furthermore, any attempt to reset counter to zero is almost certainly doomed to failure. If the last thread to enter the loop tries to reset it, some thread in the busy-wait may never see that counter == thread_count, and that thread may hang in the busy-wait. If some thread tries to reset the counter after the barrier, some other thread may enter the second barrier before the counter is reset, and its increment to the counter will be lost. This will have the unfortunate effect of causing all the threads to hang in the second busy-wait loop. So if we want to use this barrier, we need one counter variable for each instance of the barrier.
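One way to honor the one-counter-per-barrier requirement is to bundle each counter with its own mutex in a small struct, one instance per barrier. This is a sketch under our own naming, not code from the text:

```c
#include <pthread.h>

/* One instance per barrier, so each barrier gets its own counter. */
typedef struct {
    volatile int counter;     /* number of threads that have arrived */
    pthread_mutex_t mutex;    /* protects counter */
} bw_barrier_t;

void bw_barrier_init(bw_barrier_t* b) {
    b->counter = 0;
    pthread_mutex_init(&b->mutex, NULL);
}

/* Busy-wait until thread_count threads have called this on barrier b. */
void bw_barrier_wait(bw_barrier_t* b, int thread_count) {
    pthread_mutex_lock(&b->mutex);
    b->counter++;
    pthread_mutex_unlock(&b->mutex);
    while (b->counter < thread_count);   /* spin: wastes CPU cycles */
}
```

The volatile qualifier encourages the spinning thread to reread counter on each iteration; the sketch still has all of the performance drawbacks of busy-waiting discussed above.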

2. Semaphores


A natural question is whether we can implement a barrier with semaphores, and, if so, whether we can reduce the number of problems we encountered with busy-waiting. The answer to the first question is yes:


/* Shared variables */
int counter;          /* Initialize to 0 */
sem_t count_sem;      /* Initialize to 1 */
sem_t barrier_sem;    /* Initialize to 0 */
. . .

void* Thread_work(. . .) {
   . . .

   /* Barrier */
   sem_wait(&count_sem);
   if (counter == thread_count - 1) {
      counter = 0;
      sem_post(&count_sem);
      for (j = 0; j < thread_count - 1; j++)
         sem_post(&barrier_sem);
   } else {
      counter++;
      sem_post(&count_sem);
      sem_wait(&barrier_sem);
   }
   . . .
}


As with the busy-wait barrier, we have a counter that we use to determine how many threads have entered the barrier. We use two semaphores: count_sem protects the counter, and barrier_sem is used to block threads that have entered the barrier. The count_sem semaphore is initialized to 1 (that is, “unlocked”), so the first thread to reach the barrier will be able to proceed past the call to sem_wait. Subsequent threads, however, will block until they can have exclusive access to the counter. When a thread has exclusive access to the counter, it checks to see if counter < thread_count - 1. If so, the thread increments counter, “relinquishes the lock” (sem_post(&count_sem)), and blocks in sem_wait(&barrier_sem). On the other hand, if counter == thread_count - 1, the thread is the last to enter the barrier, so it can reset counter to zero and “unlock” count_sem by calling sem_post(&count_sem). Now it wants to notify all the other threads that they can proceed, so it executes sem_post(&barrier_sem) for each of the thread_count - 1 threads that are blocked in sem_wait(&barrier_sem).


Note that it doesn’t matter if the thread executing the loop of calls to sem_post(&barrier_sem) races ahead and executes multiple calls to sem_post before a thread can be unblocked from sem_wait(&barrier_sem). Recall that a semaphore is an unsigned int: calls to sem_post increment it, while calls to sem_wait decrement it, unless it’s already 0, in which case the calling threads will block until it’s positive again, and they’ll decrement it when they unblock. Therefore, it doesn’t matter if the thread executing the loop of calls to sem_post(&barrier_sem) gets ahead of the threads blocked in the calls to sem_wait(&barrier_sem), because eventually the blocked threads will see that barrier_sem is positive, and they’ll decrement it and proceed.


It should be clear that this implementation of a barrier is superior to the busy-wait barrier, since the threads don’t need to consume CPU cycles when they’re blocked in sem_wait. Can we reuse the data structures from the first barrier if we want to execute a second barrier?

The counter can be reused, since we were careful to reset it before releasing any of the threads from the barrier. Also, count_sem can be reused, since it is reset to 1 before any threads can leave the barrier. This leaves barrier_sem. Since there’s exactly one sem_post for each sem_wait, it might appear that the value of barrier_sem will be 0 when the threads start executing a second barrier. However, suppose we have two threads, and thread 0 is blocked in sem_wait(&barrier_sem) in the first barrier, while thread 1 is executing the loop of sem_post. Also suppose that the operating system has seen that thread 0 is idle and has descheduled it. Then thread 1 can go on to the second barrier. Since counter == 0, it will execute the else clause. After incrementing counter, it executes sem_post(&count_sem), and then executes sem_wait(&barrier_sem).


However, if thread 0 is still descheduled, it will not have decremented barrier_sem. Thus when thread 1 reaches sem_wait(&barrier_sem), barrier_sem will still be 1, so it will simply decrement barrier_sem and proceed. This will have the unfortunate consequence that when thread 0 starts executing again, it will still be blocked in the first sem_wait(&barrier_sem), and thread 1 will proceed through the second barrier before thread 0 has entered it. Reusing barrier_sem therefore results in a race condition.
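The race can be avoided by giving every barrier its own barrier_sem (and, for tidiness, its own counter and count_sem). The following struct-based sketch uses our own names and assumes POSIX unnamed semaphores created with sem_init:

```c
#include <pthread.h>
#include <semaphore.h>

/* One instance per barrier, so barrier_sem is never reused. */
typedef struct {
    int counter;        /* number of threads that have arrived */
    sem_t count_sem;    /* "lock" protecting counter; starts at 1 */
    sem_t barrier_sem;  /* threads block here; starts at 0 */
} sem_barrier_t;

void sem_barrier_init(sem_barrier_t* b) {
    b->counter = 0;
    sem_init(&b->count_sem, 0, 1);
    sem_init(&b->barrier_sem, 0, 0);
}

void sem_barrier_wait(sem_barrier_t* b, int thread_count) {
    sem_wait(&b->count_sem);
    if (b->counter == thread_count - 1) {   /* last thread to arrive */
        b->counter = 0;
        sem_post(&b->count_sem);
        for (int j = 0; j < thread_count - 1; j++)
            sem_post(&b->barrier_sem);
    } else {
        b->counter++;
        sem_post(&b->count_sem);
        sem_wait(&b->barrier_sem);
    }
}
```

With one sem_barrier_t per barrier in the program, a thread descheduled in one barrier's sem_wait can never "steal" a post intended for a different barrier.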


3. Condition variables


A somewhat better approach to creating a barrier in Pthreads is provided by condition variables. A condition variable is a data object that allows a thread to suspend execution until a certain event or condition occurs. When the event or condition occurs, another thread can signal the thread to “wake up.” A condition variable is always associated with a mutex.


Typically, condition variables are used in constructs similar to this pseudocode:


lock mutex;
if condition has occurred
   signal thread(s);
else {
   unlock the mutex and block;
   /* when thread is unblocked, mutex is relocked */
}
unlock mutex;

Condition variables in Pthreads have type pthread_cond_t. The function

int pthread_cond_signal(pthread_cond_t* cond_var_p /* in/out */);

will unblock one of the blocked threads, and

int pthread_cond_broadcast(pthread_cond_t* cond_var_p /* in/out */);

will unblock all of the blocked threads. The function

int pthread_cond_wait(
      pthread_cond_t*  cond_var_p /* in/out */,
      pthread_mutex_t* mutex_p    /* in/out */);



will unlock the mutex referred to by mutex_p and cause the executing thread to block until it is unblocked by another thread’s call to pthread_cond_signal or pthread_cond_broadcast. When the thread is unblocked, it reacquires the mutex. So in effect, pthread_cond_wait implements the following sequence of functions:

pthread_mutex_unlock(&mutex_p);
wait on cond_var_p;
pthread_mutex_lock(&mutex_p);

The following code implements a barrier with a condition variable:


/* Shared */
int counter = 0;
pthread_mutex_t mutex;
pthread_cond_t cond_var;
. . .

void* Thread_work(. . .) {
   . . .

   /* Barrier */
   pthread_mutex_lock(&mutex);
   counter++;
   if (counter == thread_count) {
      counter = 0;
      pthread_cond_broadcast(&cond_var);
   } else {
      while (pthread_cond_wait(&cond_var, &mutex) != 0);
   }
   pthread_mutex_unlock(&mutex);
   . . .
}

Note that it is possible that events other than the call to pthread_cond_broadcast can cause a suspended thread to unblock (see, for example, Butenhof [6], page 80). Hence, the call to pthread_cond_wait is usually placed in a while loop. If the thread is unblocked by some event other than a call to pthread_cond_signal or pthread_cond_broadcast, then the return value of pthread_cond_wait will be nonzero, and the unblocked thread will call pthread_cond_wait again.

If a single thread is being awakened, it’s also a good idea to check that the condition has, in fact, been satisfied before proceeding. In our example, if a single thread were being released from the barrier with a call to pthread_cond_signal, then that thread should verify that counter == 0 before proceeding. This can be dangerous with the broadcast, though. After being awakened, some thread may race ahead and change the condition, and if each thread is checking the condition, a thread that awakened later may find the condition is no longer satisfied and go back to sleep.


Note that in order for our barrier to function correctly, it’s essential that the call to pthread_cond_wait unlock the mutex. If it didn’t unlock the mutex, then only one thread could enter the barrier; all of the other threads would block in the call to pthread_mutex_lock, the first thread to enter the barrier would block in the call to pthread_cond_wait, and our program would hang.

Also note that the semantics of mutexes requires that the mutex be relocked before we return from the call to pthread_cond_wait. We “obtained” the lock when we returned from the call to pthread_mutex_lock. Hence, we should at some point “relinquish” the lock through a call to pthread_mutex_unlock.
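Since the condition-variable barrier resets counter before any thread leaves, its counter, mutex, and condition variable can all be packaged into a single reusable object. This sketch (the struct and function names are ours) can serve any number of successive barriers:

```c
#include <pthread.h>

/* A reusable condition-variable barrier, following the scheme above. */
typedef struct {
    int counter;                /* number of threads that have arrived */
    int thread_count;           /* number of threads using the barrier */
    pthread_mutex_t mutex;
    pthread_cond_t cond_var;
} cond_barrier_t;

void cond_barrier_init(cond_barrier_t* b, int thread_count) {
    b->counter = 0;
    b->thread_count = thread_count;
    pthread_mutex_init(&b->mutex, NULL);
    pthread_cond_init(&b->cond_var, NULL);
}

void cond_barrier_wait(cond_barrier_t* b) {
    pthread_mutex_lock(&b->mutex);
    b->counter++;
    if (b->counter == b->thread_count) {
        b->counter = 0;                       /* reset before releasing */
        pthread_cond_broadcast(&b->cond_var); /* wake all waiting threads */
    } else {
        /* Loop to guard against wakeups not caused by the broadcast */
        while (pthread_cond_wait(&b->cond_var, &b->mutex) != 0);
    }
    pthread_mutex_unlock(&b->mutex);
}
```

Unlike the semaphore version, one such object can safely be used for barrier after barrier, since counter is always back at zero before any thread proceeds.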

Like mutexes and semaphores, condition variables should be initialized and destroyed. In this case, the functions are

int pthread_cond_init(
      pthread_cond_t*           cond_p       /* out */,
      const pthread_condattr_t* cond_attr_p  /* in  */);

int pthread_cond_destroy(pthread_cond_t* cond_p /* in/out */);

We won’t be using the second argument to pthread_cond_init; we’ll always call it with a second argument of NULL.

4. Pthreads barriers


Before proceeding we should note that the Open Group, the standards group that is continuing to develop the POSIX standard, does define a barrier interface for Pthreads. However, as we noted earlier, it is not universally available, so we haven’t discussed it in the text. See Exercise 4.9 for some of the details of the API.

