Chapter: Multicore Application Programming For Windows, Linux, and Oracle Solaris : Hand-Coded Synchronization and Sharing

Dekker’s Algorithm

One of the first lockless algorithms was Dekker’s algorithm for mutual exclusion.

Dekker’s Algorithm

One of the first lockless algorithms was Dekker’s algorithm for mutual exclusion. Without using any atomic operations, the algorithm ensures that only one thread at a time out of a pair of threads can enter a critical region. Listing 8.16 shows an implemen-tation of Dekker’s algorithm. To increment the counter a thread would call the function increment() with its thread ID.

Listing 8.16 Implementation of Dekker’s Algorithm for Mutual Exclusion

volatile int priority = 0; volatile int counter = 0; volatile int waiting[2];

void increment( int id )

{

waiting[id] = 1;

while( waiting[1-id] == 1 )

{

if ( priority != id )

{

waiting[id] = 0;

while ( priority != id ){} waiting[id] = 1;

}

/* Critical section */ counter++;

/* Exit critical section */ priority = 1-id; waiting[id] = 0;

}

The algorithm works because each thread signals that it is waiting to get into the critical section. If the other thread has not signaled that it is waiting for or has already entered the critical section, then the current thread can enter it. If both threads are wait-ing, then one of the threads gets priority, and the other thread waits until it gets priority.

The variables priority and counter, together with the array waiting, are shared between the two threads and as such need to be declared as being volatile. This ensures that the compiler stores these variables immediately after modification and loads them immediately before use. To test the correctness of this implementation, we can place the code in a harness. Listing 8.17 shows the harness. This harness creates the two threads. Each thread increments the variable counter 1 million times, so at the end of the program, the variable should contain the value 2 million. The program reports the differ-ence between this value and what the counter actually contains.

Listing 8.17 Test Harness for Dekker’s Algorithm

#include <stdio.h> #include <pthread.h>

void * worker( void * param )

{

for ( int i=0; i<1000000; i++ )

{

increment( (int)param );

}

int main()

{

pthread_t threads[2];

pthread_create( &threads[0], 0, worker, (void*)0 ); pthread_create( &threads[1], 0, worker, (void*)1 ); pthread_join( threads[1], 0 );

pthread_join( threads[0], 0 );

printf( "Errors = %i\n", 2*1000000 - counter );

}

Listing 8.18 shows the results of compiling and running the program. Unfortunately, the program reports that the variable did not get incremented the correct number of times, and this means that there were some situations when the two threads managed to enter the critical region at the same time.

Listing 8.18 Results of Compiling and Running Test Harness

$ cc -O -mt dekker.c

$ ./a.out

Errors = 14

The problem with the code is one of memory ordering. The two threads can simulta-neously indicate that they are waiting to enter the critical section by storing to their index in the waiting array. In the very next cycle, they load the other thread’s waiting status. Since the other thread has only just issued the store, the store has not yet made its way through the pipeline to be visible to the rest of the system. So, the load instruction picks up a zero, indicating that the other thread is not waiting for the critical section. Both threads fail to see the other thread waiting, and both threads enter the critical region.

The way to fix this is to put memory barriers into the code. The memory barriers ensure that previous memory operations have completed before the next memory oper-ation issues. In this case, we want to ensure that the store to indicate a thread is waiting is completed before the load to check whether the other thread is also waiting. Now consider the sequence of operations. Both threads hit the store at the same time; both threads now wait until the stores are visible to the other processor before issuing their load. So, both threads will get the data that was stored by the other thread.

On SPARC processors, the appropriate memory barrier to use is membar #StoreLoad, which ensures that all previous stores have completed before the following load is issued. On x86 processors, it is necessary to use an mfence operation that ensures that all previ-ous memory operations complete before the next memory operation is issued.

Listing 8.19 shows the modified code for Dekker’s algorithm. The code requires two memory barriers, once before the loop is entered and a second barrier inside the loop. With this modification, the code produces the correct result when run in the test harness.

Listing 8.19 Dekker’s Algorithm with Memory Barriers

#include <stdio.h> #include <pthread.h>

volatile int priority = 0; volatile int counter = 0; volatile int waiting[2];

void increment( int i )

{

waiting[i] = 1;

#ifdef __sparc

asm( "membar #StoreLoad": : : "memory" );

#else

asm( "mfence": : : "memory" );

#endif

while( waiting[1-i] == 1 )

{

if ( priority != i )

{

waiting[i] = 0;

while ( priority != i ){} waiting[i] = 1;

#ifdef __sparc

asm( "membar #StoreLoad": : : "memory" );

#else

asm( "mfence": : :"memory" );

#endif

}

counter++;

priority = 1-i;

waiting[i] = 0;

}

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

Multicore Application Programming For Windows, Linux, and Oracle Solaris : Hand-Coded Synchronization and Sharing : Dekker’s Algorithm |