Home | | Multi - Core Architectures and Programming | Using Compare and Swap Instructions to Form More Complex Atomic Operations

Chapter: Multicore Application Programming For Windows, Linux, and Oracle Solaris : Hand-Coded Synchronization and Sharing

Using Compare and Swap Instructions to Form More Complex Atomic Operations

Not all processors implement atomic add or increment instructions. However, most processors do implement a variant of the compare and swap instruction. This instruction can form the basis of most atomic operations.

Using Compare and Swap Instructions to Form More Complex Atomic Operations

 

Not all processors implement atomic add or increment instructions. However, most processors do implement a variant of the compare and swap instruction. This instruction can form the basis of most atomic operations.

 

An atomic compare and swap (CAS) operation compares the value held at a memory location with the value held in a register. If the value at that location matches the desired value passed to the instruction, then the value held in memory will be atomically updated with a second value passed into the instruction. The return value from CAS is the original value that was held in memory. If the operation was successful, the return value will be the same as the desired value. If the operation did not succeed, the return value will be some other value. This enables the code to determine whether the opera-tion succeeded.

 

This operation can be useful in a number of situations. For example, assume a mutex lock is implemented using a variable that holds one when the lock is held or zero when the lock is available. The CAS operation can be used to transition the lock from free to taken. To acquire the lock, the variable lock must hold the value zero, and the instruc-tion needs to atomically replace the zero with a one. Listing 8.2 shows the code for a simple spinlock. The volatile keyword ensures that the compiler repeats the CAS operation on every iteration.

 

Listing 8.2  A Simple Spinlock Implemented Using CAS

#ifdef __sparc

 

int CAS( volatile int* addr, int ov, int nv )

 

{

 

asm volatile( "cas %1, %2, %0": "=r"(nv):

 

"m"(*addr), "r"(ov), "0"(nv): "memory" );

 

return nv;

 

}

 

#else

 

int CAS( volatile int* addr, int ov, int nv )

 

{

 

asm volatile( "lock; cmpxchg %2, %1": "=a"(ov):

 

"m"(*addr), "r"(nv), "a"(ov): "memory" );

 

return ov;

 

}

 

#endif

 

void lock_spinlock( volatile int * lock )

 

{

 

while ( CAS(lock, 0, 1) != 0 ) {} // Spin until lock acquired

 

}

 

void free_spinlock( volatile int * lock )

 

{

 

*lock = 0;

 

}

It is tempting to imagine that a spinlock could be implemented without using atomic operations, as shown in Listing 8.3. The problem with the code is that it does not guar-antee that only a single thread will acquire the lock. If a thread observes that the lock is free, it will then assume that it has acquired it. The atomic operation used in Listing 8.2 ensures that the only way the thread can exit is if it actually has acquired the lock.

 

Listing 8.3        An Incorrect Implementation of a Spinlock

void lock_spinlock( volatile int * lock )

 

{

 

while ( *lock == 1 ) {}          // Spin while lock busy

*lock = 1 ;

 

}

 

void free_spinlock( volatile int * lock )

 

{

 

*lock = 0;

 

}

 

 

The CAS instruction can also be used in situations where atomic modification of an unusual variable type is required. Consider the situation where atomic increment of a floating-point value is required. Listing 8.4 shows code to perform this.

 

Listing 8.4 Code to Atomically Increment a Floating-Point Value

 

void atomic_add_float( volatile float * variable, float increment )

 

{

 

union

 

{

 

int asint; float asfp;

} oldvalue, newvalue;

 

do

 

{

 

oldvalue.asfp

=

*variable;

//

Line

11

newvalue.asfp

=

oldvalue.asfp + increment;

//

Line

12

 

}

 

while ( CAS( variable, oldvalue.asint, newvalue.asint ) != oldvalue.asint );

 

}

The code to perform the atomic update of a floating-point variable appears rather complex. A fair amount of the complexity is because the CAS operation takes integer parameters; hence, the floating-point values need to be converted into integers. This is the function of the union in the code.

 

The code reads the value of the variable to be modified and then prepares the modi-fied version of this variable. In the code, the variable is read only once, at line 11, and then held in the local variable oldvalue. This is to avoid a data race where the value changes between the first read of the variable and its use as one operand in the addition, at line 12. The race is subtle. Suppose that the first time the variable is read, at line 11, it has the value 20; this is recorded as the original value of the variable. At this point, another thread changes the value of the variable, so the second time it is read, at line 12, it has the value 21. The value 21 is incremented to 22, and this will become the new value of the variable if the operation is successful. Suppose now that the other thread has returned the value of the variable to 20 by the time the CAS operation is tried. Since the variable contains 20 and this is the expected value, the CAS operation will store the value 22 into the variable, rather than the correct value of 21.

 

We can explore this problem of reloading values with another example. Listing 8.5 shows code that adds an element to the top of a list. The code creates a new element and stores the appropriate value in this. The loop keeps attempting to add this new element to the front of the list until it succeeds. Each iteration of the loop sets the next field of the new element to be the next element in the list and then attempts to atomically com-pare and swap the head of the list with a pointer to the new element. If the compare and swap succeeded, the return value will be the pointer to the element that used to be the top of the list. If this happens, the code has succeeded, and the loop can exit.

 

Listing 8.5  Code to Add an Element to t he Top of a List

void addelement( element_t ** head, int value )

 

{

 

element_t * element = (element_t*)malloc(sizeof(element_t)); element->value = value;

while (element!=0)

 

{

 

element->next = *head;

 

if ( CAS( head, element->next, element ) == element->next)

 

{

 

element = 0;

 

}

 

}

The problem with this code is that the CAS() function call causes the compiler to reload the value of element->next. There is a short window of opportunity for another thread to take the top element from the list between the CAS() function call and the load of element->next. If this other thread takes the element off the list and modifies the value of element->next, then the compare and swap will not match with the new value of element->next, and the loop will assume that it failed to add the element to the queue, even though it actually succeeded. So, the code will attempt to add the ele-ment a second time, causing an error. The solution is to hold the original value of *head in a local variable and use this in the comparison to determine whether the compare and swap was successful.

 

The opportunity for this data corruption to occur is only a few cycles in duration. However, for a sufficiently long-running program, this will eventually occur. In common with all data-race type errors, the result of this data corruption will be detected arbitrar-ily far from when the corruption occurred.

 

Returning to the atomic add operation, notice that the operation is actually a loop. Although the observable effect is that the value in the variable is incremented atomically, this operation can take an arbitrary number of iterations around the loop to complete. This is actually a lock-free algorithm for updating the variable. This is really to contrast with the obvious implementation using mutex locks shown in Listing 8.6.

 

Listing 8.6     Addition of Values Using Mutex Locks

 

void atomic_add_float( pthread_mutex_t * lock, float * var, float inc )

 

{

 

pthread_mutex_lock( lock ); *var += inc; pthread_mutex_unlock( lock );

}

As mutex locks can be implemented using CAS operations, the resulting code for the two situations is structurally not too dissimilar—both codes have a single loop. The dif-ference is that the mutex lock loops on the CAS of the lock, whereas the lock-free vari-ant loops on the CAS of the variable.


Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail
Multicore Application Programming For Windows, Linux, and Oracle Solaris : Hand-Coded Synchronization and Sharing : Using Compare and Swap Instructions to Form More Complex Atomic Operations |


Privacy Policy, Terms and Conditions, DMCA Policy and Compliant

Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.