Using
Compare and Swap Instructions to Form More Complex Atomic Operations
Not all processors implement atomic add or increment instructions.
However, most processors do implement a variant of the compare and swap
instruction. This instruction can form the basis of most atomic operations.
An atomic compare and swap (CAS) operation compares
the value held at a memory location with the value held in a register. If the
value at that location matches the desired value passed to the instruction,
then the value held in memory will be atomically updated with a second value
passed into the instruction. The return value from CAS is the original value
that was held in memory. If the operation was successful, the return value will
be the same as the desired value. If the operation did not succeed, the return
value will be some other value. This enables the code to determine whether the
opera-tion succeeded.
This operation can be useful in a number of situations. For example,
assume a mutex lock is implemented using a variable that holds one when the
lock is held or zero when the lock is available. The CAS operation can be used
to transition the lock from free to taken. To acquire the lock, the variable lock must hold the value zero, and the instruc-tion needs to atomically
replace the zero with a one. Listing 8.2 shows the code for a simple spinlock. The volatile keyword ensures that the compiler repeats the CAS operation on every
iteration.
Listing 8.2 A Simple Spinlock Implemented Using CAS
#ifdef __sparc
int CAS( volatile int* addr, int ov, int nv )
{
asm volatile( "cas %1, %2, %0":
"=r"(nv):
"m"(*addr), "r"(ov), "0"(nv):
"memory" );
return nv;
}
#else
int CAS( volatile int* addr, int ov, int nv )
{
asm volatile( "lock; cmpxchg %2, %1":
"=a"(ov):
"m"(*addr), "r"(nv), "a"(ov):
"memory" );
return ov;
}
#endif
void lock_spinlock( volatile int * lock )
{
while ( CAS(lock, 0, 1) != 0 ) {} // Spin until lock acquired
}
void free_spinlock( volatile int * lock )
{
*lock = 0;
}
It is tempting to imagine that a
spinlock could be implemented without using atomic operations, as shown in
Listing 8.3. The problem with the code is that it does not guar-antee that only
a single thread will acquire the lock. If a thread observes that the lock is
free, it will then assume that it has acquired it. The atomic operation used in
Listing 8.2 ensures that the only way the thread can exit is if it actually has
acquired the lock.
Listing 8.3 An
Incorrect Implementation of a Spinlock
void lock_spinlock( volatile int * lock )
{
while ( *lock == 1 ) {} // Spin while lock busy
*lock = 1 ;
}
void
free_spinlock( volatile int * lock )
{
*lock = 0;
}
The CAS instruction can also be used in situations
where atomic modification of an unusual variable type is required. Consider the
situation where atomic increment of a floating-point value is required. Listing
8.4 shows code to perform this.
Listing 8.4 Code to Atomically Increment
a Floating-Point Value
void
atomic_add_float( volatile float * variable, float increment )
{
union
{
int asint; float asfp;
} oldvalue, newvalue;
do
{
oldvalue.asfp |
= |
*variable; |
// |
Line |
11 |
newvalue.asfp |
= |
oldvalue.asfp + increment; |
// |
Line |
12 |
}
while ( CAS( variable, oldvalue.asint,
newvalue.asint ) != oldvalue.asint );
}
The code to perform the atomic update of a
floating-point variable appears rather complex. A fair amount of the complexity
is because the CAS operation takes integer parameters; hence, the
floating-point values need to be converted into integers. This is the function
of the union in the
code.
The code reads the value of the variable to be modified and then
prepares the modi-fied version of this variable. In the code, the variable is
read only once, at line 11, and then held in the local variable oldvalue. This is to avoid a data race where the value changes between the first
read of the variable and its use as one operand in the addition, at line 12.
The race is subtle. Suppose that the first time the variable is read, at line
11, it has the value 20; this is recorded as the original value of the
variable. At this point, another thread changes the value of the variable, so
the second time it is read, at line 12, it has the value 21. The value 21 is
incremented to 22, and this will become the new value of the variable if the
operation is successful. Suppose now that the other thread has returned the
value of the variable to 20 by the time the CAS operation is tried. Since the
variable contains 20 and this is the expected value, the CAS operation will
store the value 22 into the variable, rather than the correct value of 21.
We can explore this problem of reloading values with another example.
Listing 8.5 shows code that adds an element to the top of a list. The code
creates a new element and stores the appropriate value in this. The loop keeps
attempting to add this new element to the front of the list until it succeeds.
Each iteration of the loop sets the next field of the new element to be the
next element in the list and then attempts to atomically com-pare and swap the
head of the list with a pointer to the new element. If the compare and swap
succeeded, the return value will be the pointer to the element that used to be
the top of the list. If this happens, the code has succeeded, and the loop can
exit.
Listing 8.5 Code to Add an Element to t he Top of a List
void addelement( element_t ** head, int value )
{
element_t * element =
(element_t*)malloc(sizeof(element_t)); element->value = value;
while (element!=0)
{
element->next = *head;
if ( CAS( head, element->next, element ) == element->next)
{
element = 0;
}
}
The problem with this code is that the CAS() function call causes the compiler to reload the value of element->next. There is a short window of opportunity for another thread to take the
top element from the list between the CAS() function call and the load of element->next. If this other thread takes the element off the list and modifies the
value of element->next, then
the compare and swap will not match with the new value of element->next, and the loop will assume that it failed to add the element to the
queue, even though it actually succeeded. So, the code will attempt to add the
ele-ment a second time, causing an error. The solution is to hold the original
value of *head in a local
variable and use this in the comparison to determine whether the compare and
swap was successful.
The opportunity for this data corruption to occur
is only a few cycles in duration. However, for a sufficiently long-running
program, this will eventually occur. In common with all data-race type errors,
the result of this data corruption will be detected arbitrar-ily far from when
the corruption occurred.
Returning to the atomic add operation, notice that the operation is
actually a loop. Although the observable effect is that the value in the
variable is incremented atomically, this operation can take an arbitrary number
of iterations around the loop to complete. This is actually a lock-free algorithm for updating the
variable. This is really to contrast with the obvious implementation using
mutex locks shown in Listing 8.6.
Listing 8.6 Addition
of Values Using Mutex Locks
void
atomic_add_float( pthread_mutex_t * lock, float * var, float inc )
{
pthread_mutex_lock( lock ); *var += inc;
pthread_mutex_unlock( lock );
}
As mutex locks can be implemented using CAS
operations, the resulting code for the two situations is structurally not too
dissimilar—both codes have a single loop. The dif-ference is that the mutex
lock loops on the CAS of the lock,
whereas the lock-free vari-ant loops on the CAS of the variable.
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.