Chapter: Multicore Application Programming For Windows, Linux, and Oracle Solaris : Synchronization and Data Sharing

Atomic Operations and Lock-Free Code

Using synchronization primitives can add a high overhead cost. This is particularly true if they are implemented as calls into the operating system rather than calls into a supporting library.

Atomic Operations and Lock-Free Code

Using synchronization primitives can add a high overhead cost. This is particularly true if they are implemented as calls into the operating system rather than calls into a supporting library. These overheads lower the performance of the parallel application and can limit scalability. In some cases, either atomic operations or lock-free code can produce func-tionally equivalent code without introducing the same amount of overhead.

An atomic operation is one that will either successfully complete or fail; it is not possi-ble for the operation to either result in a “bad” value or allow other threads on the sys-tem to observe a transient value. An example of this would be an atomic increment, which would mean that the calling thread would replace a variable that currently holds the value N with the value N+1. This might sound trivial, but bear in mind that the operation of incrementing a variable can involve multiple steps, as shown in Listing 4.12.

Listing 4.12 Steps Involved in Incrementing a Variable

LOAD [%o0], %o1 // Load initial value

ADD %o1, 1, %o1 // Increment value

STORE %o1, [%o0] // Store new value back to memory

During the execution of the three steps shown in Listing 4.12, another thread could have interfered and replaced the value of the variable held in memory with a new value, creating a data race.

An atomic increment operation would not allow another thread to modify the same variable and cause an erroneous value to be written to memory. Performing the incre-ment operation atomically is logically equivalent to implicitly acquiring a mutex before the increment and releasing it afterward. The difference is that using a mutex relies on other threads using the same mutex to protect that variable. A thread that did not use the same mutex could cause an incorrect value to be written back to memory. With atomic operations, another thread cannot cause an incorrect value to be written to memory.

Most operating systems or compilers provide support for a range of atomic opera-tions. However, hardware typically provides support only for a more limited range of operations; hence, the more complex atomic operations are usually made from the sim-ple atomic instructions that the hardware provides.

Atomic operations are often used to enable the writing of lock-free code. Using a syn-chronization device such as a mutex lock solves the correctness problem; only one thread can access the protected memory location at a single time. However, it is not necessarily the approach with the best scaling, since threads that are waiting for access are blocked (sent to sleep). A lock-free implementation would not rely on a mutex lock to protect access; instead, it would use a sequence of operations that would perform the operation without having to acquire an explicit lock. This can be higher performance than control-ling access with a lock. Lock-free does not imply that the other threads would not have to wait; it only indicates that they do not have to wait on a lock. A wait-free implementa-tion would allow all the threads to simultaneously make forward progress.

Most of the more complex atomic operations are actually lock-free implementations. They use a low-level, hardware-provided atomic operation and wrap code around that to ensure that the required higher-level operation is actually atomic.

An example of a low-level atomic operation would be compare and swap (CAS), which atomically swaps the value held in a register with the value held in memory if and only if the value held in memory matches the expected value. As an example of using this hardware-provided atomic operation to produce a higher-level atomic operation, the CAS instruction could be used as the basis for an atomic increment operation. To do this, the CAS instruction would be executed in a loop. Each iteration, it would attempt to replace the value held in memory with the incremented value. The loop would exit when the CAS instruction successfully performed the increment operation.

Lock-free code can be used to achieve more complex operations; this will be dis-cussed in Chapter 8, “Hand-Coding Synchronization and Sharing.”

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

Multicore Application Programming For Windows, Linux, and Oracle Solaris : Synchronization and Data Sharing : Atomic Operations and Lock-Free Code |