Enforcing Memory Ordering to Ensure Correct Operation
In the previous section, we looked at the implementation of a simple
spinlock. Unfortunately, there is the potential for more implementation
complexity. When writing any multithreaded code, it is important to consider
the memory-ordering characteristics of the system. When a thread running on a
multithreaded system performs a memory operation, that operation may or may not
become visible to the rest of the system in the order in which it occurred.
For example, an application might perform two store
operations. On some processors, the second store operation could become visible
to other processors before the first store operation becomes visible. This is
called weak memory ordering.
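As a sketch of the problem, consider one thread publishing a value and a second thread reading it. The variable names and thread roles here are illustrative, not taken from any particular system:

#include <stdio.h>

int data  = 0;
int ready = 0;

void producer( void )   // Runs in one thread
{
    data  = 42;  // First store
    ready = 1;   // Second store; under weak ordering this store may
                 // become visible to other processors before the
                 // store to data does
}

void consumer( void )   // Runs in another thread
{
    if ( ready == 1 )
    {
        printf( "%d\n", data );  // May print 0 if the stores were
                                 // observed out of order
    }
}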
Consider this in the context of a mutex lock. The
application updates a variable and stores it back to memory. It then frees the
lock by writing a zero into the lock. If the system implements weak memory
ordering, the store to free the lock might become visible to other processors
in the system before the store to update the data. Another processor that was
waiting on the lock might see that the lock is free and read the value of the
variable protected by the lock before the update of that variable is visible.
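A sketch of that failure follows; counter and lock are hypothetical names, with 1 meaning held and 0 meaning free:

volatile int lock = 1;  // Currently held by this thread
int counter = 0;

void update_and_unlock( void )
{
    counter = counter + 1;  // Update performed while the lock is held
    lock = 0;               // Free the lock; under weak ordering this
                            // store may become visible first, so another
                            // thread could acquire the lock and read the
                            // stale value of counter
}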
To stop this kind of error, it may be necessary to
include instructions that enforce the desired memory ordering. Determining
whether a particular processor requires these instructions typically requires a
careful read of the documentation. However, code that assumes that these
instructions are necessary will run correctly on hardware that does not need
the instructions, whereas the converse is not true.
First, we’ll consider the release of a mutex lock
in more detail. The release operation is performed by storing a zero to the
lock variable. The code executed while the mutex lock is held will update
memory. All of the memory updates performed while the mutex is held must
complete before the mutex is released. In the context of
this discussion, the memory updates performed while the mutex was held must
become visible to the rest of the system before the freeing of the mutex lock
becomes visible.
To ensure that the correct order of operations is observed, it is
sometimes necessary to use a memory
barrier or memory fence. These
stop the processor from executing further memory operations until the previous
memory operations have completed. These instructions usually have different
variants to enforce an order on different types of memory operations. In the
case of the mutex lock release after a store operation, the memory barrier
needs to enforce store ordering—no future store operations should become
visible to the system until all preceding stores have completed. Store
ordering is the default for SPARC and x86 processors, so neither processor
requires a memory barrier between two adjacent stores to enforce the order of
the store operations.
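On a processor that does not guarantee store ordering, the barrier would sit between the two stores. In this sketch, store_barrier() is a hypothetical wrapper around the platform's store-ordering instruction (for example, sfence on x86 or membar #StoreStore on SPARC, both described later), and the variables are reused from the earlier sketch:

extern int data;
extern int ready;
void store_barrier( void );  // Hypothetical platform store barrier

void publish( void )
{
    data = 42;         // Store the protected value
    store_barrier();   // All preceding stores must become visible
                       // before any following store does
    ready = 1;         // Only now may the flag become visible
}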
It is less immediately obvious why loads have a similar constraint.
Imagine that we have a reader-writer lock where a reader thread will only read
data protected by the mutex and a writer thread can update those values. If a
reader thread acquires the mutex, it wants to read those values while it still
holds the mutex. The act of acquiring the mutex must complete before the load
of the data protected by the mutex can start. A similar ordering constraint
occurs when the mutex is freed. It is critical that the value of the variable
protected by the mutex is read before the release of the mutex becomes visible
to the system.
Loads and stores that occur after the mutex has been released can be
speculated so that they occur before the barrier. This is safe because access
to these variables is not protected by the mutex, so it does not matter whether
they complete while the mutex is held or after it has been released. This kind
of barrier is called a release barrier.
A similar process must take place when a mutex is acquired. Memory
operations that occur after the mutex has been acquired must not be visible to
the system until after the mutex has been safely acquired. Again, memory
operations that occur before the mutex is acquired can still be ongoing after
the mutex has been acquired. This is referred to as an acquire barrier.
Returning to the spinlock that we implemented in Listing 8.2, we can now
update this in Listing 8.7 to include the appropriate barrier calls.
Listing 8.7 Spinlock with Appropriate Memory Barriers
void lock_spinlock( volatile int* lock )
{
    while ( CAS(lock, 0, 1) != 0 ) {}
    acquire_memory_barrier();  // Ensure that the CAS operation
                               // has become visible to the system
                               // before memory operations in the
                               // critical region start
}

void free_spinlock( volatile int* lock )
{
    release_memory_barrier();  // Ensure that all past memory operations
                               // have become visible to the system
                               // before the following store starts
    *lock = 0;
}
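As a usage sketch (the shared counter and the lock variable are illustrative names), each thread brackets its access to the shared data with these calls:

volatile int lock = 0;   // 0 = free, 1 = held
int shared_count = 0;

void increment( void )   // Called concurrently from multiple threads
{
    lock_spinlock( &lock );
    shared_count++;              // Protected by the spinlock
    free_spinlock( &lock );
}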
The kinds of memory barriers available are defined
by the architecture. The x86 architecture defines the following:
- mfence. Ensures that all previous loads and stores are visible to the system before any future loads and stores become visible.
- sfence. Ensures that all previous stores are visible to the system before any future stores become visible.
- lfence. Ensures that all previous loads are visible to the system before any future loads become visible.
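With compilers that provide the x86 SSE intrinsics headers, these instructions are available without writing assembly; a minimal sketch:

#include <xmmintrin.h>   // _mm_sfence
#include <emmintrin.h>   // _mm_lfence, _mm_mfence

void x86_fences( void )
{
    _mm_sfence();   // Orders previous stores before later stores
    _mm_lfence();   // Orders previous loads before later loads
    _mm_mfence();   // Orders all previous loads and stores before later ones
}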
The SPARC architecture defines a slightly finer set
of memory barrier semantics. The instruction membar takes a combination of options to indicate the particular type of
barrier required. The following types of memory barrier can be combined:
- membar #StoreStore. Ensures that all stores complete before the following stores.
- membar #StoreLoad. Ensures that all stores complete before the following loads.
- membar #LoadStore. Ensures that all loads complete before the following stores.
- membar #LoadLoad. Ensures that all loads complete before the following loads.
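On a GCC-style compiler targeting SPARC, a membar can be issued with inline assembly; a sketch (the function name is illustrative, and the "memory" clobber also prevents the compiler itself from reordering memory operations across the barrier):

static inline void membar_acquire( void )
{
    __asm__ __volatile__ ( "membar #LoadLoad | #LoadStore" ::: "memory" );
}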
Modern SPARC and x86 processors implement a strong
memory-ordering model. This means that memory-ordering operations are rarely
needed. However, writing software that is safe both for future processors,
where the memory-ordering constraints may have changed, and for older
processors that implemented a weaker memory ordering requires that these
instructions be included in the code. Processors that do not need the
operations will typically ignore them and therefore incur only a minimal
performance penalty.
On x86, the mfence instruction provides sufficient constraints on memory ordering for it
to be used as both an acquire and a release barrier. On SPARC, it is sufficient
to use membar #LoadLoad|#LoadStore to provide acquire barrier semantics,
ensuring that all previous loads have completed before any following memory
operations. Release semantics are provided by membar #LoadStore|#StoreStore,
which ensures that all previous memory operations have completed before the
following store instruction.
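For code that must run on multiple architectures, one way to obtain these acquire and release semantics, and a plausible implementation of the barrier calls used in Listing 8.7, assuming a C11 compiler, is atomic_thread_fence, which compiles to the appropriate instruction (or to nothing) for the target processor:

#include <stdatomic.h>

void acquire_memory_barrier( void )
{
    atomic_thread_fence( memory_order_acquire );
}

void release_memory_barrier( void )
{
    atomic_thread_fence( memory_order_release );
}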
On both SPARC and x86 processors, atomic operations enforce total memory
ordering; the atomic operation enforces an ordering between the loads and
stores that precede it and the loads and stores that follow it. Hence, in
general, no memory barrier is required before or after an atomic operation.
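For example, GCC's __sync_val_compare_and_swap builtin is documented as a full barrier, so on such processors a spinlock acquire built on it needs no separate barrier call; a sketch (the function name is illustrative):

void lock_spinlock_atomic( volatile int* lock )
{
    // The atomic operation itself orders earlier and later memory
    // operations, so no explicit acquire_memory_barrier() is needed
    while ( __sync_val_compare_and_swap( lock, 0, 1 ) != 0 ) {}
}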