Enforcing Memory Ordering to Ensure Correct Operation
In the previous section, we looked at the implementation of a simple
spinlock. Unfortunately, there is the potential for more implementation
complexity. When writing any multithreaded code, it is important to consider
the memory-ordering characteristics of the system. When a thread running on a
multithreaded system performs a memory operation, that operation may or may not
become visible to the rest of the system in the order in which it occurred.
For example, an application might perform two store
operations. On some processors, the second store operation could become visible
to other processors before the first store operation becomes visible. This is
called weak memory ordering.
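As a sketch of the problem, consider one thread publishing a value and a second thread reading it. The variable names and thread roles here are illustrative, not taken from any particular system:

#include <stdio.h>

int data  = 0;
int ready = 0;

void producer( void )   // Runs in one thread
{
    data  = 42;  // First store
    ready = 1;   // Second store; under weak ordering this store may
                 // become visible to other processors before the
                 // store to data does
}

void consumer( void )   // Runs in another thread
{
    if ( ready == 1 )
    {
        printf( "%d\n", data );  // May print 0 if the stores were
                                 // observed out of order
    }
}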
Consider this in the context of a mutex lock. The
application updates a variable and stores it back to memory. It then frees the
lock by writing a zero into the lock. If the system implements weak memory
ordering, the store to free the lock might become visible to other processors
in the system before the store to update the data. Another processor that was
waiting on the lock might see that the lock is free and read the value of the
variable protected by the lock before the update of that variable is visible.
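A sketch of that failure follows; counter and lock are hypothetical names, with 1 meaning held and 0 meaning free:

volatile int lock = 1;  // Currently held by this thread
int counter = 0;

void update_and_unlock( void )
{
    counter = counter + 1;  // Update performed while the lock is held
    lock = 0;               // Free the lock; under weak ordering this
                            // store may become visible first, so another
                            // thread could acquire the lock and read the
                            // stale value of counter
}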
To stop this kind of error, it may be necessary to
include instructions that enforce the desired memory ordering. Determining
whether a particular processor requires these instructions typically requires a
careful read of the documentation. However, code that assumes that these
instructions are necessary will run correctly on hardware that does not need
the instructions, whereas the converse is not true.
First, we’ll consider the release of a mutex lock
in more detail. The release operation is performed by storing a zero to the
lock variable. The code executed while the mutex lock is held will update
memory. All of the memory updates performed while the mutex is held must
complete before the mutex is released. In the context of
this discussion, the memory updates performed while the mutex was held must
become visible to the rest of the system before the freeing of the mutex lock
becomes visible.
To ensure that the correct order of operations is observed, it is
sometimes necessary to use a memory
barrier or memory fence. These
stop the processor from executing further memory operations until the previous
memory operations have completed. These instructions usually have different
variants to enforce an order on different types of memory operations. In the
case of the mutex lock release after a store operation, the memory barrier
needs to enforce store ordering—no future store operations should become
visible to the system until all preceding stores have completed. Store
ordering is the default for SPARC and x86 processors, so neither processor
requires a memory barrier between two adjacent stores to enforce the order of
the store operations.
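On a processor that does not guarantee store ordering, the barrier would sit between the two stores. In this sketch, store_barrier() is a hypothetical wrapper around the platform's store-ordering instruction (for example, sfence on x86 or membar #StoreStore on SPARC, both described later), and the variables are reused from the earlier sketch:

extern int data;
extern int ready;
void store_barrier( void );  // Hypothetical platform store barrier

void publish( void )
{
    data = 42;         // Store the protected value
    store_barrier();   // All preceding stores must become visible
                       // before any following store does
    ready = 1;         // Only now may the flag become visible
}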
It is less immediately obvious why loads have a similar constraint.
Imagine that we have a reader-writer lock where a reader thread will only read
data protected by the mutex and a writer thread can update those values. If a
reader thread acquires the mutex, it wants to read those values while it still
holds the mutex. The act of acquiring the mutex must complete before the load
of the data protected by the mutex can start. A similar ordering constraint
occurs when the mutex is freed. It is critical that the value of the variable
protected by the mutex is read before the release of the mutex becomes visible
to the system.
Loads and stores that occur after the mutex has been released can be
speculated so that they occur before the barrier. This is safe because access
to these variables is not protected by the mutex, so it does not matter whether
they complete while the mutex is held or after it has been released. This kind
of barrier is called a release barrier.
A similar process must take place when a mutex is acquired. Memory
operations that occur after the mutex has been acquired must not be visible to
the system until after the mutex has been safely acquired. Again, memory
operations that occur before the mutex is acquired can still be ongoing after
the mutex has been acquired. This is referred to as an acquire barrier.
Returning to the spinlock that we implemented in Listing 8.2, we can now
update this in Listing 8.7 to include the appropriate barrier calls.
Listing 8.7 Spinlock with Appropriate Memory Barriers
void lock_spinlock( volatile int* lock )
{
    while ( CAS(lock, 0, 1) != 0 ) {}
    acquire_memory_barrier();  // Ensure that the CAS operation
                               // has become visible to the system
                               // before memory operations in the
                               // critical region start
}

void free_spinlock( volatile int* lock )
{
    release_memory_barrier();  // Ensure that all past memory operations
                               // have become visible to the system
                               // before the following store starts
    *lock = 0;
}
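As a usage sketch (the shared counter and the lock variable are illustrative names), each thread brackets its access to the shared data with these calls:

volatile int lock = 0;   // 0 = free, 1 = held
int shared_count = 0;

void increment( void )   // Called concurrently from multiple threads
{
    lock_spinlock( &lock );
    shared_count++;              // Protected by the spinlock
    free_spinlock( &lock );
}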
The kinds of memory barriers available are defined
by the architecture. The x86 architecture defines the following:
- mfence. Ensures that all previous loads and stores are visible to the system before any future loads and stores become visible.
- sfence. Ensures that all previous stores are visible to the system before any future stores become visible.
- lfence. Ensures that all previous loads are visible to the system before any future loads become visible.
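With compilers that provide the x86 SSE intrinsics headers, these instructions are available without writing assembly; a minimal sketch:

#include <xmmintrin.h>   // _mm_sfence
#include <emmintrin.h>   // _mm_lfence, _mm_mfence

void x86_fences( void )
{
    _mm_sfence();   // Orders previous stores before later stores
    _mm_lfence();   // Orders previous loads before later loads
    _mm_mfence();   // Orders all previous loads and stores before later ones
}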
The SPARC architecture defines a slightly finer set
of memory barrier semantics. The instruction membar takes a combination of options to indicate the particular type of
barrier required. The following types of memory barrier can be combined:
- membar #StoreStore. Ensures that all stores complete before the following stores.
- membar #StoreLoad. Ensures that all stores complete before the following loads.
- membar #LoadStore. Ensures that all loads complete before the following stores.
- membar #LoadLoad. Ensures that all loads complete before the following loads.
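On a GCC-style compiler targeting SPARC, a membar can be issued with inline assembly; a sketch (the function name is illustrative, and the "memory" clobber also prevents the compiler itself from reordering memory operations across the barrier):

static inline void membar_acquire( void )
{
    __asm__ __volatile__ ( "membar #LoadLoad | #LoadStore" ::: "memory" );
}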
Modern SPARC and x86 processors implement a strong
memory-ordering model. This means that memory-ordering operations are rarely
needed. However, writing software that is safe both for future processors,
where the memory-ordering constraints may have changed, and for older
processors that implemented a weaker memory ordering requires that these
instructions be included in the code. Processors that do not need the
operations will typically ignore them and therefore incur only a minimal
performance penalty.
On x86, the mfence instruction provides sufficient constraints on memory ordering for it
to be used as both an acquire and a release barrier. On SPARC, it is sufficient
to use membar #LoadLoad|#LoadStore to provide acquire barrier semantics,
ensuring that all previous loads have completed before any following memory
operations. Release semantics are provided by membar #LoadStore|#StoreStore,
which ensures that all previous memory operations have completed before the
following store instruction.
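For code that must run on multiple architectures, one way to obtain these acquire and release semantics, and a plausible implementation of the barrier calls used in Listing 8.7, assuming a C11 compiler, is atomic_thread_fence, which compiles to the appropriate instruction (or to nothing) for the target processor:

#include <stdatomic.h>

void acquire_memory_barrier( void )
{
    atomic_thread_fence( memory_order_acquire );
}

void release_memory_barrier( void )
{
    atomic_thread_fence( memory_order_release );
}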
On both SPARC and x86 processors, atomic operations enforce total memory
ordering; the atomic operation enforces an ordering between the loads and
stores that precede it and the loads and stores that follow it. Hence, in
general, no memory barrier is required before or after an atomic operation.
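For example, GCC's __sync_val_compare_and_swap builtin is documented as a full barrier, so on such processors a spinlock acquire built on it needs no separate barrier call; a sketch (the function name is illustrative):

void lock_spinlock_atomic( volatile int* lock )
{
    // The atomic operation itself orders earlier and later memory
    // operations, so no explicit acquire_memory_barrier() is needed
    while ( __sync_val_compare_and_swap( lock, 0, 1 ) != 0 ) {}
}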