Chapter: Multicore Application Programming For Windows, Linux, and Oracle Solaris : Hardware, Processes, and Threads

Ensuring the Correct Order of Memory Operations

There is one more concern to discuss when dealing with systems that contain multiple processors or multiple cores: memory ordering. Memory ordering is the order in which memory operations become visible to the other processors in the system. Most of the time, the processor preserves the ordering that a program needs without any action from the programmer.

However, there are situations where the programmer does need to step in. These can be either architecture specific (SPARC processors and x86 processors have different requirements) or implementation specific (one type of SPARC processor may have different needs than another type of SPARC processor). The good news is that the system libraries implement the appropriate mechanisms, so multithreaded applications that rely on system libraries for synchronization should never need to deal with memory ordering directly.
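As a minimal sketch of relying on the system library, the following C fragment protects a counter with a POSIX mutex. It assumes a POSIX threads implementation is available; the names count_mutex, shared_count, and increment_count are illustrative and do not come from the text.

```c
#include <pthread.h>

/* Hypothetical shared counter and the mutex that protects it. */
static pthread_mutex_t count_mutex = PTHREAD_MUTEX_INITIALIZER;
static int shared_count = 0;

/* Increment the counter while holding the mutex and return the new value.
   The library's lock and unlock routines are expected to contain whatever
   memory ordering instructions the platform requires, so none appear here. */
int increment_count(void)
{
    pthread_mutex_lock(&count_mutex);
    int result = ++shared_count;
    pthread_mutex_unlock(&count_mutex);
    return result;
}
```

Every thread that calls increment_count() goes through the same mutex, so the application code contains no explicit barriers.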

On the other hand, there is some overhead from calling system libraries, so there could well be a performance motivation for writing custom synchronization code. This situation is covered in Chapter 8, “Hand-Coded Synchronization and Sharing.”

The memory ordering instructions are given the name memory barriers (membar) on SPARC and memory fences (mfence) on x86. These instructions stop memory operations from becoming visible outside the thread in the wrong order. The following example will illustrate why this is important.

Suppose you have a variable, count, protected by a locking mechanism, and you want to increment that variable. The lock works by having the value 1 stored into it when it is acquired and the value 0 stored into it when it is released. The code for acquiring the lock is not relevant to this example, so the example starts with the assumption that the lock is already acquired, and therefore the variable lock contains the value 1. Now that the lock is acquired, the code can increment the variable count. Then, to release the lock, the code stores the value 0 into the variable lock. The process of incrementing the variable and then releasing the lock with a store of the value 0 would look something like the pseudocode shown in Listing 1.6.

Listing 1.6 Incrementing a Variable and Freeing a Lock

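    LOAD [&count], %A      ! load the current value of count
    INC %A                 ! increment the value in the register
    STORE %A, [&count]     ! write the new value back to count
    STORE 0, [&lock]       ! release the lock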

As soon as the value 0 is stored into the variable lock, another thread can come along to acquire the lock and modify the variable count. For performance reasons, some processors implement a weak ordering of memory operations, meaning that stores can be moved past other stores or loads can be moved past other loads. If the previous code is run on a machine with a weaker store ordering, then the code at execution time could look like the code shown in Listing 1.7.

Listing 1.7 Incrementing and Freeing a Lock Under Weak Memory Ordering

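    LOAD [&count], %A      ! load the current value of count
    INC %A                 ! increment the value in the register
    STORE 0, [&lock]       ! lock is released first
    STORE %A, [&count]     ! new value of count becomes visible too late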

At runtime, the processor has hoisted the store to the lock so that it becomes visible to the rest of the system before the store to the variable count. Hence, the lock is released before the new value of count is visible. Another processor could see that the lock was free and load the old value of count rather than the new value.

The solution is to place a memory barrier between the two stores to tell the processor not to reorder them. Listing 1.8 shows the corrected code. In this example, the membar instruction ensures that all previous store operations have completed before the next store instruction is executed.

Listing 1.8 Using a Memory Barrier to Enforce Store Ordering

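    LOAD [&count], %A      ! load the current value of count
    INC %A                 ! increment the value in the register
    STORE %A, [&count]     ! write the new value back to count
    MEMBAR #store, #store  ! earlier stores must complete before later stores
    STORE 0, [&lock]       ! release the lock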
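The ordering in Listing 1.8 can be approximated in portable C11 with the facilities in <stdatomic.h>. This is a sketch under stated assumptions rather than the code from Chapter 8: the names lock_word, count, and increment_and_release are illustrative, and a C11 release fence is somewhat stronger than a store-store barrier because it orders earlier loads as well as earlier stores ahead of the later store.

```c
#include <stdatomic.h>

/* Hypothetical shared state: lock_word holds 1 while the lock is held and
   0 when it is free. As in the text, the lock is assumed already acquired. */
static atomic_int lock_word = 1;
static int count = 0;               /* protected by lock_word */

/* Increment count and then release the lock, mirroring Listing 1.8. */
void increment_and_release(void)
{
    /* LOAD, INC, STORE on the protected variable. */
    count = count + 1;

    /* Plays the role of the MEMBAR: memory operations issued before the
       fence may not be reordered after the store that follows it. */
    atomic_thread_fence(memory_order_release);

    /* STORE 0 into the lock. The store itself is relaxed; the fence above
       supplies the ordering. */
    atomic_store_explicit(&lock_word, 0, memory_order_relaxed);
}
```

The compiler chooses the instruction that implements the fence for the target processor. On a processor that already keeps stores in order, the fence may cost nothing at runtime beyond preventing the compiler itself from reordering the two stores.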

There are other types of memory barriers to enforce other orderings of load and store operations. Without these memory barriers, other memory ordering errors could occur. For example, a similar issue could occur when the lock is acquired. The load that fetches the value of count might be executed before the store that sets the lock to be acquired. In such a situation, it would be possible for another processor to modify the value of count between the time that the value was retrieved from memory and the point at which the lock was acquired.
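The acquire side can be sketched in the same way. The following C11 fragment is an illustrative spin lock, not the implementation discussed in Chapter 8; the names spin_lock, guarded_count, try_acquire, acquire, release, and locked_increment are assumptions made for this example. The acquire ordering on the exchange keeps the later load of the counter from being performed before the lock is taken.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 means free, 1 means held. */
static atomic_int spin_lock = 0;
static int guarded_count = 0;       /* protected by spin_lock */

/* Try once to take the lock. The exchange atomically stores 1 and returns
   the previous value; a previous value of 0 means this caller now holds the
   lock. Acquire ordering stops later loads and stores from moving ahead of
   the exchange. */
bool try_acquire(void)
{
    return atomic_exchange_explicit(&spin_lock, 1, memory_order_acquire) == 0;
}

/* Spin until the lock is obtained. */
void acquire(void)
{
    while (!try_acquire()) {
        /* busy-wait */
    }
}

/* Release the lock. Release ordering keeps earlier loads and stores from
   moving after the store of 0. */
void release(void)
{
    atomic_store_explicit(&spin_lock, 0, memory_order_release);
}

/* Increment the protected counter and return the new value. */
int locked_increment(void)
{
    acquire();
    int result = ++guarded_count;
    release();
    return result;
}
```

A busy-wait loop like this is only reasonable when the lock is held for a very short time. It is shown here to make the ordering requirements visible, not as a recommended lock design.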

The programmer’s reference manual for each family of processors will give details about the exact circumstances when memory barriers may or may not be required, so it is essential to refer to these documents when writing custom locking code.
