Reordering of Operations by the Compiler
Although the hardware can reorder operations, the compiler can do the same. The compiler will try to do the "right thing," but there are situations where it needs hints in order to produce code that runs correctly.
It is often thought that the volatile keyword provides a safety net that stops the compiler from optimizing code. Unfortunately, this is not true. The volatile keyword requires the compiler to reload the data from memory each time it is used and to store it back to memory as soon as it is modified. It does not stop the compiler from performing optimizations around the data, nor does it provide any kind of protection against data races.
However, the volatile keyword is necessary to stop the compiler from optimizing away accesses to a variable. Consider the code in Listing 8.8. The variable start is declared as volatile. If it were not, the compiler would be free to read start once and conclude that the loop is either never entered or infinite.
Listing 8.8 Code Where volatile Keyword Is Necessary to Ensure Reloading of Variable start
volatile int start;
void waitforstart()
{
while( start == 0 ) {}
}
A very similar situation exists with the code in Listing 8.9. In this case, the code contains a function call. Function calls generally cause the compiler to reload global data. However, if the variable count is not declared volatile, it is still possible for the compiler to generate an infinite loop. The reason is that if count equals zero and is not a volatile variable, the call to getElementFromList() will never be made, and the variable will remain zero. An optimizing compiler may identify this situation and replace the else branch of the if statement with an infinite loop.
Listing 8.9 Function Calls Do Not Always Enforce Reloading of All Variables
volatile int count;
int* getnextelement()
{
    int element = 0;
    while ( element == 0 )
    {
        if ( count > 0 )
        {
            element = getElementFromList();
        }
    }
    return element;
}
The code shown in Listing 8.10 demonstrates a situation where the compiler can merge two loops, resulting in a change of behavior. The code creates four threads. Each thread runs the same routine, which prints a statement that the thread is ready and then waits on the volatile variable start. When the thread is started, it prints a message, completes a short amount of work, prints a message indicating that it has completed its work, and exits.
The main code creates the threads and, once all the threads are created, signals to them that they can start working. Once this has been done, it waits for all the threads to complete their work before calling pthread_join() on each of them.
Listing 8.10 Code Where the Compiler May Reorder Operations
#include <stdio.h>
#include <pthread.h>

volatile int start[4];
volatile int done[4];

void* work( void* param )
{
    int id = (int)param;
    while ( start[id] == 0 ) {}
    printf( "Thread %i started\n", id );
    double total = 0;
    for ( int i = 0; i < 100000000; i++ ) { total += i; }
    printf( "Thread %i done\n", id );
    done[id] = 1;
    return 0;
}
int main()
{
    pthread_t thread[4];
    for ( int i = 0; i < 4; i++ )
    {
        pthread_create( &thread[i], 0, work, (void*)i );
        done[i] = 0;
    }
    for ( int i = 0; i < 4; i++ )
    {
        start[i] = 1;
    }
    for ( int i = 0; i < 4; i++ )
    {
        while ( done[i] == 0 ) {}
    }
    for ( int i = 0; i < 4; i++ )
    {
        pthread_join( thread[i], 0 );
    }
}
Listing 8.11 shows the output from the code with and without optimization. There is a pronounced difference between the two runs. When compiled without optimization, all the threads proceed at the same time, so the output from the application shows all the threads starting and then all the threads completing. When compiled with optimization, the threads are serialized, so each thread prints the message that it has started, followed immediately by the message that it has completed.
Listing 8.11 Output Without and with Optimization
$ cc loop_merge.c
$ ./a.out
Thread 1 started
Thread 2 started
Thread 0 started
Thread 3 started
Thread 1 done
Thread 0 done
Thread 2 done
Thread 3 done
$ cc -O loop_merge.c
$ ./a.out
Thread 0 started
Thread 0 done
Thread 1 started
Thread 1 done
Thread 2 started
Thread 2 done
Thread 3 started
Thread 3 done
With optimization, the compiler has merged the two loops that set the start variables and read the done variables. This produces code similar to that shown in Listing 8.12. The two loops contain no function calls, and the compiler considers the accesses to the volatile variables to be free of side effects. Hence, the compiler considers it safe to merge the two loops.
Listing 8.12 Compiler-Optimized Code
int main()
{
    pthread_t thread[4];
    for ( int i = 0; i < 4; i++ )
    {
        pthread_create( &thread[i], 0, work, (void*)i );
        done[i] = 0;
    }
    for ( int i = 0; i < 4; i++ )
    {
        start[i] = 1;
        while ( done[i] == 0 ) {}
    }
    for ( int i = 0; i < 4; i++ )
    {
        pthread_join( thread[i], 0 );
    }
}
To correct this situation, we need to modify the code so that the compiler is unable to merge the two loops. The easiest way to do this is to place a function call either in one of the loops or between the two loops. An alternative approach would be to separate the two loops with serial code that cannot be moved because of some dependency or potential aliasing issue. However, both of these approaches add unnecessary instructions to the code, and both run the risk that a "smart" compiler might identify the instructions as unnecessary and remove them.
The gcc asm( "" : : : "memory" ) construct, supported by multiple compilers, can be used to make the compiler order the loops correctly. Listing 8.13 shows the code modified to use this statement. The statement stops the loops from merging yet adds no additional instructions to the code.
Listing 8.13 Using gcc asm() Statement to Cause Correct Operation Ordering
int main()
{
    pthread_t thread[4];
    for ( int i = 0; i < 4; i++ )
    {
        pthread_create( &thread[i], 0, work, (void*)i );
        done[i] = 0;
    }
    for ( int i = 0; i < 4; i++ )
    {
        start[i] = 1;
    }
    asm volatile( "" : : : "memory" );
    for ( int i = 0; i < 4; i++ )
    {
        while ( done[i] == 0 ) {}
    }
    for ( int i = 0; i < 4; i++ )
    {
        pthread_join( thread[i], 0 );
    }
}
Windows provides intrinsics for this purpose. The functions _ReadBarrier(), _WriteBarrier(), and _ReadWriteBarrier() are defined in <intrin.h>. These intrinsics tell the compiler not to reorder the operations. A _ReadBarrier() call ensures that all reads will have completed at that point, and a _WriteBarrier() call ensures that all writes will have completed. These intrinsics only enforce compiler ordering and do not generate any instructions.