BUSY-WAITING
When, say, thread 0 wants to execute the
statement x = x + y, it needs to first make sure that thread 1 is not already
executing the statement. Once thread 0 makes sure of this, it needs to provide
some way for thread 1 to determine that it, thread 0, is executing the
statement, so that thread 1 won’t attempt to start executing the statement
until thread 0 is done. Finally, after thread 0 has completed execution of the
statement, it needs to provide some way for thread 1 to determine that it is
done, so that thread 1 can safely start executing the statement.
A simple approach that doesn’t involve any new
concepts is the use of a flag variable. Suppose flag is a shared int that is set to 0 by the main thread. Further, suppose we add the
following code to our example:
1  y = Compute(my_rank);
2  while (flag != my_rank);
3  x = x + y;
4  flag++;
Let’s suppose that thread 1 finishes the
assignment in Line 1 before thread 0. What happens when it reaches the while statement in Line 2? If you look at the while statement for a minute, you’ll see that it has
the somewhat peculiar property that its body is empty. So if the test flag != my_rank is true, then thread 1 will just execute the test a second time.
In fact, it will keep re-executing the test until the test is false. When the
test is false, thread 1 will go on to execute the code in the critical section
x = x + y.
Since we’re assuming that the main thread has
initialized flag to 0, thread 1 won’t proceed to the critical section in Line 3
until thread 0 executes the statement flag++. In fact, we see that unless some catastrophe
befalls thread 0, it will eventually catch up to thread 1. However, when thread
0 executes its first test of flag != my_rank, the condition is false, and it will go on to
execute the code in the critical section x = x +
y. When it’s done with this, we see that it will
execute flag++, and thread 1 can finally enter the critical section.
The key here is that thread 1 cannot enter the critical
section until thread 0
has completed the execution
of flag++. And, provided the statements are executed exactly as they’re written, this means that
thread 1 cannot enter the critical section until thread 0 has completed it.
The while loop is an example of busy-waiting. In busy-waiting, a thread repeatedly tests a condition, but,
effectively, does no useful work until the condition has the appropriate value
(false in our example).
Note that we said that the busy-wait solution
would work “provided the statements are executed exactly as they’re written.”
If compiler optimization is turned on, it is possible that the compiler will make changes that will affect the
correctness of busy-waiting. The reason for this is that the compiler is unaware
that the program is multithreaded, so it doesn’t “know” that the variables x and flag can be modified by another thread. For
example, if our code
y = Compute(my_rank);
while (flag != my_rank);
x = x + y;
flag++;
is run by just one thread, the order of the
statements while (flag != my_rank) and x = x + y is unimportant. An optimizing compiler might
therefore determine that the program would make better use of registers
if the order of the statements were switched. Of course, this will result in
the code
y = Compute(my_rank);
x = x + y;
while (flag != my_rank);
flag++;
which defeats the purpose of the busy-wait
loop. The simplest solution to this problem is to turn compiler optimizations
off when we use busy-waiting. For an alternative to completely turning off
optimizations, see Exercise 4.3.
We can immediately see that busy-waiting is not
an ideal solution to the problem of controlling access to a critical section.
Since thread 1 will execute the test over and over until thread 0 executes flag++, if thread 0 is delayed (for example, if the operating system
preempts it to run something else), thread 1 will simply “spin” on the test,
eating up CPU cycles. This can be positively disastrous for performance.
Turning off compiler optimizations can also seriously degrade performance.
Before going on, though, let’s return to our
calculation program in Figure 4.3 and correct it by using busy-waiting. The
critical section in this function is Line 15. We can therefore precede this
with a busy-wait loop. However, when a thread is done with the critical
section, if it simply increments flag, eventually flag will be greater than t, the number of threads, and none of the
threads will be able to return to the critical section. That is, after
executing the critical section once, all the threads will be stuck forever in
the busy-wait loop. Thus, in this instance, we don’t want to simply increment flag. Rather, the last thread, thread t 1, should reset flag to zero. This can be accomplished by replacing
flag++ with
flag = (flag + 1) % thread count;
With this change, we get the thread function
shown in Program 4.4. If we compile the program and run it with two threads, we
see that it is computing the correct results. However, if we add in code for
computing elapsed time, we see that when n = 10^8, the serial sum is consistently
faster than the parallel sum. For example, on the dual-core system, the elapsed time for the sum as computed by
two threads is about 19.5 seconds, while the elapsed time for the serial sum is
about 2.8 seconds!
Program 4.4: Pthreads global sum with busy-waiting
 1  void* Thread_sum(void* rank) {
 2     long my_rank = (long) rank;
 3     double factor;
 4     long long i;
 5     long long my_n = n/thread_count;
 6     long long my_first_i = my_n*my_rank;
 7     long long my_last_i = my_first_i + my_n;
 8
 9     if (my_first_i % 2 == 0)
10        factor = 1.0;
11     else
12        factor = -1.0;
13
14     for (i = my_first_i; i < my_last_i; i++, factor = -factor) {
15        while (flag != my_rank);
16        sum += factor/(2*i+1);
17        flag = (flag+1) % thread_count;
18     }
19     return NULL;
20  }  /* Thread_sum */
Why is this? Of course, there’s overhead
associated with starting up and joining the threads. However, we can estimate
this overhead by writing a Pthreads program in which the thread function simply
returns:
void* Thread_function(void* ignore) {
   return NULL;
}  /* Thread_function */
When we find the time that’s elapsed between
starting the first thread and joining the second thread, we see that on this
particular system, the overhead is less than 0.3 milliseconds, so the slowdown
isn’t due to thread overhead. If we look closely at the thread function that
uses busy-waiting, we see that the threads take turns executing the critical section code in Line 16. Initially flag is 0, so thread 1 must wait until thread 0
executes the critical section and increments flag. Then, thread 0 must wait until thread 1 executes the critical section and updates flag. The threads alternate between waiting and executing, and evidently the waiting and the flag updates increase the overall run time by a factor of about seven (19.5/2.8 ≈ 7).
Program 4.5: Global sum function with critical section after loop

void* Thread_sum(void* rank) {
   long my_rank = (long) rank;
   double factor, my_sum = 0.0;
   long long i;
   long long my_n = n/thread_count;
   long long my_first_i = my_n*my_rank;
   long long my_last_i = my_first_i + my_n;

   if (my_first_i % 2 == 0)
      factor = 1.0;
   else
      factor = -1.0;

   for (i = my_first_i; i < my_last_i; i++, factor = -factor)
      my_sum += factor/(2*i+1);

   while (flag != my_rank);
   sum += my_sum;
   flag = (flag+1) % thread_count;
   return NULL;
}  /* Thread_sum */
As we’ll see, busy-waiting isn’t the only
solution to protecting a critical section. In fact, there are much better
solutions. However, since the code in a critical section can only be executed
by one thread at a time, no matter how we limit access to the critical section,
we’ll effectively serialize the code in the critical section. Therefore, if
it’s at all possible, we should minimize the number of times we execute
critical section code. One way to greatly improve the performance of the sum
function is to have each thread use a private variable to store its total contribution to the sum. Then, each
thread can add in its contribution to the global sum once, after the for loop. See Program 4.5. When we run this on the dual-core system
with n = 10^8, the elapsed time is reduced to
1.5 seconds for two threads, a substantial improvement.