THREAD-SAFETY
Let’s look at another potential problem that
occurs in shared-memory programming: thread-safety. A block of code is thread-safe if it can be simultaneously executed by multiple threads without causing problems.
As an example, suppose we want to use multiple
threads to “tokenize” a file. Let’s suppose that the file consists of ordinary
English text, and that the tokens are just contiguous sequences of characters
separated from the rest of the text by white space—a space, a tab, or a
newline. A simple approach to this problem is to divide the input file into
lines of text and assign the lines to the threads in a round-robin fashion: the
first line goes to thread 0, the second goes to thread 1, . . . , the tth goes to thread t - 1, the (t + 1)st goes to thread 0, and so on.
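The round-robin rule amounts to taking a line's zero-based index modulo the number of threads. A minimal sketch, in which the names line_owner, line_index, and thread_count are hypothetical and chosen only for illustration:

```c
#include <stddef.h>

/* Hypothetical helper: returns the rank of the thread that handles the
 * line with zero-based index line_index under round-robin assignment.
 * Assumes thread_count > 0. */
size_t line_owner(size_t line_index, size_t thread_count) {
    return line_index % thread_count;
}
```

With two threads, lines 0 and 2 go to thread 0 and lines 1 and 3 go to thread 1.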
We can serialize access to the lines of input
using semaphores. Then, after a thread has read a single line of input, it can
tokenize the line. One way to do this is to use the strtok function in string.h, which has the following prototype:
    char* strtok(
        char*        string      /* in/out */,
        const char*  separators  /* in     */);
Its usage is a little unusual: the first time
it’s called the string argument should be the text to be tokenized, so in our
example it should be the line of input. For subsequent calls, the first
argument should be NULL. The idea is that in the first call, strtok caches a pointer to string, and for subsequent calls it returns
successive tokens taken from the cached copy. The characters that delimit
tokens should be passed in separators. We should pass in the string " \t\n" as the separators argument.
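The calling pattern is easier to see in a small single-threaded sketch. The helper name count_tokens is an assumption made for illustration; only strtok itself comes from string.h. Note that strtok modifies the buffer it is given, so the line must be a writable array rather than a string literal.

```c
#include <string.h>

/* Hypothetical helper: counts the whitespace-separated tokens in line.
 * strtok overwrites delimiters in line with '\0', so line must be writable. */
int count_tokens(char* line) {
    int count = 0;
    /* First call: pass the string to be tokenized. */
    char* token = strtok(line, " \t\n");
    while (token != NULL) {
        count++;
        /* Subsequent calls: pass NULL to continue with the cached string. */
        token = strtok(NULL, " \t\n");
    }
    return count;
}
```

Called on a writable copy of "Pease porridge hot.\n", this counts three tokens.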
Program 4.14: A first attempt at a multithreaded tokenizer
    void* Tokenize(void* rank) {
        long my_rank = (long) rank;
        int count;
        int next = (my_rank + 1) % thread_count;
        char* fg_rv;
        char my_line[MAX];
        char* my_string;

        sem_wait(&sems[my_rank]);   /* wait for this thread's turn to read */
        fg_rv = fgets(my_line, MAX, stdin);
        sem_post(&sems[next]);      /* let the next thread read */
        while (fg_rv != NULL) {
            printf("Thread %ld > my_line = %s", my_rank, my_line);

            count = 0;
            my_string = strtok(my_line, " \t\n");
            while (my_string != NULL) {
                count++;
                printf("Thread %ld > string %d = %s\n", my_rank, count, my_string);

                my_string = strtok(NULL, " \t\n");
            }

            sem_wait(&sems[my_rank]);
            fg_rv = fgets(my_line, MAX, stdin);
            sem_post(&sems[next]);
        }

        return NULL;
    }  /* Tokenize */
Given these assumptions, we can write the thread function shown in Program 4.14. The main thread has initialized an array of t semaphores, one for each thread. Thread 0’s semaphore is initialized to 1; all the other semaphores are initialized to 0. So the code in Lines 9 to 11 (the first sem_wait, fgets, sem_post sequence) forces the threads to access the lines of input sequentially. Thread 0 will immediately read the first line, but all the other threads will block in sem_wait. When thread 0 executes the sem_post, thread 1 can read a line of input. After each thread has read its first line of input (or end-of-file), any additional input is read in Lines 24 to 26, at the bottom of the outer loop. The fgets function reads a single line of input, and Lines 15 to 22 identify the tokens in the line.
When we run the program with a single thread, it correctly tokenizes the input stream. The first time we run it with two threads and the input
Pease porridge hot.
Pease porridge cold.
Pease porridge in the pot
Nine days old.
the output is also correct. However, the second
time we run it with this input, we get the following output.
Thread 0 > my_line = Pease porridge hot.
Thread 0 > string 1 = Pease
Thread 0 > string 2 = porridge
Thread 0 > string 3 = hot.
Thread 1 > my_line = Pease porridge cold.
Thread 0 > my_line = Pease porridge in the pot
Thread 0 > string 1 = Pease
Thread 0 > string 2 = porridge
Thread 0 > string 3 = in
Thread 0 > string 4 = the
Thread 0 > string 5 = pot
Thread 1 > string 1 = Pease
Thread 1 > my_line = Nine days old.
Thread 1 > string 1 = Nine
Thread 1 > string 2 = days
Thread 1 > string 3 = old.
What happened? Recall that strtok caches the input line. It does this by declaring a variable to
have static storage class. This causes the value stored in
this variable to persist from one call to the next. Unfortunately for us, this
cached string is shared, not private. Thus, thread 0’s call to strtok with the third line of the input has apparently overwritten the
contents of thread 1’s call with the second line.
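The same interference can be reproduced deterministically in a single thread by interleaving two tokenizations by hand. This is only a sketch of the effect: the function name interleaved_token is hypothetical, and the order of the calls is chosen to mimic one possible thread schedule.

```c
#include <string.h>

/* Hypothetical demonstration: starts tokenizing line_a, then starts
 * tokenizing line_b, then asks strtok to "continue". Because strtok keeps
 * a single cached pointer, the continuation yields the next token of
 * line_b, not of line_a. Both buffers must be writable. */
char* interleaved_token(char* line_a, char* line_b) {
    strtok(line_a, " \t\n");        /* "thread 0" starts on line_a          */
    strtok(line_b, " \t\n");        /* "thread 1" replaces the cached ptr   */
    return strtok(NULL, " \t\n");   /* "thread 0" continues, but in line_b  */
}
```

With line_a holding "Pease porridge hot." and line_b holding "Nine days old.", the returned token is "days" rather than "porridge": the second tokenization has displaced the first.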
The strtok function is not thread-safe: if multiple threads call it simultaneously, the output it produces may not be correct. Regrettably, it’s not uncommon for C library functions to fail to be thread-safe. For example, neither the random number generator random in stdlib.h nor the time conversion function localtime in time.h is thread-safe. In some cases, an alternate, thread-safe version of a function is available (many of these are specified by POSIX rather than by the C standard itself). In fact, there is a thread-safe version of strtok:
    char* strtok_r(
        char*        string      /* in/out */,
        const char*  separators  /* in     */,
        char**       saveptr_p   /* in/out */);
The “_r” is supposed to suggest that the function is reentrant, which is sometimes used as a synonym for thread-safe. The first two arguments have the same purpose as the arguments to strtok. The saveptr_p argument is used by strtok_r for keeping track of where the function is in the input string; it serves the purpose of the cached pointer in strtok. We can correct our original Tokenize function by replacing the calls to strtok with calls to strtok_r. We simply need to declare a char* variable, saveptr, to pass in for the third argument, and replace the calls in Line 16 and Line 21 with the calls
    my_string = strtok_r(my_line, " \t\n", &saveptr);
    . . .
    my_string = strtok_r(NULL, " \t\n", &saveptr);
respectively.
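Because each tokenization now carries its own saved position, two of them can be interleaved without interfering. A minimal sketch, assuming a POSIX system that declares strtok_r in string.h; the function name second_token_of_a and the variables save_a and save_b are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: may be needed to expose strtok_r */
#include <string.h>

/* Hypothetical demonstration: interleaves two tokenizations, each with its
 * own saved position, and returns the second token of line_a.
 * Both buffers must be writable. */
char* second_token_of_a(char* line_a, char* line_b) {
    char* save_a;  /* position within line_a */
    char* save_b;  /* position within line_b */

    strtok_r(line_a, " \t\n", &save_a);        /* start on line_a           */
    strtok_r(line_b, " \t\n", &save_b);        /* start on line_b;          */
                                               /* save_a is left untouched  */
    return strtok_r(NULL, " \t\n", &save_a);   /* continue within line_a    */
}
```

With line_a holding "Pease porridge hot." and line_b holding "Nine days old.", the returned token is "porridge". In the thread function, saveptr is a local variable, so each thread likewise gets its own copy on its own stack.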
1. Incorrect programs can produce correct output
Notice that our original version of the tokenizer program shows an especially insidious form of program error: the first time we ran it with two threads, the program produced correct output. It wasn’t until a later run that we saw an error. This, unfortunately, is not a rare occurrence in parallel programs. It’s especially common in shared-memory programs. Since, for the most part, the threads are running independently of each other, as we noted earlier, the exact sequence of statements executed is nondeterministic. For example, we can’t say when thread 1 will first call strtok. If its first call takes place after thread 0 has tokenized its first line, then the tokens identified for the first line should be correct. However, if thread 1 calls strtok before thread 0 has finished tokenizing its first line, it’s entirely possible that thread 0 may not identify all the tokens in the first line. Therefore, it’s especially important in developing shared-memory programs to resist the temptation to assume that since a program produces correct output, it must be correct. We always need to be wary of race conditions.