THREAD-SAFETY
Let’s look at another potential problem that occurs in shared-memory programming: thread-safety. A block of code is thread-safe if it can be simultaneously executed by multiple threads without causing problems.
As an example, suppose we want to use multiple threads to “tokenize” a file. Let’s suppose that the file consists of ordinary English text, and that the tokens are just contiguous sequences of characters separated from the rest of the text by white space: spaces, tabs, or newlines. A simple approach to this problem is to divide the input file into lines of text and assign the lines to the threads in a round-robin fashion: with t threads, the first line goes to thread 0, the second goes to thread 1, . . . , the tth goes to thread t − 1, the (t + 1)st goes to thread 0, and so on.
We’ll read the text into an array of strings, with one line of text per string. Then we can use a parallel for directive with a schedule(static,1) clause to divide the lines among the threads.
One way to tokenize a line is to use the strtok function in string.h. It has the following prototype:
char* strtok(
      char*        string     /* in/out */,
      const char*  separators /* in     */);
Its usage is a little unusual: the first time it’s called, the string argument should be the text to be tokenized, so in our example it should be the line of input. For subsequent calls, the first argument should be NULL. The idea is that in the first call, strtok caches a pointer to string, and for subsequent calls it returns successive tokens taken from the cached copy. The characters that delimit tokens should be passed in separators, so we should pass in the string " \t\n" (space, tab, newline) as the separators argument.
Program 5.6: A first attempt at a multithreaded tokenizer
Given these assumptions, we can write the Tokenize function shown in Program 5.6. The main function has initialized the array lines so that it contains the input text, and line_count is the number of strings stored in lines. Although for our purposes we only need lines to be an input argument, strtok modifies its input, so when Tokenize returns, lines will have been modified. When we run the program with a single thread, it correctly tokenizes the input stream. The first time we run it with two threads and the input
Pease porridge hot.
Pease porridge cold.
Pease porridge in the pot
Nine days old.
the output is also correct.
However, the second time we run it with this input, we get the following output:
Thread 0 > line 0 = Pease porridge hot.
Thread 1 > line 1 = Pease porridge cold.
Thread 0 > token 0 = Pease
Thread 1 > token 0 = Pease
Thread 0 > token 1 = porridge
Thread 1 > token 1 = cold.
Thread 0 > line 2 = Pease porridge in the pot
Thread 1 > line 3 = Nine days old.
Thread 0 > token 0 = Pease
Thread 1 > token 0 = Nine
Thread 0 > token 1 = days
Thread 1 > token 1 = old.
What happened? Recall that strtok caches a pointer to the input line. It does this by declaring a variable to have static storage class, which causes the value stored in this variable to persist from one call to the next. Unfortunately for us, this cached pointer is shared by all the threads, not private to each one. Thus, it appears that thread 1’s call to strtok with the second line has overwritten the pointer cached by thread 0’s call with the first line. Even worse, thread 0 has found a token (“days”) that should be in thread 1’s output.
The strtok function is therefore not thread-safe: if multiple threads call it simultaneously, the output it produces may not be correct.
Regrettably, it’s not uncommon for C library functions to fail to be thread-safe. For example, neither the random number generator random in stdlib.h nor the time conversion function localtime in time.h is thread-safe. In some cases, the C standard (or POSIX) specifies an alternate, thread-safe version of a function. In fact, there is a thread-safe version of strtok:
char* strtok_r(
      char*        string     /* in/out */,
      const char*  separators /* in     */,
      char**       saveptr_p  /* in/out */);
The “_r” is supposed to suggest that the function is re-entrant, which is sometimes used as a synonym for thread-safe. The first two arguments have the same purpose as the arguments to strtok. The saveptr_p argument is used by strtok_r for keeping track of where the function is in the input string; it serves the purpose of the cached pointer in strtok. We can correct our original Tokenize function by replacing the calls to strtok with calls to strtok_r. We simply need to declare a char* variable, private to each thread, to pass in for the third argument, and replace the calls in Lines 17 and 20 of Program 5.6 with the calls
my_token = strtok_r(lines[i], " \t\n", &saveptr);
   . . .
my_token = strtok_r(NULL, " \t\n", &saveptr);
respectively.
Incorrect programs can produce correct output
Notice that our original version of the tokenizer program shows an especially insidious form of program error: the first time we ran it with two threads, the program produced correct output. It wasn’t until a later run that we saw an error. This, unfortunately, is not a rare occurrence in parallel programs, and it’s especially common in shared-memory programs. As we noted at the beginning of the chapter, since the threads run, for the most part, independently of each other, the exact sequence of statements executed is nondeterministic. For example, we can’t say when thread 1 will first call strtok. If its first call takes place after thread 0 has tokenized its first line, then the tokens identified for the first line should be correct. However, if thread 1 calls strtok before thread 0 has finished tokenizing its first line, thread 0 may well fail to identify all the tokens in the first line. Therefore, in developing shared-memory programs it’s especially important to resist the temptation to assume that since a program produces correct output, it must be correct. We always need to be wary of race conditions.