INPUT AND OUTPUT
We’ve generally avoided the issue of input and
output. There are a couple of reasons. First and foremost, parallel I/O, in
which multiple cores access multiple disks or other devices, is a subject to
which one could easily devote a book. See, for example, . Second, the vast
majority of the programs we’ll develop do very little in the way of I/O. The
amount of data they read and write is quite small and easily managed by the
standard C I/O functions printf, fprintf, scanf, and fscanf. However, even the limited use we make of
these functions can potentially cause some problems. Since these functions are
part of standard C, which is a serial language, the standard says nothing about
what happens when they’re called by different processes. On the other hand,
threads that are forked by a single process do share stdin, stdout, and stderr. However, as we’ve seen, when multiple
threads attempt to access one of these, the outcome is nondeterministic: we
can’t predict the order in which their reads or writes will take effect.
When we call printf from multiple processes, we, as developers,
would like the output to appear on the console of a single system, the system
on which we started the program. In fact, this is what the vast majority of
systems do. However, there is no guarantee, and we need to be aware that a
system may do something else: for example, it may give only one process
access to stdout or stderr, or it may give no process access to them at all.
What should happen with calls to scanf when we’re running multiple processes is a
little less obvious. Should the input be divided among the processes? Or should
only a single process be allowed to call scanf? The vast majority of systems allow at least
one process to call scanf, usually process 0, while some allow more
processes. Once again, there are some systems that don’t allow any processes
to call scanf.
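A minimal sketch of the most portable convention, in which only process 0 reads input, might look like the following. The helper name read_int_on_root and the parameter my_rank are hypothetical; in a real program the rank would come from the parallel library in use, and process 0 would then have to send the value to the other processes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: only the process whose rank is 0 reads an int
 * from the stream `in`. Returns 1 if a value was read, 0 otherwise.
 * All other ranks return immediately without touching the stream. */
int read_int_on_root(int my_rank, FILE *in, int *value) {
    assert(in != NULL && value != NULL);
    if (my_rank != 0) {
        return 0;  /* non-root ranks never call fscanf */
    }
    return fscanf(in, "%d", value) == 1;
}
```

A process would call it as read_int_on_root(my_rank, stdin, &n) and treat a return value of 0 on process 0 as an input error.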
When multiple processes can access stdout, stderr, or stdin, as you might guess, the distribution of the input
and the sequence of the output are usually nondeterministic. For output, the
data will probably appear in a different order each time the program is run,
or, even worse, the output of one process may be broken up by the output of
another process. For input, the data read by each process may be different on
each run, even if the same input is used.
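One common way to make the order of output reproducible is to have each thread store its result in a private slot and let a single thread print all the slots afterwards. The following is a sketch of that idea using Pthreads. The names worker_arg, run_workers, and print_in_rank_order are hypothetical, and the "result" computed here (the square of the rank) is only a placeholder for real work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define MAX_THREADS 8

/* Hypothetical per-thread record: the thread's rank and its private result. */
struct worker_arg {
    int rank;
    int result;
};

/* Each thread writes only to its own record and never calls printf. */
static void *worker(void *p) {
    struct worker_arg *arg = p;
    arg->result = arg->rank * arg->rank;  /* placeholder computation */
    return NULL;
}

/* Starts thread_count workers and waits for all of them.
 * Returns 0 on success, -1 if a thread could not be created. */
int run_workers(int thread_count, struct worker_arg args[]) {
    pthread_t handles[MAX_THREADS];
    int started;
    int status = 0;

    assert(thread_count >= 1 && thread_count <= MAX_THREADS);
    for (started = 0; started < thread_count; started++) {
        args[started].rank = started;
        args[started].result = 0;
        if (pthread_create(&handles[started], NULL, worker,
                           &args[started]) != 0) {
            status = -1;
            break;
        }
    }
    /* Join every thread that was actually started. */
    for (int i = 0; i < started; i++) {
        pthread_join(handles[i], NULL);
    }
    return status;
}

/* Called by one thread only, after run_workers has returned, so the
 * lines always appear in rank order. */
void print_in_rank_order(int thread_count, const struct worker_arg args[]) {
    for (int i = 0; i < thread_count; i++) {
        printf("Thread %d > result = %d\n", args[i].rank, args[i].result);
    }
}
```

Because the joins complete before any printing starts, the output order depends only on the loop index, not on how the threads were scheduled.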
In order to partially address these issues,
we’ll be making these assumptions and following these rules when our parallel
programs need to do I/O:
1. In distributed-memory programs, only process
0 will access stdin. In shared-memory programs, only the master thread or thread 0
will access stdin.
2. In both distributed-memory and shared-memory
programs, all the processes/threads can access stdout and stderr.
3. However, because of the nondeterministic
order of output to stdout, in most cases only a single process/thread
will be used for all output to stdout. The principal exception will be output for
debugging a program. In this situation, we’ll often have multiple
processes/threads writing to stdout.
4. Only a single process/thread will attempt to
access any single file other than stdin, stdout, or stderr. So, for example, each process/thread can open
its own, private file for reading or writing, but no two processes/threads
will open the same file.
5. Debug output should always include the rank
or id of the process/thread that’s generating the output.
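The last two rules can be illustrated with two small helpers. This is only a sketch: the function names, the "Proc %d > " prefix, and the proc_<rank>.out file-name pattern are assumptions made for illustration, not conventions required by any library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: builds a debug line that starts with the rank of
 * the process/thread producing it. Returns 0 if the whole line fit in
 * buf, -1 if it was truncated or formatting failed. */
int format_debug_line(char *buf, size_t size, int rank, const char *msg) {
    assert(buf != NULL && msg != NULL);
    int n = snprintf(buf, size, "Proc %d > %s", rank, msg);
    return (n >= 0 && (size_t) n < size) ? 0 : -1;
}

/* Hypothetical helper: builds a file name that is unique to one rank,
 * so that no two processes/threads open the same file.
 * Returns 0 on success, -1 if the name did not fit in buf. */
int private_file_name(char *buf, size_t size, int rank) {
    assert(buf != NULL);
    int n = snprintf(buf, size, "proc_%d.out", rank);
    return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

A process with rank 3 would, for example, build the name proc_3.out, open it with fopen, and write lines such as "Proc 3 > x = 7" to it or to stdout.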