Chapter: An Introduction to Parallel Programming : Parallel Hardware and Parallel Software

Input and Output

We’ve generally avoided the issue of input and output, for a couple of reasons. First and foremost, parallel I/O, in which multiple cores access multiple disks or other devices, is a subject to which one could easily devote a book. See, for example, [35]. Second, the vast majority of the programs we’ll develop do very little I/O. The amount of data they read and write is quite small and easily managed by the standard C I/O functions printf, fprintf, scanf, and fscanf. However, even the limited use we make of these functions can cause problems. Since these functions are part of standard C, which is a serial language, the standard says nothing about what happens when they’re called by different processes. Threads forked by a single process, on the other hand, do share stdin, stdout, and stderr. However, as we’ve seen, when multiple threads attempt to access one of these streams, the outcome is nondeterministic: it’s impossible to predict what will happen.

When we call printf from multiple processes, we, as developers, would like the output to appear on the console of a single system, the system on which we started the program. In fact, this is what the vast majority of systems do. However, there is no guarantee, and we need to be aware that a system may do something else: for example, only one process may have access to stdout or stderr, or no process may have access to them at all.

What should happen with calls to scanf when we’re running multiple processes is a little less obvious. Should the input be divided among the processes? Or should only a single process be allowed to call scanf? The vast majority of systems allow at least one process (usually process 0) to call scanf, while some allow more processes to do so. Once again, there are some systems that don’t allow any processes to call scanf.

When multiple processes can access stdout, stderr, or stdin, as you might guess, the distribution of the input and the sequence of the output are usually nondeterministic. For output, the data will probably appear in a different order each time the program is run, or, even worse, the output of one process may be broken up by the output of another process. For input, the data read by each process may be different on each run, even if the same input is used.
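To make the ordering problem concrete, here is a minimal sketch. It assumes POSIX threads rather than separate processes, purely so that it fits in one file. The names `announce`, `record_arrival_order`, and `MAX_THREADS` are hypothetical and exist only for this illustration. Each thread prints one line tagged with its rank and records its rank in a shared array, so the array mirrors the order in which the lines reached stdout.

```c
#include <pthread.h>
#include <stdio.h>

#define MAX_THREADS 16   /* hypothetical limit for this sketch */

static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;
static int arrival[MAX_THREADS];   /* ranks, in the order the threads printed */
static int arrived = 0;
static int ranks[MAX_THREADS];     /* stable storage for each thread's rank */

static void *announce(void *arg) {
    int rank = *(int *)arg;
    /* The lock keeps the printed line and the recorded rank in the same
       order; it does not decide which thread gets here first. */
    pthread_mutex_lock(&log_lock);
    printf("Thread %d > hello\n", rank);
    arrival[arrived++] = rank;
    pthread_mutex_unlock(&log_lock);
    return NULL;
}

/* Starts thread_count threads and copies the observed print order into
   order[]. Returns 0 on success, -1 on a bad count or a failed create. */
int record_arrival_order(int thread_count, int order[]) {
    pthread_t threads[MAX_THREADS];
    int started = 0;

    if (thread_count < 1 || thread_count > MAX_THREADS) return -1;
    arrived = 0;
    for (int r = 0; r < thread_count; r++) {
        ranks[r] = r;
        if (pthread_create(&threads[r], NULL, announce, &ranks[r]) != 0)
            break;
        started++;
    }
    for (int r = 0; r < started; r++)
        pthread_join(threads[r], NULL);
    for (int i = 0; i < arrived; i++)
        order[i] = arrival[i];
    return started == thread_count ? 0 : -1;
}
```

Calling `record_arrival_order(8, order)` from `main` on several runs will typically fill `order` with a different permutation of 0 through 7 each time. Every rank appears exactly once, but the sequence depends on how the system happens to schedule the threads.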

In order to partially address these issues, we’ll be making these assumptions and following these rules when our parallel programs need to do I/O:

• In distributed-memory programs, only process 0 will access stdin. In shared-memory programs, only the master thread or thread 0 will access stdin.

• In both distributed-memory and shared-memory programs, all the processes/threads can access stdout and stderr.

• However, because of the nondeterministic order of output to stdout, in most cases only a single process/thread will be used for all output to stdout. The principal exception will be output for debugging a program. In this situation, we’ll often have multiple processes/threads writing to stdout.

• Only a single process/thread will attempt to access any single file other than stdin, stdout, or stderr. So, for example, each process/thread can open its own, private file for reading or writing, but no two processes/threads will open the same file.

• Debug output should always include the rank or id of the process/thread that’s generating the output.
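The rules above can be sketched for a shared-memory program as follows. This is only an illustration under stated assumptions: it uses POSIX threads, the summation task is invented for the example, and `task_t`, `partial_sum`, `sum_1_to_n`, and `MAX_THREADS` are hypothetical names. The master thread is the only one that reads input, each worker tags its debug output with its rank, and the combined result is returned to the master so that a single thread can print it.

```c
#include <pthread.h>
#include <stdio.h>

#define MAX_THREADS 16   /* hypothetical limit for this sketch */

/* Per-thread work description (hypothetical helper type). */
typedef struct {
    int rank;           /* thread id, 0 .. thread_count - 1 */
    int thread_count;
    long n;             /* input value, read once by the master thread */
    long partial;       /* this thread's share of 1 + 2 + ... + n */
    int debug;          /* nonzero: emit rank-tagged debug output */
} task_t;

static void *partial_sum(void *arg) {
    task_t *t = (task_t *)arg;
    t->partial = 0;
    /* Cyclic distribution: rank r handles r+1, r+1+p, r+1+2p, ... */
    for (long i = t->rank + 1; i <= t->n; i += t->thread_count)
        t->partial += i;
    if (t->debug)
        /* Debug output carries the rank of the thread that wrote it. */
        fprintf(stderr, "Thread %d of %d > partial = %ld\n",
                t->rank, t->thread_count, t->partial);
    return NULL;
}

/* Reads n from `in` in the calling (master) thread only, then sums 1..n
   using thread_count threads. Returns the sum, or -1 on bad input or a
   failed thread creation. */
long sum_1_to_n(FILE *in, int thread_count, int debug) {
    pthread_t threads[MAX_THREADS];
    task_t tasks[MAX_THREADS];
    long n, total = 0;
    int started = 0;

    if (thread_count < 1 || thread_count > MAX_THREADS) return -1;
    /* The only read from the input stream in the whole program. */
    if (fscanf(in, "%ld", &n) != 1 || n < 0) return -1;

    for (int r = 0; r < thread_count; r++) {
        tasks[r] = (task_t){ r, thread_count, n, 0, debug };
        if (pthread_create(&threads[r], NULL, partial_sum, &tasks[r]) != 0)
            break;
        started++;
    }
    for (int r = 0; r < started; r++) {
        pthread_join(threads[r], NULL);
        total += tasks[r].partial;
    }
    return started == thread_count ? total : -1;
}
```

A `main` function would call `sum_1_to_n(stdin, 4, 1)` and print the returned value itself with a single printf. The workers never touch stdin or stdout, and with the debug flag set their lines on stderr may appear in any order, which is why each one names its rank.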
