Chapter: An Introduction to Parallel Programming : Distributed-Memory Programming with MPI

Dealing with I/O


Of course, the current version of the parallel trapezoidal rule has a serious deficiency: it will only compute the integral over the interval [0, 3] using 1024 trapezoids. We can edit the code and recompile, but this is quite a bit of work compared to simply typing in three new numbers. We need to address the problem of getting input from the user. While we’re talking about input to parallel programs, it might be a good idea to also take a look at output. We discussed these two issues in Chapter 2, so if you remember the discussion of nondeterminism and output, you can skip ahead to Section 3.3.2.

1. Output

In both the “greetings” program and the trapezoidal rule program we’ve assumed that process 0 can write to stdout, that is, its calls to printf behave as we might expect. Although the MPI standard doesn’t specify which processes have access to which I/O devices, virtually all MPI implementations allow all the processes in MPI_COMM_WORLD full access to stdout and stderr, so most MPI implementations allow all processes to execute printf and fprintf(stderr, ...).

However, most MPI implementations don’t provide any automatic scheduling of access to these devices. That is, if multiple processes are attempting to write to, say, stdout, the order in which the processes’ output appears will be unpredictable. Indeed, it can even happen that the output of one process will be interrupted by the output of another process.

int main(void) {
   int my_rank, comm_sz, n = 1024, local_n;
   double a = 0.0, b = 3.0, h, local_a, local_b;
   double local_int, total_int;
   int source;

   MPI_Init(NULL, NULL);
   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
   MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

   h = (b-a)/n;          /* h is the same for all processes */
   local_n = n/comm_sz;  /* So is the number of trapezoids  */

   local_a = a + my_rank*local_n*h;
   local_b = local_a + local_n*h;
   local_int = Trap(local_a, local_b, local_n, h);

   if (my_rank != 0) {
      /* Every other process sends its partial integral to process 0 */
      MPI_Send(&local_int, 1, MPI_DOUBLE, 0, 0,
            MPI_COMM_WORLD);
   } else {
      /* Process 0 adds up the partial integrals */
      total_int = local_int;
      for (source = 1; source < comm_sz; source++) {
         MPI_Recv(&local_int, 1, MPI_DOUBLE, source, 0,
               MPI_COMM_WORLD, MPI_STATUS_IGNORE);
         total_int += local_int;
      }
   }

   if (my_rank == 0) {
      printf("With n = %d trapezoids, our estimate\n", n);
      printf("of the integral from %f to %f = %.15e\n",
            a, b, total_int);
   }
   MPI_Finalize();
   return 0;
}  /* main */

double Trap(
      double left_endpt  /* in */,
      double right_endpt /* in */,
      int    trap_count  /* in */,
      double base_len    /* in */) {
   double estimate, x;
   int i;

   /* The two endpoints each contribute half of their function value */
   estimate = (f(left_endpt) + f(right_endpt))/2.0;
   for (i = 1; i <= trap_count-1; i++) {
      x = left_endpt + i*base_len;
      estimate += f(x);
   }
   estimate = estimate*base_len;

   return estimate;
}  /* Trap */

Program 3.3: Trap function in the MPI trapezoidal rule

#include <stdio.h>
#include <mpi.h>

int main(void) {
   int my_rank, comm_sz;

   MPI_Init(NULL, NULL);
   MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

   printf("Proc %d of %d > Does anyone have a toothpick?\n",
         my_rank, comm_sz);

   MPI_Finalize();
   return 0;
}  /* main */

Program 3.4: Each process just prints a message

For example, suppose we try to run an MPI program in which each process simply prints a message. See Program 3.4. On our cluster, if we run the program with five processes, it often produces the “expected” output:

Proc 0 of 5 > Does anyone have a toothpick?
Proc 1 of 5 > Does anyone have a toothpick?
Proc 2 of 5 > Does anyone have a toothpick?
Proc 3 of 5 > Does anyone have a toothpick?
Proc 4 of 5 > Does anyone have a toothpick?

However, when we run it with six processes, the order of the output lines is unpredictable:

Proc 0 of 6 > Does anyone have a toothpick?
Proc 1 of 6 > Does anyone have a toothpick?
Proc 2 of 6 > Does anyone have a toothpick?
Proc 5 of 6 > Does anyone have a toothpick?
Proc 3 of 6 > Does anyone have a toothpick?
Proc 4 of 6 > Does anyone have a toothpick?

or

Proc 0 of 6 > Does anyone have a toothpick?
Proc 1 of 6 > Does anyone have a toothpick?
Proc 2 of 6 > Does anyone have a toothpick?
Proc 4 of 6 > Does anyone have a toothpick?
Proc 3 of 6 > Does anyone have a toothpick?
Proc 5 of 6 > Does anyone have a toothpick?

The reason this happens is that the MPI processes are “competing” for access to the shared output device, stdout, and it’s impossible to predict the order in which the processes’ output will be queued up. Such a competition results in nondeterminism. That is, the actual output will vary from one run to the next.

In any case, if we don’t want output from different processes to appear in a random order, it’s up to us to modify our program accordingly. For example, we can have each process other than 0 send its output to process 0, and process 0 can print the output in process rank order. This is exactly what we did in the “greetings” program.

2. Input

Unlike output, most MPI implementations only allow process 0 in MPI_COMM_WORLD access to stdin. This makes sense: If multiple processes have access to stdin, which process should get which parts of the input data? Should process 0 get the first line? Process 1 the second? Or should process 0 get the first character?

In order to write MPI programs that can use scanf, we need to branch on process rank, with process 0 reading in the data and then sending it to the other processes. For example, we might write the Get_input function shown in Program 3.5 for our parallel trapezoidal rule program. In this function, process 0 simply reads in the values for a, b, and n and sends all three values to each process. This function uses the same basic communication structure as the “greetings” program, except that now process 0 is sending to each process, while the other processes are receiving.

To use this function, we can simply insert a call to it inside our main function, being careful to put it after we’ve initialized my_rank and comm_sz:

   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
   MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
   Get_input(my_rank, comm_sz, &a, &b, &n);
   h = (b-a)/n;
   . . .

void Get_input(
      int     my_rank /* in  */,
      int     comm_sz /* in  */,
      double* a_p     /* out */,
      double* b_p     /* out */,
      int*    n_p     /* out */) {
   int dest;

   if (my_rank == 0) {
      /* Only process 0 reads from stdin */
      printf("Enter a, b, and n\n");
      scanf("%lf %lf %d", a_p, b_p, n_p);
      for (dest = 1; dest < comm_sz; dest++) {
         MPI_Send(a_p, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
         MPI_Send(b_p, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
         MPI_Send(n_p, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
      }
   } else {  /* my_rank != 0 */
      /* Receive the three values in the order process 0 sent them */
      MPI_Recv(a_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
            MPI_STATUS_IGNORE);
      MPI_Recv(b_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
            MPI_STATUS_IGNORE);
      MPI_Recv(n_p, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
            MPI_STATUS_IGNORE);
   }
}  /* Get_input */

Program 3.5: A function for reading user input
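Program 3.5 trusts the user to type three well-formed values. The sketch below shows one way process 0 might check the input before sending it on. The helper Parse_input is hypothetical and not part of the book's code. It assumes that process 0 has already read a line of text, for example with fgets, and that an input is acceptable when all three values are present and n is positive.

```c
#include <stdio.h>

/* Hypothetical helper, not part of Program 3.5: extract a, b, and n from
 * a line of text.  Returns 1 if all three values were read and n > 0,
 * and 0 otherwise. */
int Parse_input(const char* line, double* a_p, double* b_p, int* n_p) {
   /* sscanf returns the number of items it successfully converted */
   if (sscanf(line, "%lf %lf %d", a_p, b_p, n_p) != 3) return 0;

   /* We need at least one trapezoid, and h = (b-a)/n divides by n */
   if (*n_p <= 0) return 0;

   return 1;
}  /* Parse_input */
```

If the check fails, process 0 can't simply return: the other processes are blocked in MPI_Recv waiting for a, b, and n. Process 0 either has to prompt again until it has valid input or send the other processes something that tells them to stop.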
