# How Do We Write Parallel Programs?

There are a number of possible answers to this question, but most of them depend on the basic idea of partitioning the work to be done among the cores. There are two widely used approaches: task-parallelism and data-parallelism.

HOW DO WE WRITE PARALLEL PROGRAMS?

There are a number of possible answers to this question, but most of them depend on the basic idea of partitioning the work to be done among the cores. There are two widely used approaches: task-parallelism and data-parallelism. In task-parallelism, we partition the various tasks carried out in solving the problem among the cores. In data-parallelism, we partition the data used in solving the problem among the cores, and each core carries out more or less similar operations on its part of the data.

As an example, suppose that Prof P has to teach a section of “Survey of English Literature.” Also suppose that Prof P has one hundred students in her section, so she’s been assigned four teaching assistants (TAs): Mr A, Ms B, Mr C, and Ms D. At last the semester is over, and Prof P makes up a final exam that consists of five questions. In order to grade the exam, she and her TAs might consider the following two options: each of them can grade all one hundred responses to one of the questions; say P grades question 1, A grades question 2, and so on. Alternatively, they can divide the one hundred exams into five piles of twenty exams each, and each of them can grade all the papers in one of the piles; P grades the papers in the first pile, A grades the papers in the second pile, and so on.

In both approaches the “cores” are the professor and her TAs. The first approach might be considered an example of task-parallelism. There are five tasks to be carried out: grading the first question, grading the second question, and so on. Presumably, the graders will be looking for different information in question 1, which is about Shakespeare, from the information in question 2, which is about Milton, and so on. So the professor and her TAs will be “executing different instructions.”

On the other hand, the second approach might be considered an example of data-parallelism. The “data” are the students’ papers, which are divided among the cores, and each core applies more or less the same grading instructions to each paper.

The first part of the global sum example in Section 1.3 would probably be considered an example of data-parallelism. The data are the values computed by Compute next value, and each core carries out roughly the same operations on its assigned elements: it computes the required values by calling Compute next value and adds them together. The second part of the first global sum example might be considered an example of task-parallelism. There are two tasks: receiving and adding the cores’ partial sums, which is carried out by the master core, and giving the partial sum to the master core, which is carried out by the other cores.

When the cores can work independently, writing a parallel program is much the same as writing a serial program. Things get a good deal more complex when the cores need to coordinate their work. In the second global sum example, although the tree structure in the diagram is very easy to understand, writing the actual code is relatively complex. See Exercises 1.3 and 1.4. Unfortunately, it’s much more common for the cores to need coordination.

In both global sum examples, the coordination involves communication: one or more cores send their current partial sums to another core. The global sum examples should also involve coordination through load balancing: even though we didn’t give explicit formulas, it’s clear that we want the cores all to be assigned roughly the same number of values to compute. If, for example, one core has to compute most of the values, then the other cores will finish much sooner than the heavily loaded core, and their computational power will be wasted.

A third type of coordination is synchronization. As an example, suppose that instead of computing the values to be added, the values are read from stdin. Say x is an array that is read in by the master core:

if (I’m the master core)

for (my_i = 0; my_i < n; my i++) scanf("%lf", &x[my_i]);

In most systems the cores are not automatically synchronized. Rather, each core works at its own pace. In this case, the problem is that we don’t want the other cores to race ahead and start computing their partial sums before the master is done ini-tializing x and making it available to the other cores. That is, the cores need to wait before starting execution of the code:

for (my_i = my_first i; my i < my_last i; my_i++)

my_sum += x[my_i];

We need to add in a point of synchronization between the initialization of x and the computation of the partial sums:

Synchronize_cores();

The idea here is that each core will wait in the function Synchronize cores until all the cores have entered the function—in particular, until the master core has entered this function.

Currently, the most powerful parallel programs are written using explicit parallel constructs, that is, they are written using extensions to languages such as C and C++. These programs include explicit instructions for parallelism: core 0 executes task 0, core 1 executes task 1, . . . , all cores synchronize, . . . , and so on, so such programs are often extremely complex. Furthermore, the complexity of modern cores often makes it necessary to use considerable care in writing the code that will be executed by a single core.

There are other options for writing parallel programs—for example, higher level languages—but they tend to sacrifice performance in order to make program development somewhat easier.

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

Related Topics