HOW DO WE WRITE PARALLEL PROGRAMS?
There are a number of possible answers to this
question, but most of them depend on the basic idea of partitioning the work to be done among the cores. There are
two widely used approaches: task-parallelism and data-parallelism. In task-parallelism, we partition the various
tasks carried out in solving the problem among the cores. In data-parallelism,
we partition the data used in solving the problem among the cores, and each
core carries out more or less similar operations on its part of the data.
As an example, suppose that Prof P has to teach
a section of “Survey of English Literature.” Also suppose that Prof P has one
hundred students in her section, so she’s been assigned four teaching
assistants (TAs): Mr A, Ms B, Mr C, and Ms D. At last the semester is over, and
Prof P makes up a final exam that consists of five questions. In order to grade
the exam, she and her TAs might consider the following two options: each of
them can grade all one hundred responses to one of the questions; say P grades
question 1, A grades question 2, and so on. Alternatively, they can divide the
one hundred exams into five piles of twenty exams each, and each of them can
grade all the papers in one of the piles; P grades the papers in the first
pile, A grades the papers in the second pile, and so on.
In both approaches the “cores” are the professor
and her TAs. The first approach might be considered an example of
task-parallelism. There are five tasks to be carried out: grading the first
question, grading the second question, and so on. Presumably, the graders will
be looking for different information in question 1, which is about Shakespeare,
from the information in question 2, which is about Milton, and so on. So the
professor and her TAs will be “executing different instructions.”
On the other hand, the second approach might be
considered an example of data-parallelism. The “data” are the students’ papers,
which are divided among the cores, and each core applies more or less the same
grading instructions to each paper.
The first part of the global sum example in
Section 1.3 would probably be considered an example of data-parallelism. The
data are the values computed by Compute_next_value, and each core carries out
roughly the same operations on its assigned elements: it computes the required
values by calling Compute_next_value and adds them together. The second part of the

first global sum example might be considered an example of task-parallelism.
There are two tasks: receiving and adding the cores’ partial sums, which is
carried out by the master core, and giving the partial sum to the master core,
which is carried out by the other cores.
When the cores can work independently, writing
a parallel program is much the same as writing a serial program. Things get a
good deal more complex when the cores need to coordinate their work. In the second
global sum example, although the tree structure in the diagram is very easy to
understand, writing the actual code is relatively complex. See Exercises 1.3
and 1.4. Unfortunately, it’s much more common for the cores to need
coordination.
In both global sum examples, the coordination
involves communication: one or more cores send their current partial
sums to another core. The global sum examples should also involve coordination
through load balancing: even though we didn't give
explicit formulas, it’s clear that we want the cores all to be assigned roughly
the same number of values to compute. If, for example, one core has to compute
most of the values, then the other cores will finish much sooner than the
heavily loaded core, and their computational power will be wasted.
A third type of coordination is synchronization. As an example, suppose that
instead of being computed, the values to be added are read from stdin. Say x is
an array that is read in by the master core:
if (I'm the master core)
   for (my_i = 0; my_i < n; my_i++)
      scanf("%lf", &x[my_i]);
In most systems the cores are not automatically
synchronized. Rather, each core works at its own pace. In this case, the
problem is that we don’t want the other cores to race ahead and start computing
their partial sums before the master is done initializing x and making it available to the other cores. That is, the cores
need to wait before starting execution of the code:
for (my_i = my_first_i; my_i < my_last_i; my_i++)
   my_sum += x[my_i];
We need to add in a point of synchronization
between the initialization of x and the computation of the partial sums:
Synchronize_cores();
The idea here is that each core will wait in
the function Synchronize_cores until all the cores have entered the function, and in
particular until the master core has entered it.
Currently, the most powerful parallel programs
are written using explicit parallel constructs, that is, they are written using extensions to
languages such as C and C++. These programs include explicit instructions for parallelism:
core 0 executes task 0, core 1 executes task 1, ..., all cores synchronize,
..., and so on, so such programs are often extremely complex. Furthermore,
the complexity of modern cores often makes it necessary to use considerable
care in writing the code that will be executed by a single core.
There are other options for writing parallel
programs—for example, higher level languages—but they tend to sacrifice
performance in order to make program development somewhat easier.