Parallelizing the solvers using Pthreads
Parallelizing the two n-body solvers using Pthreads is very similar to parallelizing them using OpenMP. The differences are only in implementation details, so rather than repeating the discussion, we will point out some of the principal differences between the Pthreads and the OpenMP implementations. We will also note some of the more important similarities.
By default, local variables in Pthreads are private, so all shared variables are global in the Pthreads version.
The principal data structures in the Pthreads version are identical to those in the OpenMP version: vectors are two-dimensional arrays of doubles, and the mass, position, and velocity of a single particle are stored in a struct. The forces are stored in an array of vectors.
Startup for Pthreads is basically the same as the startup for OpenMP: the main thread gets the command-line arguments, and allocates and initializes the principal data structures.
The main difference between the Pthreads and the OpenMP implementations is in the details of parallelizing the inner loops. Since Pthreads has nothing analogous to a parallel for directive, we must explicitly determine which values of the loop variables correspond to each thread's calculations. To facilitate this, we've written a function Loop_schedule, which determines

- the initial value of the loop variable,
- the final value of the loop variable, and
- the increment for the loop variable.
The input to the function is

- the calling thread's rank,
- the number of threads,
- the total number of iterations, and
- an argument indicating whether the partitioning should be block or cyclic.
Another difference between the Pthreads and the OpenMP versions has to do with barriers. Recall that the end of a parallel for directive in OpenMP has an implied barrier. As we've seen, this is important. For example, we don't want a thread to start updating its positions until all the forces have been calculated: a thread could use an out-of-date force, or another thread, still computing forces, could use one of the updated positions. If we simply partition the loop iterations among the threads in the Pthreads version, there won't be a barrier at the end of an inner for loop and we'll have a race condition. Thus, we need to add explicit barriers after the inner loops wherever a race condition can arise. The Pthreads standard includes a barrier. However, some systems don't implement it, so we've defined a function that uses a Pthreads condition variable to implement a barrier. See Subsection 4.8.3 for details.