Chapter: An Introduction to Parallel Programming : Parallel Program Development

A static parallelization of tree search using pthreads

In our static parallelization, a single thread uses breadth-first search to generate enough partial tours so that each thread gets at least one partial tour.

A static parallelization of tree search using pthreads

In our static parallelization, a single thread uses breadth-first search to generate enough partial tours so that each thread gets at least one partial tour. Then each thread takes its partial tours and runs iterative tree search on them. We can use the pseudocode shown in Program 6.7 on each thread. Note that most of the function calls—for example, Best_tour, Feasible, Add city—need to access the adjacency matrix representing the digraph, so all the threads will need to access the digraph. However, since these are only read accesses, this won’t result in a race condition or contention among the threads.

There are only four potential differences between this pseudocode and the pseudocode we used for the second iterative serial implementation:

. The use of my stack instead of stack; since each thread has its own, private stack,

. we use my stack as the identifier for the stack object instead of stack.

. Initialization of the stack.

. Implementation of the Best tour function. Implementation of the Update best tour function.

In the serial implementation, the stack is initialized by pushing the partial tour con-sisting only of the hometown onto the stack. In the parallel version we need to generate at least thread count partial tours to distribute among the threads. As we discussed earlier, we can use breadth-first search to generate a list of at least thread count tours by having a single thread search the tree until it reaches a level with at least thread count tours. (Note that this implies that the number of threads should be less than (n-1)! , which shouldn’t be a problem). Then the threads can

use a block partition to divide these tours among themselves and push them onto their private stacks. Exercise 6.18 looks into the details.

To implement the Best tour function, a thread should compare the cost of its current tour with the cost of the global best tour. Since multiple threads may be simul-taneously accessing the global best cost, it might at first seem that there will be a race condition. However, the Best tour function only reads the global best cost, so there won’t be any conflict with threads that are also checking the best cost. If a thread is updating the global best cost, then a thread that is just checking it will either read the old value or the new, updated value. While we would prefer that it get the new value, we can’t insure this without using some very costly locking strategy. For example, threads wanting to execute Best_tour or Update_best_tour could wait on a sin-gle mutex. This would insure that no thread is updating while another thread is only checking, but would have the unfortunate side effect that only one thread could check the best cost at a time. We could improve on this by using a read-write lock, but this would have the side effect that the readers—the threads calling Best_tour—would all block while a thread updated the best tour. In principle, this doesn’t sound too bad, but recall that in practice read-write locks can be quite slow. So it seems pretty clear that the “no contention” solution of possibly getting a best tour cost that’s out-of-date is probably better, as the next time the thread calls Best_tour, it will get the updated value of the best tour cost.

On the other hand, we call Update_best_tour with the intention of writing to the best tour structure, and this clearly can cause a race condition if two threads call it simultaneously. To avoid this problem, we can protect the body of the Update_best_tour function with a mutex. This isn’t enough, however; between the time a thread completes the test in Best tour and the time it obtains the lock in Update_best_tour, another thread may have obtained the lock and updated the best tour cost, which now may be less than the best tour cost that the first thread found in Best_tour. Thus, correct pseudocode for Update_best_tour should look something like this:

Update best tour function with a mutex. This isn’t enough, however; between the time a thread completes the test in Best tour and the time it obtains the lock in Update best tour, another thread may have obtained the lock and updated the best tour cost, which now may be less than the best tour cost that the first thread found in Best tour. Thus, correct pseudocode for Update best tour should look something like this:

This may seem wasteful, but if updates to the best tour are infrequent, then most of the time Best tour will return false and it will only be rarely necessary to make the “double” call.

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

An Introduction to Parallel Programming : Parallel Program Development : A static parallelization of tree search using pthreads |