Collapsing Loops to Improve Workload Balance
The parallel for directive applies only to the loop that immediately follows
it. As always, it is best to apply parallelization at the outermost loop,
because this reduces the number of synchronizations necessary. However, a low
trip count for the outer loop will limit the maximum number of threads that can
be used in parallel. In these cases, it might be appropriate to parallelize the
inner loop instead, since it could have a higher iteration count. Without
knowing the trip counts for the two loops, it is not possible to decide which
strategy is more appropriate.
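The two strategies can be sketched side by side. This is a minimal illustration, not code from the original text: the function names fill_outer_parallel and fill_inner_parallel, the flat array layout, and the placeholder computation are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Strategy 1: parallelize the outer loop. The parallel region is entered
   once, but at most `rows` threads can be given work. */
void fill_outer_parallel(double *a, int rows, int cols)
{
    assert(a != NULL && rows > 0 && cols > 0);
    #pragma omp parallel for
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            a[(size_t)i * cols + j] = 0.5 * (i + j);  /* placeholder work */
}

/* Strategy 2: parallelize the inner loop. Up to `cols` threads can be given
   work, but a parallel region starts and synchronizes once per outer
   iteration. */
void fill_inner_parallel(double *a, int rows, int cols)
{
    assert(a != NULL && rows > 0 && cols > 0);
    for (int i = 0; i < rows; i++) {
        #pragma omp parallel for
        for (int j = 0; j < cols; j++)
            a[(size_t)i * cols + j] = 0.5 * (i + j);  /* placeholder work */
    }
}
```

Both functions produce the same array contents. They differ only in how many threads can share the work and how often the threads synchronize.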
However, OpenMP provides a way of avoiding issues with the outermost loop
having a low trip count, which is to collapse the inner and outer loops into a
single loop. The clause to do this is collapse, which takes the number of loops
to collapse as a parameter. Listing 7.61 shows an example of code where the
outer loop has a low trip count, and using the collapse clause enables scaling
to higher numbers of threads.
Listing 7.61 Using the collapse Clause to Improve Scaling
#include <math.h>

int main()
{
    double array[2][10000];
    /* Merge both loops into one 20,000-iteration parallel loop */
    #pragma omp parallel for collapse( 2 )
    for( int i=0; i<2; i++ )
        for( int j=0; j<10000; j++ )
            array[i][j] = sin( i+j );
    return 0;
}
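One way to picture the effect of collapse( 2 ) is as a single loop over every (i, j) pair, with the original indices recovered by division and remainder. The sketch below is only a conceptual model, not a description of how any particular compiler implements the clause. The function names, the flat array layout, and the placeholder computation are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Nested form: collapse(2) merges both loops into one iteration space of
   rows * cols iterations, which is then divided among the threads. */
void fill_collapsed(double *a, int rows, int cols)
{
    assert(a != NULL && rows > 0 && cols > 0);
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            a[(size_t)i * cols + j] = 0.5 * (i + j);  /* placeholder work */
}

/* Hand-flattened equivalent: one loop over all rows * cols iterations, with
   i and j recomputed from the combined index k. */
void fill_flattened(double *a, int rows, int cols)
{
    assert(a != NULL && rows > 0 && cols > 0);
    long total = (long)rows * cols;
    #pragma omp parallel for
    for (long k = 0; k < total; k++) {
        int i = (int)(k / cols);
        int j = (int)(k % cols);
        a[k] = 0.5 * (i + j);  /* same placeholder work */
    }
}
```

The index arithmetic in the hand-flattened version hints at one possible source of extra cost when loops are collapsed.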
Without the collapse clause, the outermost loop will only ever scale to two
threads. With the collapse clause, the combined loop has 20,000 iterations and
so could in theory scale to 20,000 threads (although the synchronization
overheads would cause the code to run slowly long before that count was
reached). Using the collapse clause may introduce additional overhead into the
parallel region, so it is worth evaluating whether the clause will improve
performance or cause a performance loss.
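A simple way to carry out that evaluation is to time both variants on the target machine. The following is a rough sketch under stated assumptions: the helper names (wall_seconds, fill_grid_outer, fill_grid_collapsed, average_seconds) are hypothetical, the computation is a placeholder, and the timer falls back to the C11 timespec_get() function when the code is built without OpenMP.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define ROWS 2
#define COLS 10000

static double grid[ROWS][COLS];

/* Wall-clock seconds: omp_get_wtime() when built with OpenMP, otherwise
   C11 timespec_get() as a serial fallback. */
static double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#endif
}

/* Outer loop only: at most ROWS threads receive work. */
void fill_grid_outer(void)
{
    #pragma omp parallel for
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            grid[i][j] = 0.5 * (i + j);  /* placeholder work */
}

/* Collapsed loops: ROWS * COLS iterations are shared among the threads. */
void fill_grid_collapsed(void)
{
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            grid[i][j] = 0.5 * (i + j);  /* placeholder work */
}

/* Average wall-clock seconds per call over `repeats` timed calls. */
double average_seconds(void (*fill)(void), int repeats)
{
    assert(fill != NULL && repeats > 0);
    fill();  /* untimed warm-up call, so thread start-up is not measured */
    double start = wall_seconds();
    for (int r = 0; r < repeats; r++)
        fill();
    return (wall_seconds() - start) / repeats;
}
```

Calling average_seconds( fill_grid_outer, 1000 ) and average_seconds( fill_grid_collapsed, 1000 ) at several thread counts gives a basis for comparison. The results depend on the compiler, the hardware, and the amount of work per iteration, so they should be measured rather than assumed.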