Chapter: Multicore Application Programming For Windows, Linux, and Oracle Solaris : Using Automatic Parallelization and OpenMP

Collapsing Loops to Improve Workload Balance

The parallel for directive applies only to the loop that immediately follows it. As always, it is best to parallelize the outermost loop, because this reduces the number of synchronizations necessary. However, a low trip count for the outer loop limits the maximum number of threads that can be used in parallel. In these cases, it might be appropriate to parallelize the inner loop instead, since it could have a higher iteration count. Without knowing the trip counts for the two loops, it is not possible to decide which strategy is more appropriate.
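The two strategies described above can be sketched as follows. This is a minimal illustration, not code from the book: the function names, the ROWS and COLS constants, and the arithmetic used as per-element work (a stand-in for a real computation) are all assumptions chosen for the example.

```c
#include <assert.h>

#define ROWS 2        /* assumed low outer trip count */
#define COLS 10000    /* assumed high inner trip count */

/* Strategy A: parallelize the outer loop. There is a single parallel
   region, but at most ROWS threads can be given any work. */
void fill_outer(double a[ROWS][COLS])
{
    #pragma omp parallel for
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            a[i][j] = 0.5 * (i + j);   /* stand-in for the real work */
}

/* Strategy B: parallelize the inner loop. Up to COLS threads can be
   used, but a parallel region is entered and synchronized ROWS times. */
void fill_inner(double a[ROWS][COLS])
{
    for (int i = 0; i < ROWS; i++) {
        #pragma omp parallel for
        for (int j = 0; j < COLS; j++)
            a[i][j] = 0.5 * (i + j);   /* same work as in fill_outer */
    }
}

/* Both strategies must produce identical results; only the way the
   iterations are divided among threads differs. */
void check_same(double a[ROWS][COLS], double b[ROWS][COLS])
{
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            assert(a[i][j] == b[i][j]);
}
```

With GCC, the pragmas take effect when the file is compiled with -fopenmp; without that flag they are ignored and both functions run serially.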

 

However, OpenMP provides a way of avoiding the problem of an outermost loop with a low trip count, which is to collapse the inner and outer loops into a single loop. The clause that does this is collapse, which takes the number of loops to collapse as a parameter. Listing 7.61 shows an example of code where the outer loop has a low trip count and the collapse clause enables scaling to higher numbers of threads.

 

Listing 7.61   Using the collapse Clause to Improve Scaling

#include <math.h>

int main()
{
    double array[2][10000];

    /* collapse( 2 ) merges both loops into one 20,000-iteration loop */
    #pragma omp parallel for collapse( 2 )
    for( int i=0; i<2; i++ )
        for( int j=0; j<10000; j++ )
            array[i][j] = sin( i+j );

    return 0;
}
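
Conceptually, collapsing the two loops in Listing 7.61 resembles flattening the loop nest by hand into one loop over all 20,000 iterations and recovering the two indices inside the body. The sketch below illustrates that idea only; it is not a description of how any particular compiler implements the clause. The function name, the ROWS and COLS constants, and the arithmetic that stands in for sin( i+j ) are assumptions made for the example.

```c
#include <assert.h>

#define ROWS 2
#define COLS 10000

/* Hand-flattened equivalent of a ROWS x COLS loop nest: a single loop
   with ROWS * COLS iterations that can be divided among many threads. */
void fill_flattened(double a[ROWS][COLS])
{
    #pragma omp parallel for
    for (int k = 0; k < ROWS * COLS; k++) {
        int i = k / COLS;   /* recover the outer-loop index */
        int j = k % COLS;   /* recover the inner-loop index */
        assert(i < ROWS);   /* k never runs past the last row */
        a[i][j] = 0.5 * (i + j);   /* stand-in for sin( i+j ) */
    }
}
```

The division and modulo needed to recover i and j are one plausible source of the extra per-iteration cost that a collapsed loop can carry.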

Without the collapse clause, the outermost loop will only ever scale to two threads. With the collapse clause, the combined loop has 20,000 iterations and could in theory scale to 20,000 threads (although synchronization overheads would cause the code to run slowly long before that count was reached). Using the collapse clause may introduce additional overhead into the parallel region, so it is worth evaluating whether the clause will improve performance or cause a performance loss.
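
One way to carry out that evaluation is to time both variants on the target machine. The harness below is a rough sketch under stated assumptions: the function names, the global grid array, and the arithmetic used as per-element work are hypothetical. It uses omp_get_wtime() when built with OpenMP and falls back to clock() otherwise.

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>   /* omp_get_wtime() */
#endif

#define ROWS 2
#define COLS 10000

double grid[ROWS][COLS];

/* Current time in seconds: wall-clock time with OpenMP, CPU time from
   clock() when the code is built without OpenMP (and so runs serially). */
double now(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Outer loop only: at most ROWS threads receive work. */
void fill_outer_only(void)
{
    #pragma omp parallel for
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            grid[i][j] = 0.5 * (i + j);
}

/* Both loops collapsed: ROWS * COLS iterations are shared out. */
void fill_collapsed(void)
{
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            grid[i][j] = 0.5 * (i + j);
}

/* Average seconds per call of fill over reps repetitions. */
double seconds_per_call(void (*fill)(void), int reps)
{
    assert(reps > 0);
    double start = now();
    for (int r = 0; r < reps; r++)
        fill();
    return (now() - start) / reps;
}
```

Calling seconds_per_call(fill_outer_only, 1000) and seconds_per_call(fill_collapsed, 1000) at several thread counts gives a first indication of whether the clause helps. The per-element work here is so small that a real workload may behave quite differently.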

