Identifying
Parallelization Opportunities
The steps necessary to identify parallelization opportunities in codes
are as follows:
1. Gather a representative runtime profile of the application, and identify
the regions of code where the most time is currently being spent.
2. For these regions, examine the code for dependencies, and determine
whether the dependencies can be broken so that the code can be performed either
as multiple
parallel tasks or as a loop over multiple parallel iterations. At this
point, it may also be worth investigating whether a different algorithm or
approach would give code that could be more easily made parallel.
3. Estimate the overheads and likely performance gains from this
parallelization strat-egy. If the approach promises close to linear scaling
with the number of threads, then it is probably a good approach; if the scaling
does not look very efficient, it may be worth broadening the scope of the
analysis.
4. Broaden the scope of the analysis by considering the routine that calls
the region of interest. Is it possible to make this routine parallel?
The important point to remember is that
parallelization incurs synchronization costs, so the more work that each thread
performs before it needs synchronization, the better the code will scale.
Consequently, it is always worth looking further up the call stack of a region
of code to determine whether there is a more effective parallelization point.
For example, consider the pseudocode shown in Listing 3.14.
Listing 3.14 Opportunities
for Parallelization at Different Granularities
void
handlePacket(packet_t *packet)
{
doOneTask(packet);
doSecondTask(packet);
}
void
handleStream( stream_t* stream )
{
for( int i=0; i < stream->number_of_packets; i++)
{
handlePacket( stream->packets[i] );
}
}
In this example, there are two long-running tasks;
each performs some manipulation of a packet of data. It is quite possible that
the two tasks, doOneTask() and doSecondTask(), could be performed in parallel. However, that would introduce one
synchronization point after every packet that is processed. So, the
synchronization cost would be O(N) where N is the number of packets.
Looking further up the stack, the calling routine, handleStream(), iterates over a stream of packets. So, it would probably be more appropriate
to explore whether this loop could be made to run in parallel. If this was
successful, then there would be a syn-chronization point only after an entire
stream of packets had been handled, which could represent a significant
reduction in the total synchronization costs.
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2024 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.