Vectorization
Vectorization is the software optimization of using single instruction
multiple data (SIMD) instructions to perform computation in parallel. Since the
instructions act on multiple items of data at the same time, this is really an
example of parallelism at the level of the instruction and data. As such, it
can result in a significant gain in performance without the overhead associated
with synchronizing and managing multiple threads.
The simplest example of this is a loop. Listing 10.26 shows a loop that adds two vectors and places the result into a third.
Listing 10.26 Loop Amenable to SIMD Vectorization
void sum( double *in1, double *in2, double *out, int length )
{
  for ( int i = 0; i < length; i++ )
  {
    out[i] = in1[i] + in2[i];
  }
}
Using normal instructions, referred to as single instruction single data (SISD) instructions, each iteration of the loop would require two loads, one addition, and one store operation.
However, SIMD instructions act on a vector of values. A vector might hold two
double-precision values or four single-precision values, and so on.
For a SIMD instruction set that handles two
double-precision values in parallel, each iteration through the SIMD version of
the loop performs two loads of a pair of elements each, an addition operation
on that pair of elements, and a store that writes the resulting pair of
elements back to memory. Consequently, the trip count of the loop is reduced by a factor of two. Assuming the latencies of the operations remain the same, the resulting code will run twice as fast.
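To make this concrete, the following sketch writes the same loop directly with SSE2 intrinsics, one instruction set that operates on pairs of double-precision values. This is illustrative rather than what a compiler must emit; the function name sum_sse2 and the scalar cleanup loop are additions for this example, not part of Listing 10.26.

#include <emmintrin.h>  /* SSE2 intrinsics */

/* Illustrative hand-vectorized version of the loop in Listing 10.26,
   processing two doubles per iteration. */
void sum_sse2( double *in1, double *in2, double *out, int length )
{
  int i;
  for ( i = 0; i + 1 < length; i += 2 )
  {
    __m128d a = _mm_loadu_pd( &in1[i] );            /* load a pair from in1 */
    __m128d b = _mm_loadu_pd( &in2[i] );            /* load a pair from in2 */
    _mm_storeu_pd( &out[i], _mm_add_pd( a, b ) );   /* add and store the pair */
  }
  /* Scalar cleanup for the leftover element when length is odd. */
  for ( ; i < length; i++ )
  {
    out[i] = in1[i] + in2[i];
  }
}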
Most modern compilers have the facility to identify
and utilize SIMD instructions, either by default or under a combination of
flags. The generated code will work only on hardware that supports the
instructions.
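As an illustration, recent versions of GCC and Clang can be asked to vectorize and to report which loops they vectorized; the exact flags vary by compiler and version, so treat these invocations as examples rather than a definitive recipe.

gcc -O3 -march=native -fopt-info-vec -c sum.c
    (GCC: -O3 enables the loop vectorizer; -fopt-info-vec reports vectorized loops;
     -march=native targets the SIMD instructions of the build machine)
clang -O2 -Rpass=loop-vectorize -c sum.c
    (Clang: the vectorizer runs at -O2; -Rpass emits a remark for each vectorized loop)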
SIMD instructions complement other parallelization strategies. If the loop shown in Listing 10.26 acted on large arrays of data, multiple threads could handle the computation efficiently. However, if the arrays were small, the synchronization overheads might dwarf the gains from parallelization. SIMD instructions would be effective in both situations, perhaps providing a doubling of performance, and they can deliver that speedup even when the arrays are too short for efficient thread-level parallelization.
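One way to combine the two levels of parallelism is OpenMP, whose 4.0 standard lets a single directive both distribute a loop across threads and vectorize each thread's chunk. The following is a minimal sketch, assuming a compiler invoked with OpenMP support (for example, -fopenmp); the function name sum_parallel is a hypothetical addition for this example.

/* A minimal sketch combining thread-level and SIMD parallelism with
   an OpenMP 4.0 directive; assumes OpenMP is enabled at compile time. */
void sum_parallel( double *in1, double *in2, double *out, int length )
{
  /* Split the iterations across threads, then vectorize each
     thread's chunk with SIMD instructions. */
  #pragma omp parallel for simd
  for ( int i = 0; i < length; i++ )
  {
    out[i] = in1[i] + in2[i];
  }
}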