Vectorization is the software optimization of using single instruction, multiple data (SIMD) instructions to perform computation in parallel. Since each instruction acts on multiple items of data at the same time, this is parallelism at the level of the instruction and the data. As such, it can deliver a significant gain in performance without the overhead associated with synchronizing and managing multiple threads.
The simplest example of this is a loop; Listing 10.26 shows one that adds two vectors element by element and places the result into a third.
Listing 10.26 Loop Amenable to SIMD Vectorization
void sum( double *in1, double *in2, double *out, int length )
{
  for ( int i = 0; i < length; i++ )
  {
    out[i] = in1[i] + in2[i];
  }
}
Using normal instructions, referred to as single instruction, single data (SISD) instructions, each iteration of the loop would require two loads, one addition, and one store operation. However, SIMD instructions act on a vector of values; a single vector register might hold two double-precision values, four single-precision values, and so on.
For a SIMD instruction set that handles two double-precision values in parallel, each iteration through the SIMD version of the loop performs two loads of a pair of elements each, an addition operation on that pair of elements, and a store that writes the resulting pair of elements back to memory. Consequently, the trip count around the loop will be reduced by a factor of two. Assuming the latencies of the operations remain the same, the resulting code will run twice as fast.
Most modern compilers have the facility to identify and utilize SIMD instructions, either by default or under a combination of flags. The generated code will work only on hardware that supports the instructions.
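Whether the compiler actually vectorized a given loop can usually be confirmed from its optimization reports. As a sketch, assuming GCC or Clang and a source file named sum.c containing Listing 10.26 (the file name is an assumption for illustration):

```shell
# GCC: -O3 enables the loop vectorizer; -fopt-info-vec reports each vectorized loop
gcc -O3 -fopt-info-vec -c sum.c

# Clang: -Rpass=loop-vectorize emits a remark for each loop it vectorizes
clang -O3 -Rpass=loop-vectorize -c sum.c
```

Flags such as -march=native can further widen the instruction set the compiler may target, at the cost of producing a binary that runs only on comparable hardware.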
SIMD instructions complement other parallelization strategies. If the loop shown in Listing 10.26 acted on large arrays of data, multiple threads could handle the computation efficiently. However, if the arrays were small, the synchronization overheads might dwarf the gains from using multiple threads. SIMD instructions would be effective in both situations, perhaps providing a doubling of performance; critically, they can deliver that speedup even when the arrays are too short for multithreading to be profitable.