
Chapter: Advanced Computer Architecture : Instruction Level Parallelism

Quantitative Principles of Computer Design

The most important and pervasive principle of computer design is to make the common case fast.


In applying this simple principle, we have to decide what the frequent case is and how much performance can be improved by making that case faster. A fundamental law, called Amdahl's Law, can be used to quantify this principle.

 

Amdahl’s Law

 

The performance gain that can be obtained by improving some portion of a computer can be calculated using Amdahl's Law. Amdahl's Law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used. Amdahl's Law defines the speedup that can be gained by using a particular feature.

 

Speedup is the ratio:

 

Speedup = Performance for entire task using enhancement when possible / Performance for entire task without using enhancement

 

Alternatively,

 

Speedup = Execution Time for entire task without using enhancement / Execution Time for entire task using enhancement when possible

 

Speedup tells us how much faster a task will run using the machine with the enhancement as opposed to the original machine. Amdahl's Law gives us a quick way to find the speedup from some enhancement, which depends on two factors:

 

1. The fraction of the computation time in the original machine that can be converted to take advantage of the enhancement. For example, if 20 seconds of the execution time of a program that takes 60 seconds in total can use an enhancement, the fraction is 20/60. This value, which we will call Fraction_enhanced, is always less than or equal to 1.

 

2. The improvement gained by the enhanced execution mode; that is, how much faster the task would run if the enhanced mode were used for the entire program. This value is the time of the original mode over the time of the enhanced mode: if the enhanced mode takes 2 seconds for some portion of the program that can completely use the mode, while the original mode took 5 seconds for the same portion, the improvement is 5/2. We will call this value, which is always greater than 1, Speedup_enhanced.

 

The execution time using the original machine with the enhanced mode will be the time spent using the unenhanced portion of the machine plus the time spent using the enhancement:

Execution Time_new = Execution Time_old X ( (1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced )

The overall speedup is the ratio of the two execution times:

Speedup_overall = Execution Time_old / Execution Time_new = 1 / ( (1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced )
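
As a check on the numbers above, here is a minimal Python sketch (the function name speedup_overall is ours, for illustration) that applies Amdahl's Law to the example in this section: 20 of 60 seconds can use an enhancement that is 5/2 = 2.5 times faster.

def speedup_overall(fraction_enhanced, speedup_enhanced):
    # Amdahl's Law: overall speedup when only a fraction of the
    # original execution time benefits from the enhancement.
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Example from the text: Fraction_enhanced = 20/60, Speedup_enhanced = 5/2.
print(speedup_overall(20 / 60, 5 / 2))  # 1.25, i.e., 25% faster overall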

 

Amdahl’s Law can serve as a guide to how much an enhancement will improve performance and how to distribute resources to improve cost/performance.

 

The CPU Performance Equation

 

Essentially all computers are constructed using a clock running at a constant rate. These discrete time events are called ticks, clock ticks, clock periods, clocks, cycles, or clock cycles. Computer designers refer to the time of a clock period by its duration (e.g., 1 ns) or by its rate (e.g., 1 GHz). CPU time for a program can then be expressed as:

 

CPU Time = CPU Clock Cycles for a Program X Clock Cycle Time

 

In addition to the number of clock cycles needed to execute a program, we can also count the number of instructions executed—the instruction path length or instruction count (IC). If we know the number of clock cycles and the instruction count we can calculate the average number of clock cycles per instruction (CPI).

 

CPI is computed as

 

 

CPI = CPU Clock Cycles for a Program / Instruction Count

 

This allows us to use CPI in the execution time formula:

 

CPU Time = Instruction Count X Cycles per Instruction X Clock Cycle Time
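
For instance (the numbers here are ours, purely illustrative): a program of one billion instructions with an average CPI of 2 on a 1 GHz clock (1 ns cycle time) takes 2 seconds, as this short sketch shows.

def cpu_time(instruction_count, cpi, clock_cycle_time_s):
    # CPU Time = Instruction Count X Cycles per Instruction X Clock Cycle Time
    return instruction_count * cpi * clock_cycle_time_s

# Hypothetical workload: 10**9 instructions, CPI = 2, 1 ns cycle (1 GHz clock).
print(cpu_time(10**9, 2.0, 1e-9))  # 2.0 seconds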

 

Principle of Locality

 

Locality of reference means that programs tend to reuse data and instructions they have used recently. A widely held rule of thumb is that a program spends 90% of its execution time in only 10% of the code. An implication of locality is that we can predict with reasonable accuracy what instructions and data a program will use in the near future based on its accesses in the recent past.

 

Locality of reference also applies to data accesses, though not as strongly as to code accesses. Two different types of locality have been observed. Temporal locality states that recently accessed items are likely to be accessed in the near future. Spatial locality says that items whose addresses are near one another tend to be referenced close together in time.
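
A small illustrative loop (ours, not from the text) exhibits both kinds of locality; note that Python's nested lists only approximate the contiguous row-major layout a C array would have.

# Temporal locality: `total` and the loop variables are reused every iteration.
# Spatial locality: the elements of each row are logically adjacent and are
# visited in order, so the access pattern is sequential (exactly so for a
# row-major C array; approximately so for Python's nested lists).
matrix = [[i * 100 + j for j in range(100)] for i in range(100)]
total = 0
for row in matrix:
    for value in row:
        total += value
print(total)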

 

Advantage of Parallelism

 

Taking advantage of parallelism is one of the most important methods for improving performance. We give three brief examples, which are expounded on in later chapters. Our first example is the use of parallelism at the system level. To improve the throughput performance on a typical server benchmark, such as SPECWeb or TPC, multiple processors and multiple disks can be used. The workload of handling requests can then be spread among the CPUs or disks, resulting in improved throughput. This is the reason that scalability is viewed as a valuable asset for server applications.

 

At the level of an individual processor, taking advantage of parallelism among instructions is critical to achieving high performance. One of the simplest ways to do this is through pipelining. The basic idea behind pipelining is to overlap the execution of instructions, so as to reduce the total time to complete a sequence of instructions. Viewed from the perspective of the CPU performance equation, we can think of pipelining as reducing the CPI by allowing instructions that take multiple cycles to overlap.

 

A key insight that allows pipelining to work is that not every instruction depends on its immediate predecessor, and thus, executing the instructions completely or partially in parallel may be possible.
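
As a back-of-the-envelope sketch (the stage count and instruction count below are ours, purely illustrative): in an ideal k-stage pipeline with no stalls, n instructions finish in about k + (n - 1) cycles rather than n X k.

def cycles_unpipelined(n, k):
    # Each instruction occupies all k stages before the next one starts.
    return n * k

def cycles_pipelined(n, k):
    # Ideal pipeline: k cycles to fill the pipeline, then one
    # instruction completes per cycle (no stalls or dependences).
    return k + (n - 1)

print(cycles_unpipelined(1000, 5))  # 5000 cycles
print(cycles_pipelined(1000, 5))    # 1004 cycles, CPI approaching 1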

 

Parallelism can also be exploited at the level of detailed digital design. For example, set-associative caches use multiple banks of memory that are typically searched in parallel to find a desired item. Modern ALUs use carry-lookahead, which uses parallelism to speed the process of computing sums from linear in the number of bits in the operands to logarithmic.
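
To make the linear-to-logarithmic point concrete, here is a minimal Python sketch (function and variable names ours) of a Kogge-Stone-style parallel-prefix carry computation, one common carry-lookahead organization. Bits are least-significant first; each pass of the while loop corresponds to one level of logic, so about log2(n) levels suffice where a ripple-carry adder needs n.

def lookahead_add(a_bits, b_bits, c_in=0):
    n = len(a_bits)
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate: bit creates a carry
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # propagate: bit passes a carry
    # Parallel-prefix combination: after about log2(n) passes, (G[i], P[i])
    # summarize generate/propagate over bits 0..i, so every carry can be
    # formed at once from c_in instead of rippling bit by bit.
    G, P = g[:], p[:]
    step = 1
    while step < n:
        newG, newP = G[:], P[:]
        for i in range(step, n):
            newG[i] = G[i] | (P[i] & G[i - step])
            newP[i] = P[i] & P[i - step]
        G, P = newG, newP
        step *= 2
    carries = [c_in] + [G[i] | (P[i] & c_in) for i in range(n)]
    return [p[i] ^ carries[i] for i in range(n)], carries[n]

# 3 + 1, least-significant bit first: expect 4, i.e., bits [0, 0, 1], carry 0.
print(lookahead_add([1, 1, 0], [1, 0, 0]))  # ([0, 0, 1], 0)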

