Chapter: Advanced Computer Architecture : Multiple Issue Processors

Limitations of ILP

1. The Hardware Model 2. Limitations on the Window Size and Maximum Issue Count 3.The Effects of Realistic Branch and Jump Prediction: 4. Limitations on ILP for Realizable Processors

Limitations of ILP

1. The Hardware Model

An ideal processor is one where all artificial constraints on ILP are removed. The only limits on ILP in such a processor are those imposed by the actual data flows either through registers or memory.

The assumptions made for an ideal or perfect processor are as follows:

1. Register renaming—There are an infinite number of virtual registers available and hence all WAW and WAR hazards are avoided and an unbounded number of instructions can begin execution simultaneously.

2. Branch prediction—Branch prediction is perfect. All conditional branches are predicted exactly.

3. Jump prediction—All jumps (including jump register used for return and computed jumps) are perfectly predicted. When combined with perfect branch prediction, this is equivalent to having a processor with perfect speculation and an unbounded buffer of instructions available for execution.

4. Memory-address alias analysis—All memory addresses are known exactly and a load

can be moved before a store provided that the addresses are not identical.

Assumptions 2 and 3 eliminate all control dependences. Likewise, assumptions 1 and 4 eliminate all but the true data dependences. Together, these four assumptions mean that any instruction in the of the program’s execution can be scheduled on the cycle immediately following the execution of the predecessor on which it depends.

2. Limitations on the Window Size and Maximum Issue Count

A dynamic processor might be able to more closely match the amount of parallelism uncovered by our ideal processor.

consider what the perfect processor must do:

1. Look arbitrarily far ahead to find a set of instructions to issue, predicting all branches perfectly.

2. Rename all register uses to avoid WAR and WAW hazards.

3. Determine whether there are any data dependencies among the instructions in the issue packet; if so, rename accordingly.

4. Determine if any memory dependences exist among the issuing instructions and handle them appropriately.

5. Provide enough replicated functional units to allow all the ready instructions to issue.

Obviously, this analysis is quite complicated. For example, to determine whether n issuing instructions have any register dependences among them, assuming all instructions are register-register and the total number of registers is unbounded, requires

2n-2+2n-4+……..+2 = 2∑i=1 ^n-1 i = [2 (n-1)n]/2 = n² -n

Comparisons. Thus, to detect dependences among the next 2000 instructions—the default size we assume in several figures—requires almost four million comparisons! Even issuing only 50 instructions requires 2450 comparisons. This cost obviously limits the number of instructions that can be considered for issue at once.

3.The Effects of Realistic Branch and Jump Prediction:

Our ideal processor assumes that branches can be perfectly predicted: The outcome of any branch in the program is known before the first instruction is executed.

The five levels of branch prediction shown in these figures are

1. Perfect—All branches and jumps are perfectly predicted at the start of execution.

2. Tournament-based branch predictor—The prediction scheme uses a correlating two-bit predictor and a noncorrelating two-bit predictor together with a selector, which chooses the best predictor for each branch.

3. Standard two-bit predictor with 512 two-bit entries—In addition, we assume a 16-entry buffer to predict returns.

4. Static—A static predictor uses the profile history of the program and predicts that the branch is always taken or always not taken based on the profile.

5. None—No branch prediction is used, though jumps are still predicted. Parallelism is largely limited to within a basic block.

4. Limitations on ILP for Realizable Processors

The performance of processors an ambitious level of hardware support equal to or better than what is likely in the next five years. In particular we assume the following fixed attributes:

1. Up to 64 instruction issues per clock with no issue restrictions. As we discuss later, the practical implications of very wide issue widths on clock rate, logic complexity, and power may be the most important limitation on exploiting ILP.

2. A tournament predictor with 1K entries and a 16-entry return predictor. This predictor is fairly comparable to the best predictors in 2000; the predictor is not a primary bottleneck.

3. Perfect disambiguation of memory references done dynamically—this is ambitious but perhaps attainable for small window sizes (and hence small issue rates and load/store buffers) or through a memory dependence predictor.

4. Register renaming with 64 additional integer and 64 additional FP registers,exceeding largest number available on any processor in 2001 (41 and 41 in the Alpha 21264), but probably easily reachable within two or three years.

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

Advanced Computer Architecture : Multiple Issue Processors : Limitations of ILP |