Hardware Support for Exposing More Parallelism at Compile Time
Techniques
such as loop unrolling, software pipelining, and trace scheduling can be used
to increase the amount of parallelism available when the behavior of branches
is fairly predictable at compile time. When the behavior of branches is not
well known, compiler techniques alone may not be able to uncover much ILP. In
such cases, the control dependences may severely limit the amount of
parallelism that can be exploited. Similarly, potential dependences between
memory reference instructions could prevent code movement that would increase
available ILP. This section introduces several techniques that can help
overcome such limitations.
The first
is an extension of the instruction set to include conditional or predicated
instructions. Such instructions can be used to eliminate branches, converting a
control dependence into a data dependence and potentially improving
performance.
Hardware
speculation with in-order commit preserved exception behavior by detecting and
raising exceptions only at commit time when the instruction was no longer
speculative. To enhance the ability of the compiler to speculatively move code
over branches, while still preserving the exception behavior, we consider
several different methods, which either include explicit checks for exceptions
or techniques to ensure that only those exceptions that should arise are
generated.
Finally,
the hardware speculation schemes provided support for reordering loads and
stores, by checking for potential address conflicts at runtime. To allow the
compiler to reorder loads and stores when it suspects they do not conflict, but
cannot be absolutely certain, a mechanism for checking for such conflicts can
be added to the hardware. This mechanism permits additional opportunities for
memory reference speculation.
1. Conditional or Predicated
Instructions
The
concept behind conditional instructions is quite simple: An instruction refers
to a condition, which is evaluated as part of the instruction execution. If the
condition is true, the instruction is executed normally; if the condition is
false, the execution continues as if the instruction was a no-op. The most
common example of such an instruction is conditional move, which moves a value
from one register to another if the condition is true. Such an instruction can
be used to completely eliminate a branch in simple code sequences.
Consider
the following code:

    if (A==0) {S=T;}

Assuming
that registers R1, R2, and R3 hold the values of A, S, and T, respectively, the
straightforward code using a branch for this statement is

    BNEZ  R1, L
    ADDU  R2, R3, R0
L:

Using
a conditional move that performs the move only if the third operand is equal to
zero, we can implement this statement in one instruction:

    CMOVZ R2, R3, R1
The
conditional instruction allows us to convert the control dependence present in
the branch-based code sequence to a data dependence. For a pipelined processor,
this moves the place where the dependence must be resolved from near the front
of the pipeline, where it is resolved for branches, to the end of the pipeline,
where the register write occurs.
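The semantics of such a conditional move can be modeled in a few lines. Python is used here purely as an illustration of the instruction's behavior; `cmovz` is a hypothetical helper, not a real hardware interface:

```python
def cmovz(dest, src, cond):
    """Model of CMOVZ dest, src, cond: yields src if cond == 0,
    otherwise leaves dest unchanged. The result depends on all three
    input values (a data dependence) rather than on a branch."""
    return src if cond == 0 else dest

# if (A == 0) { S = T; }  becomes a single data-dependent move:
A, S, T = 0, 10, 99
S = cmovz(S, T, A)   # A == 0, so S receives T's value (99)
```

The point of the model is that the selection is folded into the dataflow: no control-flow decision remains for the pipeline's front end to predict.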
One
obvious use for conditional move is to implement the absolute value function, A
= abs(B), which is implemented as if (B<0) {A = -B;} else {A = B;}. This if
statement can be implemented as a pair of conditional moves, or as one
unconditional move (A = B) and one conditional move (A = -B).
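The unconditional-move-plus-conditional-move version can be sketched as follows. The helper `cmovltz` (move if less than zero) is a hypothetical instruction introduced only for this illustration:

```python
def cmovltz(dest, src, cond):
    """Hypothetical conditional move: yields src if cond < 0,
    otherwise leaves dest unchanged."""
    return src if cond < 0 else dest

def abs_branchless(b):
    a = b                   # unconditional move:  A = B
    a = cmovltz(a, -b, b)   # conditional move:    if (B < 0) A = -B
    return a
```

Both "moves" always execute; the second one simply has no effect when B is nonnegative, which is exactly how the branch disappears.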
In the
example above or in the compilation of absolute value, conditional moves are
used to change a control dependence into a data dependence. This enables us to
eliminate the branch and possibly improve the pipeline behavior.
Conditional
moves are the simplest form of conditional or predicated instructions, and
although useful for short sequences, have limitations. In particular, using
conditional move to eliminate branches that guard the execution of large blocks
of code can be inefficient, since many conditional moves may need to be
introduced.
To remedy
the inefficiency of using conditional moves, some architectures support full
predication, whereby the execution of all instructions is controlled by a
predicate. When the predicate is false, the instruction becomes a no-op. Full
predication allows us to simply convert large blocks of code that are branch
dependent. For example, an if-then-else statement within a loop can be entirely
converted to predicated execution, so that the code in the then-case executes
only if the value of the condition is true, and the code in the else-case
executes only if the value of the condition is false. Predication is
particularly valuable with global code scheduling, since it can eliminate
nonloop branches, which significantly complicate instruction scheduling.
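The if-conversion of an if-then-else body can be sketched as below. This is a software model under assumed names (`predicated` stands in for hardware predicated execution; the loop body is an arbitrary example), not an actual compiler transformation:

```python
def predicated(pred, dest, value):
    """Model of a predicated operation: the write happens only when
    pred is true; otherwise the operation is a no-op and dest keeps
    its old value."""
    return value if pred else dest

# if-conversion of a loop body:
#     if (x % 2 == 0)  out = x / 2;
#     else             out = 3*x + 1;
def step_if_converted(xs):
    results = []
    for x in xs:
        p = (x % 2 == 0)                         # condition -> predicate p
        out = 0
        out = predicated(p, out, x // 2)         # then-case, guarded by p
        out = predicated(not p, out, 3 * x + 1)  # else-case, guarded by !p
        results.append(out)
    return results
```

Both arms appear in the straight-line instruction stream; the predicates decide which writes take effect, so the nonloop branch inside the loop body is gone and the whole body can be scheduled as one block.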
Predicated
instructions can also be used to speculatively move an instruction that is
time-critical, but may cause an exception if moved before a guarding branch.
Although it is possible to do this with conditional move, it is more costly.
Predicated
or conditional instructions are extremely useful for implementing short
alternative control flows, for eliminating some unpredictable branches, and for
reducing the overhead of global code scheduling. Nonetheless, the usefulness of
conditional instructions is limited by several factors:
• Predicated
instructions that are annulled (i.e., whose conditions are false) still take
some processor resources. An annulled predicated instruction requires fetch
resources at a minimum, and in most processors functional unit execution time.
• Predicated
instructions are most useful when the predicate can be evaluated early. If the
condition evaluation and predicated instructions cannot be separated (because
of data dependences in determining the condition), then a conditional
instruction may result in a stall for a data hazard. With branch prediction and
speculation, such stalls can be avoided, at least when the branches are
predicted accurately.
• The use
of conditional instructions can be limited when the control flow involves more
than a simple alternative sequence. For example, moving an instruction across
multiple branches requires making it conditional on both branches, which
requires two conditions to be specified or requires additional instructions to
compute the controlling predicate.
• Conditional
instructions may have some speed penalty compared with unconditional
instructions. This may show up as a higher cycle count for such instructions or
a slower clock rate overall. If conditional instructions are more expensive,
they will need to be used judiciously.
For these
reasons, many architectures have included a few simple conditional instructions
(with conditional move being the most frequent), but only a few architectures
include conditional versions for the majority of the instructions. The MIPS,
Alpha, PowerPC, SPARC, and Intel x86 (as defined in the Pentium processor) all
support conditional move. The IA-64 architecture supports full predication for
all instructions.
2. Compiler Speculation with
Hardware Support
Many
programs have branches that can be accurately predicted at compile time either
from the program structure or by using a profile. In such cases, the compiler
may want to speculate either to improve the scheduling or to increase the issue
rate. Predicated instructions provide one method to speculate, but they are
really more useful when control dependences can be completely eliminated by
if-conversion. In many cases, we would like to move speculated instructions not
only before the branch, but before the condition evaluation, and predication
cannot achieve this.
As
pointed out earlier, to speculate ambitiously requires three capabilities:
1. The
ability of the compiler to find instructions that, with the possible use of
register renaming, can be speculatively moved and not affect the program data
flow,
2. The
ability to ignore exceptions in speculated instructions, until we know that
such exceptions should really occur, and
3. The
ability to speculatively interchange loads and stores, or stores and stores,
which may have address conflicts.
The first
of these is a compiler capability, while the last two require hardware support.
3. Hardware Support for
Preserving Exception Behavior
There are
four methods that have been investigated for supporting more ambitious
speculation without introducing erroneous exception behavior:
1. The
hardware and operating system cooperatively ignore exceptions for speculative
instructions.
2. Speculative instructions that never raise exceptions are used, and checks are
introduced to determine when an exception should occur.
3. A set of status bits, called poison bits, is attached to the result registers
written by speculated instructions when the instructions cause exceptions. The
poison bits cause a fault when a normal instruction attempts to use the register.
4. A
mechanism is provided to indicate that an instruction is speculative and the
hardware buffers the instruction result until it is certain that the
instruction is no longer speculative.
To
explain these schemes, we need to distinguish between exceptions that indicate
a program error and would normally cause termination, such as a memory
protection violation, and those that are handled and normally resumed, such as
a page fault. Exceptions that can be resumed can be accepted and processed for
speculative instructions just as if they were normal instructions.
If the
speculative instruction should not have been executed, handling the unneeded
exception may have some negative performance effects, but it cannot cause
incorrect execution. The cost of these exceptions may be high, however, and
some processors use hardware support to avoid taking such exceptions, just as
processors with hardware speculation may take some exceptions in speculative
mode, while avoiding others until an instruction is known not to be
speculative.
Exceptions
that indicate a program error should not occur in correct programs, and the
result of a program that gets such an exception is not well defined, except
perhaps when the program is running in a debugging mode. If such exceptions
arise in speculated instructions, we cannot take the exception until we know
that the instruction is no longer speculative.
In the
simplest method for preserving exceptions, the hardware and the operating
system simply handle all resumable exceptions when the exception occurs and
simply return an undefined value for any exception that would cause
termination.
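This first method can be modeled as a load wrapper that never terminates the program. Python is used only as a sketch; the dictionary stands in for memory, and a missing address stands in for a protection violation:

```python
def speculative_load(memory, addr, undefined=0):
    """Method 1 sketch: resumable events (e.g. page faults) would be
    serviced normally inside the access; a would-be terminating event
    (modeled here as a missing address) silently yields an undefined
    value instead of faulting. If the speculation was wrong, the bogus
    value is simply never used by the committed path."""
    try:
        return memory[addr]   # normal case
    except KeyError:          # would-be terminating exception
        return undefined      # undefined result instead of a fault
```

The weakness this illustrates is that a genuinely buggy, non-speculative program also loses its terminating exception, which is why this scheme is acceptable only when such behavior is tolerated (e.g. "fast" optimization modes).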
A second
approach to preserving exception behavior when speculating introduces
speculative versions of instructions that do not generate terminating
exceptions.
A third
approach for preserving exception behavior tracks exceptions as they occur but
postpones any terminating exception until a value is actually used, preserving
the occurrence of the exception, although not in a completely precise fashion.
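The poison-bit idea behind this third approach can be sketched as follows, with a divide standing in for any exception-causing speculative instruction. The class and function names are illustrative only:

```python
class Reg:
    """A register value carrying a poison bit."""
    def __init__(self, value=0, poisoned=False):
        self.value = value
        self.poisoned = poisoned

def spec_div(a, b):
    """Speculative divide: on a terminating exception, poison the
    destination register instead of faulting immediately."""
    try:
        return Reg(a.value // b.value)
    except ZeroDivisionError:
        return Reg(poisoned=True)   # exception recorded, not raised

def use(reg):
    """A normal (non-speculative) instruction reading the register:
    only now does the deferred exception actually fault."""
    if reg.poisoned:
        raise ZeroDivisionError("deferred exception from speculative instruction")
    return reg.value
```

If the speculated divide turns out to lie on an untaken path, its poisoned result is never read and no fault ever occurs; if the result is consumed, the exception surfaces at the point of use.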
The
fourth and final approach listed above relies on a hardware mechanism that
operates like a reorder buffer. In such an approach, instructions are marked by
the compiler as speculative and include an indicator of how many branches the
instruction was speculatively moved across and what branch action (taken/not
taken) the compiler assumed.
All
instructions are placed in a reorder buffer when issued and are forced to
commit in order, as in a hardware speculation approach. The reorder buffer
tracks when instructions are ready to commit and delays the “write back”
portion of any speculative instruction. Speculative instructions are not
allowed to commit until the branches they have been speculatively moved over
are also ready to commit, or, alternatively, until the corresponding sentinel
is reached.
4. Hardware Support for Memory
Reference Speculation
Moving
loads across stores is usually done when the compiler is certain the addresses
do not conflict. To allow the compiler to undertake such code motion, when it
cannot be absolutely certain that such a movement is correct, a special
instruction to check for address conflicts can be included in the architecture.
The special instruction is left at the original location of the load
instruction (and acts like a guardian) and the load is moved up across one or
more stores.
When a
speculated load is executed, the hardware saves the address of the accessed
memory location. If a subsequent store changes the location before the check
instruction, then the speculation has failed. If the location has not been
touched, then the speculation is successful. Speculation failure can be handled
in two ways.
If only
the load instruction was speculated, then it suffices to redo the load at the
point of the check instruction. If additional instructions that depended on the
load were also speculated, then a fix-up sequence that re-executes all the
speculated instructions, starting with the load, is needed.
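The advanced-load-plus-check mechanism can be modeled as a small table of watched addresses (loosely in the spirit of IA-64's advanced loads, though the names and structure here are illustrative only):

```python
class SpecLoadTable:
    """Model of a hardware table tracking speculatively advanced loads."""
    def __init__(self):
        self.watched = {}

    def advanced_load(self, memory, addr):
        """The load moved up above one or more stores: remember the
        address (and value) so later stores can be checked against it."""
        value = memory[addr]
        self.watched[addr] = value
        return value

    def store(self, memory, addr, value):
        """Every store checks the table: a hit means the speculation
        has failed, so the watched entry is invalidated."""
        memory[addr] = value
        if addr in self.watched:
            del self.watched[addr]

    def check(self, memory, addr):
        """The guardian check left at the load's original position:
        if the entry survived, speculation succeeded; otherwise the
        load is redone here."""
        if addr in self.watched:
            return self.watched[addr]
        return memory[addr]   # speculation failed: redo the load
```

This models only the simple failure case, where redoing the load at the check is sufficient; the fix-up-sequence case would additionally re-execute the instructions that consumed the speculative value.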