1 Contiguous Evaluation
2 The Dynamic Programming Algorithm
3 Exercises for Section 8.11
Algorithm 8.26 in
Section 8.10 produces optimal code from an expression tree using an amount of time
that is a linear function of the size of the tree. This procedure works for
machines in which all computation is done in registers and in which
instructions consist of an operator applied to two registers or to a register
and a memory location.
An algorithm based on the
principle of dynamic programming can be used to extend the class of machines
for which optimal code can be generated from expression trees in linear time.
The dynamic programming algorithm applies to a broad class of register machines
with complex instruction sets.
The dynamic programming algorithm can be used to generate code for any
machine with r interchangeable
registers R0,R1, .. . , Rr—1 and load, store, and add instructions. For
simplicity, we assume every instruction costs one unit, although the dynamic
programming algorithm can easily be modified to work even if each instruction
has its own cost.
1. Contiguous Evaluation
The dynamic programming algorithm
partitions the problem of generating op-timal code for an expression into the
subproblems of generating optimal code for the subexpressions of the given
expression. As a simple example, consider an expression E of the form E1 + E2. An optimal program for E is
formed by combining optimal programs for E1
and E2, in one or the other order, followed by code to evaluate the operator +.
The subproblems of generating optimal code for E1 and E2 are solved similarly.
An optimal program produced by the dynamic
programming algorithm has an important property. It evaluates an expression E = E1 op E2
"contigu-ously." We can appreciate what this means by looking at the
syntax tree T for E:
Here Ti and T2 are trees for E1
and E2, respectively.
We say a program P evaluates a tree T
contiguously if it first evaluates those subtrees of T that need to be computed into memory. Then, it evaluates the
remainder of T either in the order
Ti, T2, and then the root, or in the order T2, Ti, and then the root, in either case using the previously computed
values from memory whenever
necessary. As an example of noncontiguous evaluation, P might first evaluate part of Ti leaving the value in a register
(instead of memory), next evaluate T2, and then return to evaluate the rest of Ti.
For the register machine in this section, we can
prove that given any mach-ine-language program P to evaluate an expression tree T, we can find an equiv-alent
program P' such that
1. P' is
of no higher cost than P,
P' uses no more registers than P,
P' evaluates the tree contiguously.
This result implies that every expression tree can
be evaluated optimally by a contiguous program.
By way of contrast, machines with even-odd register pairs do not always
have optimal contiguous evaluations; the x86 architecture uses register pairs
for mul-tiplication and division. For such machines, we can give examples of
expression trees in which an optimal machine language program must first
evaluate into a register a portion of the left subtree of the root, then a
portion of the right subtree, then another part of the left subtree, then
another part of the right, and so on. This type of oscillation is unnecessary
for an optimal evaluation of any expression tree using the machine in this
The contiguous evaluation property defined above
ensures that for any ex-pression tree T there always exists an optimal program
that consists of optimal programs for subtrees of the root, followed by an
instruction to evaluate the root. This property allows us to use a dynamic
programming algorithm to generate an optimal program for T.
2. The Dynamic Programming
The dynamic programming algorithm proceeds in three
phases (suppose the target machine has r
1. Compute bottom-up for each node n of the expression tree T an array C
of costs, in which the zth component C[i] is the optimal cost of computing the
subtree S rooted at n into a register, assuming i registers are available for
the computation, for 1 < i < r.
2. Traverse T, using the cost vectors to determine which subtrees of T
must be computed into memory.
Traverse each tree using the cost vectors and associated instructions to
generate the final target code. The code for the subtrees computed into memory
locations is generated first.
Each of these phases can be implemented to run in
time linearly proportional to the size of the expression tree.
The cost of computing a node n includes whatever loads and stores are
necessary to evaluate S in the given
number of registers. It also includes the cost of computing the operator at the
root of S. The zeroth component of
the cost vector is the optimal cost of computing the subtree S into memory. The contiguous evaluation
property ensures that an optimal program for S can be generated by considering combinations of optimal programs
only for the subtrees of the root of S.
This restriction reduces the number of cases that need to be considered.
In order to compute the costs C[i] at node n, we view the instructions
as tree-rewriting rules, as in Section 8.9.
Consider each template E that matches
the input tree at node n. By examining the cost vectors at the corresponding
descendants of n, determine the costs of evaluating the operands at the leaves
of E. For those operands of E that are registers, consider all
possible orders in which the corresponding subtrees of T can be evaluated into registers. In each ordering, the first
subtree corresponding to a register operand can be evaluated using i available registers, the second using i -1 registers, and so on. To account
for node n, add in the cost of the instruction associated with the template E. The value C[i] is then the minimum cost over all possible orders.
The cost vectors for the entire tree T
can be computed bottom up in time linearly proportional to the number of nodes
in T. It is convenient to store at
each node the instruction used to achieve the best cost for C[i] for each value of i. The smallest cost
in the vector for the root of T gives
the minimum cost of evaluating T.
Example 8.28: Consider a machine having two registers RO and Rl, and the
following instructions, each of unit cost:
LD Ri, Kj
// Ri = Mj
op Ri, Ri,
Ri // Ri =
Ri Op Rj
op Ri, Ri,
Mi // Ri =
Ri Op Kj
LD Ri, Ri
// Ri = Ri
ST Hi, Ri
// Mi = Rj
In these instructions, Ri is
either RO or Rl, and Mi is a memory location. The operator op
corresponds to an arithmetic operators.
Let us apply the dynamic programming algorithm to generate optimal code
for the syntax tree in Fig 8.26. In the first phase, we compute the cost
vectors shown at each node. To illustrate this cost computation, consider the
cost vector at the leaf a. C, the cost of computing a into
memory, is 0 since it is already there. C[l],
the cost of computing a into a register, is 1 since we can load it into a register with the
RO, a. C, the cost of loading a into a
register with two registers available, is the same as that with one register
available. The cost vector at leaf a is therefore (0,1,1).
Consider the cost vector at the root. We first
determine the minimum cost of computing the root with one and two registers
available. The machine instruction ADD
RO, RO, M matches the root, because the
root is labeled with the operator +. Using this instruction, the minimum cost
of evaluating the root with one register available is the minimum cost of
computing its right subtree into memory, plus the minimum cost of computing its
left subtree into the register, plus 1 for the instruction. No other way
exists. The cost vectors at the right and left children of the root show that
the minimum cost of computing the root with one register available is 5 + 2 + 1
Now consider the minimum cost of evaluating the root with two registers
available. Three cases arise depending on which instruction is used to compute
the root and in what order the left and right subtrees of the root are
Compute the left subtree with two
registers available into register RO, compute the right subtree with one register available into register Rl, and use
the instruction ADD
RO, RO, Rl to compute the root. This
sequence has cost 2 + 5 + 1 = 8.
Compute the right subtree with
two registers available into Rl, compute
the left subtree with one
register available into RO, and use the instruction ADD
RO, RO, Rl. This sequence has cost 4 + 2 + 1
Compute the right subtree into
memory location M, compute the left sub-tree with two registers available into register RO, and use
the instruction ADD
RO, RO, M. This sequence has cost 5 + 2 + 1
The second choice gives the minimum cost 7.
The minimum cost of computing the
root into memory is determined by adding one to the minimum cost of computing
the root with all registers avail-able; that is, we compute the root into a
register and then store the result. The cost vector at the root is therefore
From the cost vectors we can easily construct the code sequence by
making a traversal of the tree. From the tree in Fig. 8.26, assuming two
registers are available, an optimal code sequence is
LD RO, c //
LD Rl, d //
DIV Rl, Rl,
e // Rl =
Rl / e
MUL RO, RO,
Rl // RO =
RO * Rl
LD Rl, a //
SUB Rl, Rl,
b // Rl = Rl - b
ADD Rl, Rl,
RO // Rl =
Rl + RO
Dynamic programming techniques
have been used in a number of compilers, including the second version of the
portable C compiler, PCC2 . The technique facilitates retargeting because of
the applicability of the dynamic programming technique to a broad class of
3. Exercises for Section 8.11
Exercise 8 . 1 1 . 1 : Augment
the tree-rewriting scheme in Fig. 8.20 with costs, and use dynamic programming
and tree matching to generate code for the statements in Exercise 8.9.1.
Exercise 8.11.2 : How would you extend dynamic programming to do optimal
code generation on dags?
Summary of Chapter 8
Code generation is the final phase of a compiler. The code generator maps the
intermediate representation produced by the front end, or if there is a code
optimization phase by the code optimizer, into the target program.
• Instruction selection
is the process
of choosing target-language instructions for each IR statement.
• Register allocation
is the process
of deciding which IR
values to keep in registers. Graph coloring is an effective technique for
doing register allocation in compilers.
• Register assignment
is the process
of deciding which register should hold a given
A retargetable compiler is one
that can generate code for multiple instruc-tion sets.
A virtual machine is an interpreter for a
bytecode intermediate language produced by languages such as Java and C # .
A CISC machine is typically a
two-address machine with relatively few registers, several register classes, and
variable-length instructions with complex addressing modes.
A RISC machine is typically a
three-address machine with many registers in which operations are done in
A basic block is a maximal
sequence of consecutive three-address state-ments in which flow of control can
only enter at the first statement of the block and leave at the last statement
without halting or branching except possibly at the last statement in the basic
A flow graph is a graphical
representation of a program in which the nodes of the graph are basic blocks
and the edges of the graph show how control can flow among the blocks.
A loop in a flow graph is a
strongly connected region with a single entry point called the loop header.
A DAG representation of a basic
block is a directed acyclic graph in which the nodes of the DAG represent the
statements within the block and each child of a node corresponds to the
statement that is the last definition of an operand used in the statement.
• Peephole optimizations are local code-improving transformations that
can be applied to a program, usually through a sliding window.
* Instruction selection
can be done
by a tree-rewriting process
in which tree patterns
corresponding to machine instructions are used to tile a syntax tree. We can
associate costs with the tree-rewriting rules and apply dynamic programming to
obtain an optimal tiling for useful classes of machines and expressions.
An Ershov number tells how many registers are needed to evaluate an expression
without storing any temporaries.
• Spill code is an instruction sequence that stores a value in a
register into memory in order to make room to hold another value in that