A Simple Code Generator
1 Register and Address
Descriptors
2 The Code-Generation Algorithm
3 Design of the Function getReg
4 Exercises for Section 8.6
In this section, we shall consider an algorithm that generates code for
a single basic block. It considers each three-address instruction in turn, and
keeps track of what values are in what registers so it can avoid generating
unnecessary loads and stores.
One of the primary issues during code generation is
deciding how to use registers to best advantage. There are four principal uses
of registers:
In most machine architectures, some or all of the operands of an
operation must be in registers in order to perform the operation.
Registers make good temporaries — places to hold
the result of a subex-pression while a larger expression is being evaluated, or
more generally, a place to hold a variable that is used only within a single
basic block.
Registers are used to hold (global)
values that are computed in one basic block and used in other blocks, for
example, a loop index that is incre-
mented going around the loop and is used several
times within the loop.
Registers are often used to help with run-time storage management, for
example, to manage the run-time stack, including the maintenance of stack
pointers and possibly the top elements of the stack itself.
These are competing needs, since
the number of registers available is limited. The algorithm in this section
assumes that some set of registers is available to hold the values that are
used within the block. Typically, this set of regis-ters does not include all
the registers of the machine, since some registers are reserved for global
variables and managing the stack. We assume that the basic block has already
been transformed into a preferred sequence of three-address instructions, by
transformations such as combining common subexpressions. We further assume that
for each operator, there is exactly one machine instruc-tion that takes the
necessary operands in registers and performs that operation, leaving the result
in a register. The machine instructions are of the form
LD reg, mem
ST mem, reg
OP reg, reg, reg
1. Register and Address
Descriptors
Qur code-generation algorithm
considers each three-address instruction in turn and decides what loads are
necessary to get the needed operands into registers. After generating the
loads, it generates the operation itself. Then, if there is a need to store the
result into a memory location, it also generates that store.
In order to make the needed
decisions, we require a data structure that tells us what program variables
currently have their value in a register, and which register or registers, if
so. We also need to know whether the memory location for a given variable
currently has the proper value for that variable, since a new value for the
variable may have been computed in a register and not yet stored. The desired
data structure has the following descriptors:
For each available register, a register descriptor keeps track of the
variable names whose current value is in that register. Since we shall use only
those registers that are available for local use within a basic block, we
assume
that initially, all register descriptors are empty.
As the code generation progresses, each register will hold the value of zero or
more names.
For each program variable, an address descriptor keeps track of the loca-tion or locations where
the current value of that variable can be found. The location might be a
register, a memory address, a stack location, or some set of more than one of
these. The information can be stored in the symbol-table entry for that
variable name.
2. The Code-Generation Algorithm
An essential part of the
algorithm is a function getReg(I),
which selects regis-ters for each memory location associated with the
three-address instruction I. Function getReg
has access to the register and address descriptors for all the variables of the
basic block, and may also have access to certain useful data-flow information
such as the variables that are live on exit from the block. We shall discuss getReg after presenting the basic algorithm.
While we do not know the total number of registers available for local data
belonging to a basic block, we assume that there are enough registers so that,
after freeing all available regis-ters by storing their values in memory^ there
are enough registers to accomplish any three-address operation.
In a three-address instruction such as x = y + z, we shall
treat + as a generic operator and ADD as the equivalent machine instruction. We do not, therefore, take
advantage of commutativity of +. Thus, when we implement the operation, the
value of y must be in the second register mentioned in the ADD
instruction, never the third. A possible improvement to the algorithm is to
generate code for both x = y + z and x = z + y whenever + is a commutative operator, and pick the better code
sequence.
Machine Instructions
for Operations
For a three-address instruction such as x = y + z, do the following:
1. Use getReg(x = y + z) to select registers for x,
y, and z. Call these Rx, Ry, and Rz.
2. If y is not in Ry (according to the register
descriptor for Ry), then issue an instruction LD Ry,y', where y' is one of the
memory locations for y (according to the address descriptor for y).
Similarly,
if z is not in Rz, issue and instruction LD Rz,z', where z' is a location for
z.
Issue the
instruction ADD Rx,Ry,Rz.
Machine
Instructions for Copy Statements
There is an important special case: a three-address
copy statement of the form x = y. We assume that getReg will always choose the
same register for both x and y. If y is not already in that register Ry, then
generate the machine instruction LD Ry, y. If y was already in Ry, we do
nothing. It is only necessary that we adjust the register description for Ry so
that it includes x as one of the values found there.
Ending the Basic
Block
As we have described the algorithm, variables used by the block may wind
up with their only location being a register. If the variable is a temporary
used only within the block, that is fine; when the block ends, we can forget
about the value of the temporary and assume its register is empty. However, if
the variable is live on exit from the block, or if we don't know which
variables are live on exit, then we need to assume that the value of the
variable is needed later. In that case, for each variable x whose location descriptor does not say that its value is located
in the memory location for x, we must
generate the instruction ST x, R, where R is a register in which #'s value
exists at the end of the block.
Managing
Register and Address Descriptors
As the code-generation algorithm
issues load, store, and other machine instruc-tions, it needs to update the
register and address descriptors. The rules are as follows:
For the instruction LD R:x
Change the register descriptor
for register R so it holds only x.
Change the address descriptor for
x by adding register R as an ad-ditional location.
For the instruction ST x, R, change the address descriptor for x to include its own memory location.
For an operation such as ADD Rx,Ry,Rz implementing a three-address instruction x — y + z
Change the register descriptor
for Rx so that it holds only x.
Change the address descriptor for
x so that its only location is Rx. Note that the memory location for x
is not now in the address de-scriptor
for x.
(c) Remove Rx from the address descriptor of
any variable other than x.
When we process a copy statement x = y, after generating the load for y into register Ry, if
needed, and after managing descriptors as for all load statements (per rule 1):
(a) Add x to the register descriptor for
Ry.
(b) Change the address descriptor for x so that its only location is Ry.
Example 8.16 : Let us translate the basic block
consisting of the three-address statements
t = a - b
u = a
- c
v = t + u
a = d
d = v + u
Here we assume that t, u, and v are
temporaries, local to the block, while a, b, c, and d are
variables that are live on exit from the block. Since we have not yet
discussed how the function getReg
might work, we shall simply assume that there are as many registers as we need,
but that when a register's value is no longer needed (for example, it holds
only a temporary, all of whose uses have been passed), then we reuse its
register.
A summary of all the machine-code
instructions generated is in Fig. 8.16. The figure also shows the register and
address descriptors before and after the translation of each three-address
instruction.
For the first three-address instruction, t = a - b we need
to issue three in-structions, since nothing is in a register initially. Thus,
we see a and b loaded into registers Rl and R2, and the value t produced in register R2. Notice that we can use R2 for t because the value b previously in R2 is not
needed within the block. Since b is presumably live on exit from the block, had
it not been in its own memory location (as indicated by its address
descriptor), we would have had to store R2 into b
first. The decision to do so, had we needed R2, would be
taken by getReg.
The second instruction, u = a - c ,
does not require a load of a, since it is already in register Rl. Further,
we can reuse Rl for the result, u, since the value of a,
previously in that register, is no longer needed within the block, and its
value is in its own memory location if a is
needed outside the block. Note that we change the address descriptor for a to
indicate that it is no longer in Rl, but is in the memory location called a.
The third instruction, v = t + u,
requires only the addition. Further, we can use R3 for the
result, v, since the value of c in that register is no longer needed within the
block, and c has its value in its own memory location.
The copy instruction, a = d, requires a load of d, since it is not in memory. We show register R2's
descriptor holding both a and d. The addition of a to the register descriptor is the result of our processing the copy
statement, and is not the result of any machine instruction.
The fifth instruction, d = v + u, uses two values that are in registers.
Since u is a temporary whose value is no longer needed, we have chosen to reuse
its register Rl for the new value of d. Notice that d is now in only Rl, and is
not in its own memory location. The same holds for a, which is
in R2 and not in the memory location called a. As a
result, we need a "coda" to the machine code for the basic block that
stores the live-on-exit variables a and d into their memory locations. We show these as the last two
instructions.
3. Design of the Function getReg
Lastly, let us consider how to
implement getReg(I), for a
three-address in-struction /. There are many options, although there are also
some absolute prohibitions against choices that lead to incorrect code due to
the loss of the value of one or more live variables. We begin our examination
with the case of an operation step, for which we again use x = y + z as the generic example. First, we must pick a register
for y and a register for z. The issues are the same, so we shall
concentrate on picking register Ry for y. The rules are as
follows:
1. If 2/ is currently in a register, pick a register already containing
y as Ry. Do not issue a machine instruction to load this register, as none is
needed.
2. If y is not in a register, but
there is a register that is currently empty, pick one such register as Ry.
The difficult case occurs when y is not in a register, and there is no
register that is currently empty. We need to pick one of the allowable
registers anyway, and we need to make it safe to reuse. Let R be a candidate register, and suppose v is one of the variables that the
register descriptor for R says is in R. We need to make sure that u's value
either is not really needed, or that there is somewhere else we can go to get
the value of R. The possibilities
are:
If the address descriptor for v says that v is somewhere besides R,
then we are OK.
(b) If v is x, the value being computed by instruction /,
and x is not also one of the other operands of instruction 7 (z in this
example), then we are OK. The reason is that in this case, we know this value
of x is never again going to be used, so we are free to ignore it.
(c) Otherwise, if v is not used later (that is, after the instruction /,
there are no further uses of v, and if v is live on exit from the block, then v
is recomputed within the block), then we are OK.
If we are not OK by
one of the first two cases, then we need to generate the store
instruction ST v, R to place a copy
of v in its own memory location. This
operation is called a spill.
Since R may hold several variables at the moment, we repeat the above
steps for each such variable v. At
the end, R's "score" is the
number of store instructions we needed to generate. Pick one of the registers
with the lowest score.
Now, consider the selection of the register Rx. The issues and options are almost as for y, so we shall only mention the differences.
1. Since a new value of x is being computed, a register that
holds only x is always an acceptable
choice for Rx. This statement holds even if x
is one of y and z, since our machine instructions allows two registers to be the
same in one instruction.
2. If y is not used after instruction I,
in the sense described for variable v
in item (3c), and Ry holds only y after being
loaded, if necessary, then Ry can also be used as Rx. A similar option holds regarding z
and Rz.
The last matter to consider
specially is the case when I is a
copy instruction x — y. We pick the
register Ry as above. Then, we always choose Rx = Ry.
4. Exercises for Section 8.6
Exercise 8.6.1 : For each of the
following C assignment statements
a) x = a
+ b*c;
b) x =
a/(b+c) - d * ( e + f ) ;
c) x =
a[i] +
1;
d) a [ i ]
= b [ c [ i ] ] ;
e) a [ i
] [ j ] = b [ i ] [ k ] +
c[k][j];
f) *p++ = *q++;
generate
three-address code, assuming that all array elements are integers tak-ing four
bytes each. In parts (d) and (e), assume that a, b, and c are constants giving the
location of the first (Oth) elements of the arrays with those names, as in all
previous examples of array accesses in this chapter.
! Exercise 8.6.2: Repeat
Exercise 8.6.1 parts (d) and (e), assuming that the arrays a,
b, and c are located via pointers, pa,
pb, and pc, respectively, pointing to the
locations of their respective first elements.
Exercise 8.6.3: Convert
your three-address code from Exercise 8.6.1 into ma-chine code for the machine
model of this section. You may use as many registers as you need.
Exercise 8.6.4: Convert your three-address code
from Exercise 8.6.1 into ma-chine code, using the simple code-generation
algorithm of this section, assuming three registers are available. Show the
register and address descriptors after each step.
Exercise 8.6.5:
Repeat Exercise 8.6.4, but assuming only two registers are available.
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2024 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.