The Target Language
1 A Simple Target Machine Model
2 Program and Instruction Costs
3 Exercises for Section 8.2
Familiarity with the target
machine and its instruction set is a prerequisite for designing a good code
generator. Unfortunately, in a general discussion of code generation it is not
possible to describe any target machine in sufficient detail to generate good
code for a complete language on that machine. In this chapter, we shall use as
a target language assembly code for a simple computer that is representative of
many register machines. However, the code-generation techniques presented in
this chapter can be used on many other classes of machines as well.
1. A Simple Target Machine Model
Our target computer models a three-address machine with load and store
oper-ations, computation operations, jump operations, and conditional jumps.
The underlying computer is a byte-addressable machine with n general-purpose reg-isters, R0,R1, ... ,Rn - 1. A full-fledged
assembly language would have scores of instructions. To avoid hiding the
concepts in a myriad of details, we shall use a very limited set of
instructions and assume that all operands are integers. Most instructions
consists of an operator, followed by a target, followed by a list of source
operands. A label may precede an instruction. We assume the following kinds of
instructions are available:
• Load operations: The instruction LD dst, addr loads the value in location addr into location dst This instruction denotes the assignment dst = addr. The most common form of this instruction is LD r, x which loads the value in location x
into register r. An instruction of the
form LD r±,r2 is a
register-to-register copy in
which the contents of register r2 are
copied into register r1.
• Store operations: The instruction ST x, r stores the value in register
r into the location x. This instruction denotes the assignment x = r.
• Computation operations of the form OP dst, src1,src2, where OP is a
op-erator like ADD or SUB, and dst, srcy, and src2 are locations, not
necessarily distinct. The effect of this machine instruction is to apply the
operation
represented by OP to the values
in locations srci and src2, and place
the result of this operation in location dst. For example, SUB n,r2,r3
computes n = r2 - r3. Any value formerly stored in n is lost,
but if r1 is r2
or r 3 , the old value is read first. Unary operators that take only one
operand do not have a src2.
• Unconditional jumps: The instruction BR L
causes control to branch to the machine instruction with label
L. (BR stands for branch.)
• Conditional jumps of the form Bcond r, L, where r is a register, L is
a label, and cond stands for any of the common tests on values in the register
r. For example, BLTZ r, L causes a jump to label L if the value in register r
is less than zero, and allows control to pass to the next machine instruction
if not.
We assume
our target machine has a variety of addressing modes:
• In instructions, a location can be a variable name
x referring to the mem-ory location
that is reserved for x (that is, the
/-value of x).
• A location
can also be an indexed address of the form a(r), where a
is
a
variable and r is a register. The memory location denoted by a(r) is computed by taking the I-value
of a and adding to it the value in
register r. For example, the instruction LD
R l , a(R2) has the effect of setting Rl = contents (a + contents (R2)), where
contentsix) denotes the contents of the
register or memory location represented by x. TJiis addressing mode is useful
for accessing arrays, where a is the base address of the array (that is, the
address of the first element), and r holds the number of bytes past that
address we wish to go to reach one of the elements of array a.
A memory
location can be an integer indexed by a register. For ex-ample, LD R l , 100(R2) has the effect of setting Rl = contents(100 + contents(R2)), that is,
of loading into Rl the value in the memory loca-
tion obtained by adding 100 to the
contents of register R2. This feature is useful for
following pointers, as we shall see in the example below-
We also
allow two indirect addressing modes: *r
means the memory lo-cation found in the location represented by the contents of
register r and *100(r) means
the memory location found in the location obtained by
adding 100 to the contents of r. For
example, LD R l , *100(R2) has the
effect of setting Rl = contents(contents(10Q
+ contents(R2))), that is,
of loading into Rl the value in the memory location stored in the memory
location obtained by adding 100 to the contents of register R2.
Finally,
we allow an immediate constant addressing mode. The constant is prefixed by #.
The instruction LD R l , #100 loads
the integer 100 into register Rl, and ADD R l ,
R l , #100 adds the integer 100 into register Rl. '
Comments
at the end of instructions are preceded by / / .
Example
8.2 : The three-address statement x = y - z can be implemented by the machine
instructions:
LD Rl, Y // Rl = y
LD R2, Z // R2 = z
SUB Rl, Rl, R2 //
Rl = Rl - R2
ST x, Rl / /
x = Rl
We can do
better, perhaps. One of the goals of
a good code-generation algorithm is to avoid using all four of these
instructions, whenever possible. For example, y and/or z may have been computed
in a register, and if so we can avoid the LD step(s). Likewise, we might be
able to avoid ever storing x if its value is used within the register set and
is not subsequently needed.
Suppose a
is an array whose elements are 8-byte values, perhaps real num-bers. Also
assume elements of a are indexed starting at 0. We may execute the
three-address instruction b = a [ i
] by the machine instructions:
We can do
better, perhaps. One of the goals of
a good code-generation algorithm is to avoid using all four of these
instructions, whenever possible. For example, y and/or z may have been computed
in a register, and if so we can avoid the LD step(s). Likewise, we might be
able to avoid ever storing x if its value is used within the register set and
is not subsequently needed.
Suppose a
is an array whose elements are 8-byte values, perhaps real num-bers. Also
assume elements of a are indexed starting at 0. We may execute the
three-address instruction b = a [ i ]
by the machine instructions:
LD Rl, i // Rl = i
MUL Rl, Rl,
8 // Rl = Rl * 8
LD R2, a(Rl) // R2 = contents(a + contents(Rl))
ST b ,
R2 / / b
= R2
That is,
the second step computes Si, and the third step places in register R2 the value
in the zth element of a — the one found in the location that is 8i bytes past
the base address of the array a.
Similarly,
the assignment into the array a represented by three-address in-
struction
a [ j ] = c is implemented by:
LD Rl, c // Rl = c
LD R2, j // R2 = j
MUL R2, R2, 8 // R2 = R2 * 8
ST a (R2), Rl //
contents(a + contents(R2)) = Rl
To
implement a simple pointer indirection, such as the three-address state-ment x
= *p, we can use machine instructions like:
LD Rl,
p // Rl = p
LD R2,
0(R1) // R2 = contents(0 +
contents(Rl))
ST x,
R2 / / x
= R2
The
assignment through a pointer *p = y is similarly implemented in machine code
by:
LD Rl, p // Rl = p
LD R2, y / /
R2 = y
ST 0(R1),
R2 // contents(0 +
contents(Rl)) = R2
Finally,
consider a conditional-jump three-address instruction like
if x <
y goto L
T he machine
- code equivalent would be something like:
LD R l , x
LD R2, y
SUB R l , R
l , R2
BLTZ R l
, M
/ / Rl = x
/ / R2 = y
/ / Rl = Rl - R2
// if Rl < 0 jump
to M
Here, M
is the label that represents the first machine instruction generated from the
three-address instruction that has label L. As for any three-address
instruc-tion, we hope that we can save some of these machine instructions
because the needed operands are already in registers or because the result need
never be stored. •
2. Program and Instruction Costs
We often associate a cost with
compiling and running a program. Depending on what
aspect of a program we are
interested in optimizing, some
common cost measures are the
length of compilation time and
the size, running time and power
consumption of the target program.
Determining the actual cost of
compiling and running a program is a com - plex problem. Finding an optimal
target program for a given source program is an undecidable problem in general,
and many of the subproblems involved are NP - hard . As we have indicated, in
code generation we must often be content with heuristic techniques that produce
good but not necessarily optimal target programs.
For the remainder of this chapter, we shall assume
each target - language instruction has an associated cost. For simplicity, we
take the cost of an in-struction to be one plus the costs associated with the
addressing modes of the operands . This cost corresponds to the length in words
of the instruction. Addressing modes involving registers have zero additional
cost, while those in-volving a memory location or constant in them have an
additional cost of one, because such operands have to be stored in the words
following the instruction.
Some
examples:
• The
instruction LD RO, Rl copies the contents of register Rl into register RO. This
instruction has a
cost of one because
no additional memory words are required.
• The
instruction LD RO, M loads the contents of memory location M into register
RO. T h e cost is two since the address
of memory location M is in the word following the instruction.
• The
instruction LD R l , *100(R2) loads into register Rl the value given by
contents(contents(100 + contents(K2))).
The cost is three
because the constant 100 is stored in the word following the
instruction.
In this
chapter we assume the cost of a target-language program on a given input is the
sum of costs of the individual instructions executed when the pro-gram is run
on that input. Good code-generation algorithms seek to minimize the sum of the
costs of the instructions executed by the generated target pro-gram on typical
inputs. We shall see that in some situations we can actually generate optimal
code for expressions on certain classes of register machines.
3. Exercises for Section 8.2
E x e r c i s e 8 . 2 . 1 :
Generate code for the following three-address statements as-suming all
variables are stored in memory locations.
x = 1
x
= a
x = a + 1
x
= a + b
The two statements
x |
= |
b |
* |
c |
y |
= |
a |
+ |
x |
E x e r c i s e 8 . 2 . 2: Generate code for the following three-address statements
as-suming a and b are arrays whose elements are 4-byte values.
The four-statement sequence
=
a [ i ]
=
b [ j ]
a [ i ] = y
b [ j ] = x
The three-statement sequence
=
a [ i ]
=
b [ i ]
= x * y
The three-statement sequence
x = a [ i ] = b[x]
a [ i ] =
y
Exercise 8 . 2 . 3: Generate
code for the following three-address sequence as-suming that p and q are in
memory locations:
y = *q
q = q + 4
*p = y
p = p + 4
Exercise
8 . 2 . 4 :
Generate code for the following sequence assuming that x, y, and z are in
memory locations:
if x < y
goto LI
z = 0
goto L2
LI: z = 1
Exercise 8 . 2 . 5 : Generate code for the following sequence
assuming hat n is
in a memory location:
s
= 0
i
= 0
LI:
if i > n goto L2 s = s + i
i
= i + 1 goto LI
L2:
Exercise 8 . 2 . 6: Determine
the costs of the following instruction sequences:
LD
RO, y LD Rl, z
ADD
RO, RO, Rl
ST
x, RO
LD RO, i
MUL RO, RO, 8
LD Rl, a(R0)
ST b, Rl
LD
RO, c LD Rl, i
MUL
Rl, Rl, 8
ST
a
(Rl), RO
LD
RO, p
LD Rl, 0(R0)
ST x, Rl
LD RO, p
LD R l , x
ST 0(R0), Rl
f) LD RO, x
LD R l , y
SUB RO, RO, Rl
BLTZ *R3, RO
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.