Chapter: Compilers : Principles, Techniques, & Tools : Code Generation

The Target Language

1 A Simple Target Machine Model 2 Program and Instruction Costs 3 Exercises for Section 8.2

The Target Language

1 A Simple Target Machine Model

2 Program and Instruction Costs

3 Exercises for Section 8.2

Familiarity with the target machine and its instruction set is a prerequisite for designing a good code generator. Unfortunately, in a general discussion of code generation it is not possible to describe any target machine in sufficient detail to generate good code for a complete language on that machine. In this chapter, we shall use as a target language assembly code for a simple computer that is representative of many register machines. However, the code-generation techniques presented in this chapter can be used on many other classes of machines as well.

1. A Simple Target Machine Model

Our target computer models a three-address machine with load and store oper-ations, computation operations, jump operations, and conditional jumps. The underlying computer is a byte-addressable machine with n general-purpose reg-isters, R0,R1, ... ,Rn - 1. A full-fledged assembly language would have scores of instructions. To avoid hiding the concepts in a myriad of details, we shall use a very limited set of instructions and assume that all operands are integers. Most instructions consists of an operator, followed by a target, followed by a list of source operands. A label may precede an instruction. We assume the following kinds of instructions are available:

• Load operations: The instruction LD dst, addr loads the value in location addr into location dst This instruction denotes the assignment dst = addr. The most common form of this instruction is LD r, x which loads the value in location x into register r. An instruction of the form LD r±,r2 is a register-to-register copy in which the contents of register r2 are copied into register r1.

• Store operations: The instruction ST x, r stores the value in register r into the location x. This instruction denotes the assignment x = r.

• Computation operations of the form OP dst, src1,src2, where OP is a op-erator like ADD or SUB, and dst, srcy, and src2 are locations, not necessarily distinct. The effect of this machine instruction is to apply the operation

represented by OP to the values in locations srci and src2, and place the result of this operation in location dst. For example, SUB n,r2,r3 computes n = r2 - r3. Any value formerly stored in n is lost, but if r1 is r2 or r 3 , the old value is read first. Unary operators that take only one operand do not have a src2.

• Unconditional jumps: The instruction BR L causes control to branch to the machine instruction with label L. (BR stands for branch.)

• Conditional jumps of the form Bcond r, L, where r is a register, L is a label, and cond stands for any of the common tests on values in the register r. For example, BLTZ r, L causes a jump to label L if the value in register r is less than zero, and allows control to pass to the next machine instruction if not.

We assume our target machine has a variety of addressing modes:

• In instructions, a location can be a variable name x referring to the mem-ory location that is reserved for x (that is, the /-value of x).

• A location can also be an indexed address of the form a(r), where a is

a variable and r is a register. The memory location denoted by a(r) is computed by taking the I-value of a and adding to it the value in register r. For example, the instruction LD R l , a(R2) has the effect of setting Rl = contents (a + contents (R2)), where contentsix) denotes the contents of the register or memory location represented by x. TJiis addressing mode is useful for accessing arrays, where a is the base address of the array (that is, the address of the first element), and r holds the number of bytes past that address we wish to go to reach one of the elements of array a.

A memory location can be an integer indexed by a register. For ex-ample, LD R l , 100(R2) has the effect of setting Rl = contents(100 + contents(R2)), that is, of loading into Rl the value in the memory loca-

tion obtained by adding 100 to the contents of register R2. This feature is useful for following pointers, as we shall see in the example below-

We also allow two indirect addressing modes: *r means the memory lo-cation found in the location represented by the contents of register r and *100(r) means the memory location found in the location obtained by

adding 100 to the contents of r. For example, LD R l , *100(R2) has the effect of setting Rl = contents(contents(10Q + contents(R2))), that is, of loading into Rl the value in the memory location stored in the memory location obtained by adding 100 to the contents of register R2.

Finally, we allow an immediate constant addressing mode. The constant is prefixed by #. The instruction LD R l , #100 loads the integer 100 into register Rl, and ADD R l , R l , #100 adds the integer 100 into register Rl. '

Comments at the end of instructions are preceded by / / .

Example 8.2 : The three-address statement x = y - z can be implemented by the machine instructions:

LD Rl, Y // Rl = y

LD R2, Z // R2 = z

SUB Rl, Rl, R2 // Rl = Rl - R2

ST x, Rl / / x = Rl

We can do better, perhaps. One of the goals of a good code-generation algorithm is to avoid using all four of these instructions, whenever possible. For example, y and/or z may have been computed in a register, and if so we can avoid the LD step(s). Likewise, we might be able to avoid ever storing x if its value is used within the register set and is not subsequently needed.

Suppose a is an array whose elements are 8-byte values, perhaps real num-bers. Also assume elements of a are indexed starting at 0. We may execute the three-address instruction b = a [ i ] by the machine instructions:

LD Rl, i // Rl = i

MUL Rl, Rl, 8 // Rl = Rl * 8

LD R2, a(Rl) // R2 = contents(a + contents(Rl))

ST b , R2 / / b = R2

That is, the second step computes Si, and the third step places in register R2 the value in the zth element of a — the one found in the location that is 8i bytes past the base address of the array a.

Similarly, the assignment into the array a represented by three-address in-

struction a [ j ] = c is implemented by:

LD Rl, c // Rl = c

LD R2, j // R2 = j

MUL R2, R2, 8 // R2 = R2 * 8

ST a (R2), Rl // contents(a + contents(R2)) = Rl

To implement a simple pointer indirection, such as the three-address state-ment x = *p, we can use machine instructions like:

LD Rl, p // Rl = p

LD R2, 0(R1) // R2 = contents(0 + contents(Rl))

ST x, R2 / / x = R2

The assignment through a pointer *p = y is similarly implemented in machine code by:

LD Rl, p // Rl = p

LD R2, y / / R2 = y

ST 0(R1), R2 // contents(0 + contents(Rl)) = R2

Finally, consider a conditional-jump three-address instruction like

if x < y goto L

T he machine - code equivalent would be something like:

LD R l , x

LD R2, y

SUB R l , R l , R2

BLTZ R l , M

/ / Rl = x

/ / R2 = y

/ / Rl = Rl - R2

// if Rl < 0 jump to M

Here, M is the label that represents the first machine instruction generated from the three-address instruction that has label L. As for any three-address instruc-tion, we hope that we can save some of these machine instructions because the needed operands are already in registers or because the result need never be stored. •

2. Program and Instruction Costs

We often associate a cost with compiling and running a program. Depending on what aspect of a program we are interested in optimizing, some common cost measures are the length of compilation time and the size, running time and power consumption of the target program.

Determining the actual cost of compiling and running a program is a com - plex problem. Finding an optimal target program for a given source program is an undecidable problem in general, and many of the subproblems involved are NP - hard . As we have indicated, in code generation we must often be content with heuristic techniques that produce good but not necessarily optimal target programs.

For the remainder of this chapter, we shall assume each target - language instruction has an associated cost. For simplicity, we take the cost of an in-struction to be one plus the costs associated with the addressing modes of the operands . This cost corresponds to the length in words of the instruction. Addressing modes involving registers have zero additional cost, while those in-volving a memory location or constant in them have an additional cost of one, because such operands have to be stored in the words following the instruction.

Some examples:

• The instruction LD RO, Rl copies the contents of register Rl into register RO. This instruction has a cost of one because no additional memory words are required.

• The instruction LD RO, M loads the contents of memory location M into register RO. T h e cost is two since the address of memory location M is in the word following the instruction.

• The instruction LD R l , *100(R2) loads into register Rl the value given by contents(contents(100 + contents(K2))). The cost is three because the constant 100 is stored in the word following the instruction.

In this chapter we assume the cost of a target-language program on a given input is the sum of costs of the individual instructions executed when the pro-gram is run on that input. Good code-generation algorithms seek to minimize the sum of the costs of the instructions executed by the generated target pro-gram on typical inputs. We shall see that in some situations we can actually generate optimal code for expressions on certain classes of register machines.

3. Exercises for Section 8.2

E x e r c i s e 8 . 2 . 1 : Generate code for the following three-address statements as-suming all variables are stored in memory locations.

x = 1

x = a

x = a + 1

x = a + b

The two statements

x	=	b	*	c
y	=	a	+	x

E x e r c i s e 8 . 2 . 2: Generate code for the following three-address statements as-suming a and b are arrays whose elements are 4-byte values.

The four-statement sequence

= a [ i ]

= b [ j ]

a [ i ] = y

b [ j ] = x

The three-statement sequence

= a [ i ]

= b [ i ]

= x * y

The three-statement sequence

x = a [ i ] = b[x]

a [ i ] = y

Exercise 8 . 2 . 3: Generate code for the following three-address sequence as-suming that p and q are in memory locations:

y = *q

q = q + 4

*p = y

p = p + 4

Exercise 8 . 2 . 4 : Generate code for the following sequence assuming that x, y, and z are in memory locations:

if x < y goto LI

z = 0

goto L2

LI: z = 1

Exercise 8 . 2 . 5 : Generate code for the following sequence assuming hat n is

in a memory location:

s = 0

i = 0

LI: if i > n goto L2 s = s + i

i = i + 1 goto LI

L2:

Exercise 8 . 2 . 6: Determine the costs of the following instruction sequences:

LD RO, y LD Rl, z

ADD RO, RO, Rl

ST x, RO

LD RO, i

MUL RO, RO, 8

LD Rl, a(R0)

ST b, Rl

LD RO, c LD Rl, i

MUL Rl, Rl, 8

ST a (Rl), RO

LD RO, p

LD Rl, 0(R0)

ST x, Rl

LD RO, p

LD R l , x

ST 0(R0), Rl

f) LD RO, x

LD R l , y

SUB RO, RO, Rl

BLTZ *R3, RO

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

Compilers : Principles, Techniques, & Tools : Code Generation : The Target Language |