Chapter: Compilers : Principles, Techniques, & Tools : Code Generation

Addresses in the Target Code

1 Static Allocation 2 Stack Allocation 3 Run-Time Addresses for Names 4 Exercises for Section 8.3

Addresses in the Target Code

1 Static Allocation

2 Stack Allocation

3 Run-Time Addresses for Names

4 Exercises for Section 8.3

In this section, we show how names in the IR can be converted into addresses in the target code by looking at code generation for simple procedure calls and returns using static and stack allocation. In Section 7.1, we described how each executing program runs in its own logical address space that was partitioned into four code and data areas:

A statically determined area Code that holds the executable target code. The size of the target code can be determined at compile time.

A statically determined data area Static for holding global constants and other data generated by the compiler. The size of the global constants and compiler data can also be determined at compile time.

A dynamically managed area Heap for holding data objects that are allo-cated and freed during program execution. The size of the Heap cannot be determined at compile time.

A dynamically managed area Stack for holding activation records as they are created and destroyed during procedure calls and returns. Like the Heap, the size of the Stack cannot be determined at compile time.

1. Static Allocation

To illustrate code generation for simplified procedure calls and returns, we shall focus on the following three-address statements:

• c a l l callee

return

h a l t

• a c t i o n , which is a placeholder for other three-address statements.

The size and layout of activation records are determined by the code gener-ator via the information about names stored in the symbol table. We shall first illustrate how to store the return address in an activation record on a procedure call and how to return control to it after the procedure call. For convenience, we assume the first location in the activation holds the return address.

Let us first consider the code needed to implement the simplest case, static allocation. Here, a call callee statement in the intermediate code can be im-plemented by a sequence of two target-machine instructions:

ST callee. static Area, #here + 20

BR callee. code Area

The ST instruction saves the return address at the beginning of the activation record for callee, and the BR transfers control to the target code for the called procedure callee. The attribute before callee. static Area is a constant that gives the address of the beginning of the activation record for callee, and the attribute callee. code Area is a constant referring to the address of the first instruction of the called procedure callee in the Code area of the run-time memory.

The operand #here+ 20 in the ST instruction is the literal return address; it is the address of the instruction following the BR instruction. We assume that #here is the address of the current instruction and that the three constants plus the two instructions in the calling sequence have a length of 5 words or 20 bytes.

The code for a procedure ends with a return to the calling procedure, except that the first procedure has no caller, so its final instruction is HALT, which returns control to the operating system. A return callee statement can be implemented by a simple jump instruction

BR * callee. static Area

which transfers control to the address saved at the beginning of the activation record for callee.

E x a m p l e 8 . 3 : Suppose we have the following three-address code:

// code for c

actioni

call p

a c t i o n2

halt

// code for p

actions

return

Figure 8.4 shows the target program for this three-address code. We use the pseudoinstruction ACTION to represent the sequence of machine instructions to execute the statement action, which represents three-address code that is not relevant for this discussion. We arbitrarily start the code for procedure c at address 100 and for procedure p at address 200. We that assume each ACTION instruction takes 20 bytes. We further assume that the activation records for these procedures are statically allocated starting at locations 300 and 364, re-spectively.

The instructions starting at address 100 implement the statements

action i ; call p ; actiori2; h a l t

of the first procedure c. Execution therefore starts with the instruction ACTIONi at address 100. The ST instruction at address 120 saves the return address 140 in the machine-status field, which is the first word in the activation record of p. The BR instruction at address 132 transfers control the first instruction in the target code of the called procedure p.

After executing ACTION3, the jump instruction at location 220 is executed. Since location 140 was saved at address 364 by the call sequence above, *364 represents 140 when the BR statement at address 220 is executed. Therefore, when procedure p terminates, control returns to address 140 and execution of procedure c resumes. •

2. Stack Allocation

Static allocation can become stack allocation by using relative addresses for storage in activation records. In stack allocation, however, the position of an activation record for a procedure is not known until run time. This position is usually stored in a register, so words in the activation record can be accessed as offsets from the value in this register. The indexed address mode of our target machine is convenient for this purpose.

Relative addresses in an activation record can be taken as offsets from any known position in the activation record, as we saw in Chapter 7. For conve nience, we shall use positive offsets by maintaining in a register SP a pointer to the beginning of the activation record on top of the stack. When a procedure call occurs, the calling procedure increments SP and transfers control to the called procedure. After control returns to the caller, we decrement SP, thereby deallocating the activation record of the called procedure.

The code for the first procedure initializes the stack by setting SP to the start of the stack area in memory:

LD SP, ftstackStart II initialize the stack

code for the first procedure

HALT // terminate execution

A procedure call sequence increments SP, saves the return address, and transfers control to the called procedure:

ADD SP, SP, #caller.recordSize II increment stack pointer

ST *SP, #here + 16 // save return address

BR callee. code Area II return to caller

The operand t caller.recordSize represents the size of an activation record, so the ADD instruction makes SP point to the next activation record. The operand there +16 in the ST instruction is the address of the instruction following BR; it is saved in the address pointed to by SP.

The return sequence consists of two parts. The called procedure transfers

control to the return address using

BR *0(SP) // return to caller

The reason for using *0(SP) in the BR instruction is that we need two levels of indirection: 0(SP) is the address of the first word in the activation record and *0(SP) is the return address saved there.

The second part of the return sequence is in the caller, which decrements SP, thereby restoring SP to its previous value. That is, after the subtraction SP points to the beginning of the activation record of the caller:

SUB SP, SP, #caller.recordSize II decrement stack pointer

Chapter 7 contains a broader discussion of calling sequences and the trade-offs in the division of labor between the calling and called procedures.

Example 8 . 4 : The program in Fig. 8.5 is an abstraction of the quicksort program in the previous chapter. Procedure q is recursive, so more than one activation of q can be alive at the same time.

Suppose that the sizes of the activation records for procedures m, p, and q have been determined to be msize, psize, and qsize, respectively. The first word in each activation record will hold a return address. We arbitrarily assume that the code for these procedures starts at addresses 100, 200, and 300, respectively,

and that the stack starts at address 600. The target program is shown in Figure 8.6.

We assume that ACTION4 contains a conditional jump to the address 456 of the return sequence from q; otherwise, the recursive procedure q is condemned to call itself forever.

If msize, psize, and qsize are 20, 40, and 60, respectively, the first instruction at address 100 initializes the SP to 600, the starting address of the stack. SP holds 620 just before control transfers from m to q, because msize is 20. Sub-sequently, when q calls p, the instruction at address 320 increments SP to 680, where the activation record for p begins; SP reverts to 620 after control returns to q. If the next two recursive calls of q return immediately, the maximum value of SP during this execution 680. Note, however, that the last stack location used is 739, since the activation record of q starting at location 680 extends for 60 bytes. •

3. Run-Time Addresses for Names

The storage-allocation strategy and the layout of local data in an activation record for a procedure determine how the storage for names is accessed. In Chapter 6, we assumed that a name in a three-address statement is really a pointer to a symbol-table entry for that name. This approach has a significant advantage; it makes the compiler more portable, since the front end need not be changed even when the compiler is moved to a different machine where a different run-time organization is needed. On the other hand, generating the specific sequence of access steps while generating intermediate code can be of

significant advantage in an optimizing compiler, since it lets the optimizer take advantage of details it would not see in the simple three-address statement.

In either case, names must eventually be replaced by code to access storage locations. We thus consider some elaborations of the simple three-address copy statement x = 0. After the declarations in a procedure are processed, suppose the symbol-table entry for x contains a relative address 12 for x. For consider the case in which x is in a statically allocated area beginning at address static. Then the actual run-time address of x is static +12. Although the compiler can eventually determine the value of static + 12 at compile time, the position of the static area may not be known when intermediate code to access the name is generated. In that case, it makes sense to generate three-address code to "compute" static + 12, with the understanding that this computation will be carried out during the code generation phase, or possibly by the loader, before the program runs. The assignment x = 0 then translates into

s t a t i c [ 1 2 ] = 0

If the static area starts at address 100, the target code for this statement is

LD 112, #0

4. Exercises for Section 8.3

Exercise 8.3 . 1 : Generate code for the following three-address statements as-suming stack allocation where register SP points to the top of the stack.

call p

call q

return

call r

return return

E x e r c i s e 8 . 3 . 2: Generate code for the following three-address statements as-suming stack allocation where register SP points to the top of the stack.

x = 1

x = a

x = a + 1

x = a + b

The two statements

Exercise 8 . 3 . 3 : Generate code for the following three-addressstatements

again assuming stack allocation and assuming a and b are arrays whose ele-ments are 4-byte values.

The four-statement sequence

= a [ i ]

= b [ j ]

a [ i ] = y

b [ j ] = x

The three-statement sequence

= a [ i ]

= b [ i ]

= x * y

The three-statement sequence

= a [ i ]

= b[x]

a [ i ] = y

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

Compilers : Principles, Techniques, & Tools : Code Generation : Addresses in the Target Code |