ASSEMBLY
AND LINKING :
Assembly
and linking are the last steps in the compilation process they turn a list of
instructions into an image of the program’s bits in memory. Loading actually
puts the program in memory so that it can be executed. In this section, we
survey the basic techniques required for assembly linking to help us understand
the complete compilation and loading process.
Figure
2.16 highlights the role of assemblers and linkers in the compilation process.
This process is often hidden from us by compilation commands that do everything
required to generate an executable program. As the figure shows, most compilers
do not directly generate machine code, but instead create the instruction-level
program in the form of human-readable assembly language. Generating assembly
language rather than binary instructions frees the compiler writer from details
extraneous to the compilation process, which includes the instruction format as
well as the exact addresses of instructions and data.
The
assembler’s job is to translate symbolic assembly language statements into
bit-level representations of instructions known as object code. The assembler
takes care of instruction formats and does part of the job of translating
labels into addresses. However, since the program may be built from many files,
the final steps in determining the addresses of instructions and data are
performed by the linker, which produces an executable binary file. That file
may not necessarily be located in the CPU’s memory, however, unless the linker
happens to create the executable directly in RAM. The program that brings the
program into memory for execution is called a loader.
The
simplest form of the assembler assumes that the starting address of the
assembly language program has been specified by the programmer. The addresses
in such a program are known as absolute addresses.
Assemblers
When
translating assembly code into object code, the assembler must translate
opcodes and format the bits in each instruction, and translate labels into
addresses. In this section, we review the translation of assembly language into
binary. Labels make the assembly process more complex, but they are the most
important abstraction provided by the assembler. Labels let the programmer (a
human programmer or a compiler generating assembly code) avoid worrying about
the locations of instructions and data. Label processing requires making two
passes through the assembly source code as follows:
1. The first
pass scans the code to determine the address of each label.
2. The
second pass assembles the instructions using the label values computed in the
first pass.
As shown
in Figure 2.17, the name of each symbol and its address is stored in a symbol
table that is built during the first pass. The symbol table is built by
scanning from the first instruction to the last.
During
scanning, the current location in memory is kept in a program location counter (PLC).
Despite the similarity in name to a program counter, the PLC is not used to
execute the program, only to assign memory locations to labels. For example,
the PLC always makes exactly one pass through the program, whereas the program
counter makes many passes over code in a loop. Thus, at the start of the first
pass, the PLC is set to the program’s starting address and the assembler looks
at the first line. After examining the line, the assembler updates the PLC to
the next location (since ARM instructions are four bytes long, the PLC would be
incremented by four) and looks at the next instruction. If the instruction
begins with a label, a new entry is made in the symbol table, which includes
the label name and its value. The value of the label is equal to the current
value of the PLC. At the end of the first pass, the assembler rewinds to the
beginning of the assembly language file to make the second pass. During the
second pass, when a label name is found, the label is looked up in the symbol
table and its value substituted into the appropriate place in the instruction.
But how
do we know the starting value of the PLC? The simplest case is absolute
addressing. In this case, one of the first statements in the assembly language
program is a pseudo-op that specifies the origin of the program, that is, the
location of the first address in the program. A common name for this pseudo-op
(e.g., the one used for the ARM) is the ORG statement.
ORG 2000
Which
puts the start of the program at location 2000. This pseudo-op accomplishes
this by setting the PLC’s value to its argument’s value, 2000 in this case.
Assemblers generally allow a program to have many ORG statements in case
instructions or data must be spread around various spots in memory.
Linking:
Many
assembly language programs are written as several smaller pieces rather than as
a single large file. Breaking a large program into smaller files helps
delineate program modularity. If the program uses library routines, those will
already be preassembled, and assembly language source code for the libraries
may not be available for purchase.
A linker
allows a program to be stitched together out of several smaller pieces. The
linker operates on the object files created by the assembler and modifies the
assembled code to make the necessary links between files.
Some
labels will be both defined and used in the same file. Other labels will be
defined in a single file but used elsewhere as illustrated in Figure 2.18. The
place in the file where a label is defined is known as an entry point. The place in
the file where the label is used is called an external reference.
The main
job of the loader is to resolve
external references based on available entry points. As a result of the need to
know how definitions and references connect, the assembler passes to the linker
not only the object file but also the symbol table.
Even if
the entire symbol table is not kept for later debugging purposes, it must at
least pass the entry points. External references are identified in the object
code by their relative symbol identifiers.
The
linker proceeds in two phases.
1. First, it
determines the address of the start of each object file. The order in which
object files are to be loaded is given by the user, either by specifying
parameters when the loader is run or by creating a load map file that gives
the order in which files are to be placed in memory. Given the order in which
files are to be placed in memory and the length of each object file, it is easy
to compute the starting address of each file.
2. At the start of the second phase, the loader merges all symbol tables from the object files into a single, large table. It then edits the object files to change relative addresses into addresses. This is typically performed by having the assembler write extra bits into the object file to identify the instructions and fields that refer to labels. If a label cannot be found in the merged symbol table, it is undefined and an error message is sent to the user.
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.