The Science of Building a Compiler
Compiler design is full of beautiful examples where complicated
real-world problems are solved by abstracting the essence of the problem
mathematically. These serve as excellent illustrations of how abstractions can
be used to solve problems: take a problem, formulate a mathematical
abstraction that captures the key characteristics, and solve it using
mathematical techniques. The problem formulation must be grounded in a solid
understanding of the characteristics of computer programs, and the solution must
be validated and refined empirically.
A compiler must accept all source programs that conform to the
specification of the language; the set of source programs is infinite and any
program can be very large, consisting of possibly millions of lines of code.
Any transformation performed by the compiler while translating a source program
must preserve the meaning of the program being compiled. Compiler writers thus
have influence over not just the compilers they create, but all the programs
that their compilers compile. This leverage makes writing compilers
particularly rewarding; however, it also makes compiler development challenging.

Modeling in Compiler Design and Implementation
The study of compilers is mainly a study of how we design the right
mathematical models and choose the right algorithms, while balancing the need
for generality and power against simplicity and efficiency.
Some of the most fundamental models are finite-state machines and regular
expressions, which we shall meet in Chapter 3. These models are useful for
describing the lexical units of programs (keywords, identifiers, and such) and
for describing the algorithms used by the compiler to recognize those units.
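As a small illustration (ours, not part of the text), the following Python
sketch uses the standard re module to recognize the lexical units of a
hypothetical toy language; the token names are invented for the example:

    import re

    # Hypothetical token classes, each a (name, regular expression) pair.
    TOKEN_SPEC = [
        ("NUMBER", r"\d+"),
        ("ID",     r"[A-Za-z_]\w*"),
        ("OP",     r"[+\-*/=]"),
        ("SKIP",   r"\s+"),        # whitespace: recognized, then discarded
    ]
    MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

    def tokenize(text):
        """Yield (token_name, lexeme) pairs for the input string."""
        for m in MASTER.finditer(text):
            if m.lastgroup != "SKIP":
                yield m.lastgroup, m.group()

    # list(tokenize("x1 = 42 + y")) yields [("ID", "x1"), ("OP", "="),
    # ("NUMBER", "42"), ("OP", "+"), ("ID", "y")]

Chapter 3 shows how regular expressions such as these can be converted into
finite-state machines that recognize the same units.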
Also among the most fundamental models are context-free grammars, used to describe
the syntactic structure of programming languages such as the nesting of
parentheses or control constructs. We shall study grammars in Chapter 4.
Similarly, trees are an important model for representing the structure of
programs and their translation into object code, as we shall see in Chapter 5.
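To make the grammar and tree models concrete, here is a companion sketch
(again ours) that parses the (name, lexeme) pairs produced by the lexer above
according to the tiny hypothetical grammar E -> T ('+' T)*, where T is a
NUMBER, and builds the result as a tree of nested tuples:

    def parse_expr(tokens, pos=0):
        """Parse E -> T ('+' T)*; return (tree, next_pos). A tree is
        either a NUMBER lexeme or a tuple ('+', left_tree, right_tree)."""
        tree, pos = parse_term(tokens, pos)
        while pos < len(tokens) and tokens[pos] == ("OP", "+"):
            right, pos = parse_term(tokens, pos + 1)
            tree = ("+", tree, right)
        return tree, pos

    def parse_term(tokens, pos):
        name, lexeme = tokens[pos]
        assert name == "NUMBER", f"expected a number at position {pos}"
        return lexeme, pos + 1

    # With tokens [("NUMBER", "1"), ("OP", "+"), ("NUMBER", "2"),
    # ("OP", "+"), ("NUMBER", "3")], parse_expr returns
    # (("+", ("+", "1", "2"), "3"), 5); the tree makes '+' left-associative.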
The Science of Code Optimization
The term "optimization" in compiler design refers to the
attempts that a com-piler makes to produce code that is more efficient than the
obvious code. "Op-timization" is thus a misnomer, since there is no
way that the code produced by a compiler can be guaranteed to be as fast or
faster than any other code that performs the same task.
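As one concrete instance (our own illustration, built on Python's standard
ast module), the classic constant-folding optimization replaces an expression
such as 2 * 60 * 60 with its value at compile time, saving the
multiplications at run time:

    import ast

    class ConstantFolder(ast.NodeTransformer):
        """Fold binary arithmetic whose operands are numeric constants."""
        OPS = {ast.Add: lambda a, b: a + b,
               ast.Sub: lambda a, b: a - b,
               ast.Mult: lambda a, b: a * b}

        def visit_BinOp(self, node):
            self.generic_visit(node)          # fold the subtrees first
            if (isinstance(node.left, ast.Constant)
                    and isinstance(node.right, ast.Constant)
                    and type(node.op) in self.OPS):
                value = self.OPS[type(node.op)](node.left.value,
                                                node.right.value)
                return ast.copy_location(ast.Constant(value), node)
            return node

    tree = ConstantFolder().visit(ast.parse("seconds = 2 * 60 * 60"))
    print(ast.unparse(ast.fix_missing_locations(tree)))  # seconds = 7200

Note that the sketch folds only operators whose semantics it knows; even the
simplest optimization must take care to preserve the meaning of the program.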
In modern times, the optimization of code that a compiler performs has
become both more important and more complex. It is more complex because
processor architectures have become more complex, yielding more opportunities
to improve the way code executes. It is more important because massively
parallel computers require substantial optimization, or their performance
suffers by orders of magnitude. With the likely prevalence of multicore
machines (computers with chips that have large numbers of processors on them),
all compilers will have to face the problem of taking advantage of multiprocessor
machines.
It is hard, if not impossible, to build a robust compiler out of
"hacks." Thus, an extensive and useful theory has been built up
around the problem of optimizing code. The use of a rigorous mathematical
foundation allows us to show that an optimization is correct and that it
produces the desirable effect for all possible inputs. We shall see, starting
in Chapter 9, how models such as graphs, matrices, and linear programs are
necessary if the compiler is to produce well-optimized code.
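For a taste of that machinery, the following sketch (ours; the four-block
flow graph is invented) runs iterative liveness analysis, a standard
data-flow problem over graphs of the kind Chapter 9 develops:

    # Hypothetical control-flow graph: for each basic block, the
    # variables read before being written (use), the variables written
    # (def), and the successor blocks.
    CFG = {
        "B1": {"use": {"a"}, "def": {"b"}, "succ": ["B2"]},
        "B2": {"use": {"b"}, "def": {"c"}, "succ": ["B3", "B4"]},
        "B3": {"use": {"c"}, "def": {"a"}, "succ": ["B2"]},  # loop edge
        "B4": {"use": {"c"}, "def": set(), "succ": []},
    }

    def liveness(cfg):
        """Iterate IN[n] = use[n] | (OUT[n] - def[n]), with OUT[n] the
        union of IN[s] over successors s, until a fixed point."""
        live_in = {n: set() for n in cfg}
        live_out = {n: set() for n in cfg}
        changed = True
        while changed:
            changed = False
            for n, info in cfg.items():
                out = set().union(*(live_in[s] for s in info["succ"]))
                new_in = info["use"] | (out - info["def"])
                if new_in != live_in[n] or out != live_out[n]:
                    live_in[n], live_out[n], changed = new_in, out, True
        return live_in, live_out

    live_in, live_out = liveness(CFG)
    # live_out["B2"] == {"b", "c"}: both b and c are live leaving B2.

The same fixed-point style of computation underlies most of the global
optimizations studied in Chapter 9.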
On the other hand, pure theory alone is insufficient. As with many
real-world problems, there are no perfect answers. In fact, most of the
questions that we ask in compiler optimization are undecidable. One of the most
important skills in compiler design is the ability to formulate the right
problem to solve. We need a good understanding of the behavior of programs to
start with, and thorough experimentation and evaluation to validate our
intuitions.
Compiler optimizations must meet the following design objectives:
• The optimization must be correct, that is, preserve the meaning of the compiled program,
• The optimization must improve the performance of many programs,
• The compilation time must be kept reasonable, and
• The engineering effort required must be manageable.
It is impossible to overemphasize the importance of correctness. It is
trivial to write a compiler that generates fast code if the generated code need
not be correct! Optimizing compilers are so difficult to get right that we dare
say that no optimizing compiler is completely error-free! Thus, the most
important objective in writing a compiler is that it is correct.
The second goal is that the compiler must be effective in improving the
performance of many input programs. Normally, performance means the speed of
the program execution. Especially in embedded applications, we may also wish to
minimize the size of the generated code. And in the case of mobile devices, it
is also desirable that the code minimizes power consumption. Typically, the
same optimizations that speed up execution time also conserve power. Besides
performance, usability aspects such as error reporting and debugging are also
important.
Third, we need to keep the compilation time short to support a rapid
development and debugging cycle. This requirement has become easier to meet as
machines get faster. Often, a program is first developed and debugged without
program optimizations. Not only is the compilation time reduced, but more
importantly, unoptimized programs are easier to debug, because the
optimizations introduced by a compiler often obscure the relationship between
the source code and the object code. Turning on optimizations in the compiler
sometimes exposes new problems in the source program; thus testing must again
be performed on the optimized code. The need for additional testing sometimes
deters the use of optimizations in applications, especially if their
performance is not critical.
Finally, a compiler is a complex system; we must keep the system simple
to assure that the engineering and maintenance costs of the compiler are
manageable. There is an infinite number of program optimizations that we could
implement, and it takes a nontrivial amount of effort to create a correct and
effective optimization. We must prioritize the optimizations, implementing only
those that lead to the greatest benefits on source programs encountered in
practice.
Thus, in studying compilers, we learn not only how to build a compiler,
but also the general methodology of solving complex and open-ended problems.
The approach used in compiler development involves both theory and
experimentation. We normally start by formulating the problem based on our
intuitions on what the important issues are.