1 The Sources of Redundancy
2 Can All Redundancy Be Eliminated?
3 The Lazy-Code-Motion Problem
4 Anticipation of Expressions
5 The Lazy-Code-Motion Algorithm
6 Exercises for Section 9.5
In this section, we consider in detail how to minimize the number of expression evaluations. That is, we want to consider all possible execution sequences in a flow graph, and look at the number of times an expression such as x + y is evaluated. By moving around the places where x + y is evaluated and keeping the result in a temporary variable when necessary, we often can reduce the number of evaluations of this expression along many of the execution paths, while not increasing that number along any path. Note that the number of different places in the flow graph where x + y is evaluated may increase, but that is relatively unimportant, as long as the number of evaluations of the expression x + y is reduced.
Applying the code transformation developed here improves the performance of the resulting code, since, as we shall see, an operation is never applied unless it absolutely has to be. Every optimizing compiler implements something like the transformation described here, even if it uses a less "aggressive" algorithm than the one of this section. However, there is another motivation for discussing the problem. Finding the right place or places in the flow graph at which to evaluate each expression requires four different kinds of data-flow analyses. Thus, the study of "partial-redundancy elimination," as minimizing the number of expression evaluations is called, will enhance our understanding of the role data-flow analysis plays in a compiler.
Redundancy in programs exists in several forms. As discussed in Section 9.1.4, it may exist in the form of common subexpressions, where several evalua-tions of the expression produce the same value. It may also exist in the form of a loop-invariant expression that evaluates to the same value in every iteration of the loop. Redundancy may also be partial, if it is found along some of the paths, but not necessarily along all paths. Common subexpressions and loop-invariant expressions can be viewed as special cases of partial redundancy; thus a single partial-redundancy-elimination algorithm can be devised to eliminate all the various forms of redundancy.
In the following, we first discuss the different forms of redundancy, in order to build up our intuition about the problem. We then describe the generalized redundancy-elimination problem, and finally we present the algorithm. This algorithm is particularly interesting, because it involves solving multiple data-flow problems, in both the forward and backward directions.
1. The Sources of Redundancy
Fi gure 9.30 illustrates the three forms of redundancy: common subexpressions, loop-invariant expressions, and partially redundant expressions. The figure shows the code both before and after each optimization.
Figure 9.30: Examples of (a) global common subexpression, (b) loop-invariant code motion, (c) partial-redundancy elimination.
Global Common Subexpressions
In Fig. 9.30(a), the expression b + c computed in block B4 is redundant; it has already been evaluated by the time the flow of control reaches B4 regardless of the path tsEken to get there. As we observe in this example, the value of the expression may be different on different paths. We can optimize the code by storing the result of the computations of b + c in blocks B2 and B3 in the same temporary variable, say t, and then assigning the value of t to the variable e in block B4, instead of reevaluating the expression. Had there been an assignment to either b or c after the last computation of 6 + c but before block B4, the expression in block B4 would not be redundant.
Formally, we say that an expression b + c is (fully) redundant at point p, if it is an available expression, in the sense of Section 9.2.6, at that point. That is, the expression b + c has been computed along all paths reaching p, and the variables b and c were not redefined after the last expression was evaluated. The latter condition is necessary, because even though the expression b + c is textually executed before reaching the point p, the value of b + c computed at
Finding "Deep" Common Subexpressions
Using available-expressions analysis to identify redundant expressions only works for expressions that are textually identical. For example, an appli-cation of common-subexpression elimination will recognize that tl in the code fragment
as long as the variables b and c have not been redefined in between. It does not, however, recognize that a and e are also the same. It is possi-ble to find such "deep" common subexpressions by re-applying common subexpression elimination until no new common subexpressions are found on one round. It is also possible to use the framework of Exercise 9.4.2 to catch deep common subexpressions.
point p would have been different, because the operands might have changed.
Loop - Invariant Expressions
Fig. 9.30(b) shows an example of a loop-invariant expression. The expression b + c is loop invariant assuming neither the variable b nor c is redefined within the loop. We can optimize the program by replacing all the re-executions in a loop by a single calculation outside the loop. We assign the computation to a temporary variable, say t, and then replace the expression in the loop by t. There is one more point we need to consider when performing "code motion" optimizations such as this. We should not execute any instruction that would not have executed without the optimization. For example, if it is possible to exit the loop without executing the loop-invariant instruction at all, then we should not move the instruction out of the loop. There are two reasons.
If the instruction raises an exception, then executing it may throw an exception that would not have happened in the original program.
When the loop exits early, the "optimized" program takes more time than the original program.
To ensure that loop-invariant expressions in while-loops can be optimized, compilers typically represent the statement
In this way, loop-invariant expressions can be placed just prior to the repeat-until construct.
Unlike common-subexpression elimination, where a redundant expression computation is simply dropped, loop-invariant-expression elimination requires an expression from inside the loop to move outside the loop. Thus, this opti-mization is generally known as "loop-invariant code motion." Loop-invariant code motion may need to be repeated, because once a variable is determined to to have a loop-invariant value, expressions using that variable may also become loop-invariant.
Partially Redundant Expressions
An example of a partially redundant expression is shown in Fig. 9.30(c). The expression b + c in block B4 is redundant on the path B1 -» B2 —> B4, but not on the path B1 ->• B3 ->• B4. We can eliminate the redundancy on the former path by placing a computation of b + c in block B3. All the results of b + c are written into a temporary variable t, and the calculation in block B4 is replaced with t. Thus, like loop-invariant code motion, partial-redundancy elimination requires the placement of new expression computations.
2. Can All Redundancy Be Eliminated?
Is it possible to eliminate all redundant computations along every path? The answer is "no," unless we are allowed to change the flow graph by creating new blocks.
E x a m p l e 9 . 2 8 : In the example shown in Fig. 9.31(a), the expression of b + c is computed redundantly in block B4 if the program follows the execution path Bi B2 -> B4. However, we cannot simply move the computation of b + c to block B3, because doing so would create an extra computation of 6 + c when the path B1 -> B3 B5 is taken.
What we would like to do is to insert the computation of b + c only along the edge from block B3 to block B4. We can do so by placing the instruction in a new block, say, BQ, and making the flow of control from B3 go through B6 before it reaches B4. The transformation is shown in Fig. 9.31(b).
We define a critical edge of a flow graph to be any edge leading from a node with more than one successor to a node with more than one predecessor. By introducing new blocks along critical edges, we can always find a block to accommodate the desired expression placement. For instance, the edge from B3 to B4 in Fig. 9.31(a) is critical, because B3 has two successors, and B4 has two predecessors.
Adding blocks may not be sufficient to allow the elimination of all redundant computations. As shown in Example 9.29, we may need to duplicate code so as to isolate the path where redundancy is found.
Example 9 . 2 9 : In the example shown in Figure 9.32(a), the expression of b+c is computed redundantly along the path B1 -+ B2 -+ B4 —> BQ . We would like to remove the redundant computation of b + c from block BQ in this path and compute the expression only along the path B1 -+ B3 -+ B4 -+ BQ. However, there is no single program point or edge in the source program that corresponds uniquely to the latter path. To create such a program point, we can duplicate the pair of blocks B4 and BQ , with one pair reached through B2 and the other reached through B3: as shown in Figure 9.32(b). The result of b + c is saved in variable t in block B2, and moved to variable d in B'Q, the copy of BQ reached from B2. •
Since the number of paths is exponential in the number of conditional branches in the program, eliminating all redundant expressions can greatly increase the size of the optimized code. We therefore restrict our discussion of redundancy-elimination techniques to those that may introduce additional blocks but that do not duplicate portions of the control flow graph.
3. The Lazy-Code-Motion Problem
It is desirable for programs optimized with a partial-redundancy-elimination algorithm to have the following properties:
All redundant computations of expressions that can be eliminated without code duplication are eliminated.
The optimized program does not perform any computation that is not in the original program execution.
3. Expressions are computed at the latest possible time.
The last property is important because the values of expressions found to be redundant are usually held in registers until they are used. Computing a value as late as possible minimizes its lifetime — the duration between the time the value is defined and the time it is last used, which in turn minimizes its usage of a register. We refer to the optimization of eliminating partial redundancy with the goal of delaying the computations as much as possible as lazy code motion.
To build up our intuition of the problem, we first discuss how to reason about partial redundancy of a single expression along a single path. For convenience, we assume for the rest of the discussion that every statement is a basic block of its own.
An expression e in block B is redundant if along all paths reaching B, e has been evaluated and the operands of e have not been redefined subsequently. Let S be the set of blocks, each containing expression e, that renders e in B redundant. The set of edges leaving the blocks in S must necessarily form a cutset, which if removed, disconnects block B from the entry of the program. Moreover, no operands of e are redefined along the paths that lead from the blocks in S to B.
If an expression e in block B is only partially redundant, the lazy-code-motion algorithm attempts to render e fully redundant in B by placing additional copies of the expressions in the flow graph. If the attempt is successful, the optimized flow graph will also have a set of basic blocks 5, each containing expression e, and whose outgoing edges are a cutset between the entry and B. Like the fully redundant case, no operands of e are redefined along the paths that lead from the blocks in S to B.
4. Anticipation of Expressions
There is an additional constraint imposed on inserted expressions to ensure that no extra operations are executed. Copies of an expression must be placed only at program points where the expression is anticipated. We say that an expression b + c is anticipated at point p if all paths leading from the point p eventually compute the value of the expression b + c from the values of b and c that are available at that point.
Let us now examine what it takes to eliminate partial redundancy along an acyclic path Bi -+ B2 -»...-» Bn. Suppose expression e is evaluated only in blocks B\ and Bn, and that the operands of e are not redefined in blocks along the path. There are incoming edges that join the path and there are outgoing edges that exit the path. We see that e is not anticipated at the entry of block Bi if and only if there exists an outgoing edge leaving block Bj, i < j < n, that leads to an execution path that does not use the value of e. Thus, anticipation limits how early an expression can be inserted.
We can create a cutset that includes the edge B^i -+ Bi and that renders e redundant in Bn if e is either available or anticipated at the entry of Bi. If e is anticipated but not available at the entry of Bi, we must place a copy of the expression e along the incoming edge.
We have a choice of where to place the copies of the expression, since there are usually several cutsets in the flow graph that satisfy all the requirements. In the above, computation is introduced along the incoming edges to the path of interest and so the expression is computed as close to the use as possible, without introducing redundancy. Note that these introduced operations may themselves be partially redundant with other instances of the same expression in the program. Such partial redundancy may be eliminated by moving these computations further up.
In summary, anticipation of expressions limits how early an expression can be placed; you cannot place an expression so early that it is not anticipated where you place it. The earlier an expression is placed, the more redundancy can be removed, and among all solutions that eliminate the same redundancies, the one that computes the expressions the latest minimizes the lifetimes of the registers holding the values of the expressions involved.
5. The Lazy-Code-Motion Algorithm
This discussion thus motivates a four-step algorithm. The first step uses an-ticipation to determine where expressions can be placed; the second step finds the earliest cutset, among those that eliminate as many redundant operations as possible without duplicating code and without introducing any unwanted computations. This step places the computations at program points where the values of their results are first anticipated. The third step then pushes the cutset down to the point where any further delay would alter the semantics of the program or introduce redundancy. The fourth and final step is a simple pass to clean up the code by removing assignments to temporary variables that are used only once. Each step is accomplished with a data-flow pass: the first and fourth are backward-flow problems, the second and third are forward-flow problems.
Find all the expressions anticipated at each program point using a back-ward data-flow pass.
The second step places the computation where the values of the expres-sions are first anticipated along some path. After we have placed copies of an expression where the expression is first anticipated, the expression would be available at program point p if it has been anticipated along all paths reaching p. Availability can be solved using a forward data-flow pass. If we wish to place the expressions at the earliest possible posi-tions, we can simply find those program points where the expressions are anticipated but are not available.
Executing an expression as soon as it is anticipated may produce a value long before it is used. An expression is postponable at a program point if the expression has been anticipated and has yet to be used along any path reaching the program point. Postponable expressions are found using a forward data-flow pass. We place expressions at those program points where they can no longer be postponed.
A simple, final backward data-flow pass is used to eliminate assignments to temporary variables that are used only once in the program.
We now present the full lazy-code-motion algorithm. To keep the algorithm simple, we assume that initially every statement is in a basic block of its own, and we only introduce new computations of expressions at the beginnings of blocks. To ensure that this simplification does not reduce the effectiveness of the technique, we insert a new block between the source and the destination of an edge if the destination has more than one predecessor. Doing so obviously also takes care of all critical edges in the program.
We abstract the semantics of each block B with two sets: e-uses is the set of expressions computed in B and e-kills is the set of expressions killed, that is, the set of expressions any of whose operands are defined in B. Example 9.30 will be used throughout the discussion of the four data-flow analyses whose definitions are summarized in Fig. 9.34.
Example 9 . 3 0 : In the flow graph in Fig. 9.33(a), the expression b + c appears three times. Because the block B9 is part of a loop, the expression may be computed many times. The computation in block BQ is not only loop invariant; it is also a redundant expression, since its value already has been used in block Bj. For this example, we need to compute b+c only twice, once in block B5 and once along the path after B2 and before B7. The lazy code motion algorithm will place the expression computations at the beginning of blocks B4 and B5.
Recall that an expression b + c is anticipated at a program point p if all paths leading from point p eventually compute the value of the expression b + c from the values of b and c that are available at that point.
In Fig. 9.33(a), all the blocks anticipating b + c on entry are shown as lightly shaded boxes. The expression b + c is anticipated in blocks £3, B4, . B 5 , B6, BJ, and BQ. It is not anticipated on entry to block B2, because the value of c is recomputed within the block, and therefore the value of b + c that would be computed at the beginning of B2 is not used along any path. The expression b+c is not anticipated on entry to B1, because it is unnecessary along the branch from B1 to B2 (although it would be used along the path B1 —> B5 -> B6). Similarly, the expression is not anticipated at the beginning of B8, because of the branch from B8 to Bn. The anticipation of an expression may oscillate along a path, as illustrated by £7 —> Bg —> BQ.
The data-flow equations for the anticipated-expressions problem are shown in Fig 9.34(a). The analysis is a backward pass. An anticipated expression at the exit of a block B is an anticipated expression on entry only if it is not in the e-kills set. Also a block B generates as new uses the set of ejuses expressions. At the exit of the program, none of the expressions are anticipated. Since we are interested in finding expressions that are anticipated along every subsequent
path, the meet operator is set intersection. Consequently, the interior points must be initialized to the universal set U, as was discussed for the available-expressions problem in Section 9.2.6.
At the end of this second step, copies of an expression will be placed at program points where the expression is first anticipated. If that is the case, an expression will be available at program point p if it is anticipated along all paths reaching p. This problem is similar to available-expressions described in Section 9.2.6. The transfer function used here is slightly different though. An expression is available on exit from a block if it is
Completing the Square
Anticipated expressions (also called "very busy expressions" elsewhere) is a type of data-flow analysis we have not seen previously. While we have seen backwards-flowing frameworks such as live-variable analysis (Sect. 9.2.5), and we have seen frameworks where the meet is intersection such as avail-able expressions (Sect. 9.2.6), this is the first example of a useful analysis that has both properties. Almost all analyses we use can be placed in one of four groups, depending on whether they flow forwards or backwards, and depending on whether they use union or intersection for the meet. Notice also that the union analyses always involve asking about whether there exists a path along which something is true, while the intersection analyses ask whether something is true along all paths.
(a) Available, or
(b) In the set of anticipated expressions upon entry (i.e., it could be made available if we chose to compute it here), and
2. Not killed in the block.
The data-flow equations for available expressions are shown in Fig 9.34(b). To avoid confusing the meaning of IN, we refer to the result of an earlier analysis by appending "[JBj.m" to the name of the earlier analysis.
With the earliest placement strategy, the set of expressions placed at block B, i.e., earliest[B], is defined as the set of anticipated expressions that are not yet available. That is,
earliest[B] — anticipated[B].in — available[B].in.
Example 9 . 3 1 : The expression b + c in the flow graph in Figure 9.35 is not anticipated at the entry of block B3 but is anticipated at the entry of block B4. It is, however, not necessary to compute the expression 6 + c in block B4, because the expression is already available due to block B2.
Example 9 . 3 2 : Shown with dark shadows in Fig. 9.33(a) are the blocks for which expression b + c is not available; they are Bi,B2,B3, and B5. The early-placement positions are represented by the lightly shaded boxes with dark shadows, and are thus blocks B3 and B5 . Note, for instance, that b + c is considered available on entry to B4, because there is a path B1 -» B2 -> B3 —»• B4 along which b + c is anticipated at least once — at B3 in this case — and since the beginning of B3, neither b nor c was recomputed.
The third step postpones the computation of expressions as much as possible while preserving the original program semantics and minimizing redundancy. Example 9.33 illustrates the importance of this step.
Example 9 . 3 3 : In the flow graph shown in Figure 9.36, the expression b + c is computed twice along the path Bi -+ B5 -» BQ -» B7. The expression b + c is anticipated even at the beginning of block B1. If we compute the expression as soon as it is anticipated, we would have computed the expression b + c in B1. The result would have to be saved from the beginning, through the execution of the loop comprising blocks B2 and £3, until it is used in block B7. Instead we can delay the computation of expression b + c until the beginning of B5 and until the flow of control is about to transition from B4 to Bq. •
Formally, an expression x + y is postponable to a program point p if an early placement of x + y is encountered along every path from the entry node to p, and there is no subsequent use of x + y after the last such placement.
Example 9 . 3 4 : Let us again consider expression b + c in Fig. 9.33. The two earliest points for b + c are B3 and B5; note that these are the two blocks that are both lightly and darkly shaded in Fig. 9.33(a), indicating that b + c is both anticipated and not available for these blocks, and only these blocks. We cannot postpone b + c from B5 to BQ , because b + c is used in B5. We can postpone it from B3 to B4, however.
But we cannot postpone b + c from B4 to B7. The reason is that, although b + c is not used in B4, placing its computation at B7 instead would lead to a
redundant computation of b + c along the path B§ -» BQ -» B7. As we shall see, B4 is one of the latest places we can compute b + c.
The data-flow equations for the postponable-expressions problem are shown in Fig 9.34(c). The analysis is a forward pass. We cannot "postpone" an expression to the entry of the program, so OUT [ ENTRY ] = 0. An expression is postponable to the exit of block B if it is not used in the block, and either it is postponable to the entry of B or it is in earliest[B]. An expression is not postponable to the entry of a block unless all its predecessors include the expression in their postponable sets at their exits. Thus, the meet operator is set intersection, and the interior points must be initialized to the top element of the semilattice — the universal set.
Roughly speaking, an expression is placed at the frontier where an expression transitions from being postponable to not being postponable. More specifically, an expression e may be placed at the beginning of a block B only if the expres-sion is in B's earliest or postponable set upon entry. In addition, B is in the postponement frontier of e if one of the following holds:
1. e is not in postponable[B].out. In other words, e is in ejuses-
e cannot be postponed to
one of its successors. In other words, there exists a successor of B such that e is not in the earliest or postponable set upon entry to that successor.
Expression e can be placed at the front of block B in either of the above scenarios because of the new blocks introduced by the preprocessing step in the algorithm.
E x a m p l e 9 . 3 5 : Fig. 9.33(b) shows the result of the analysis. The light-shaded boxes represent the blocks whose earliest set includes b + c. The dark shadows indicate those that include b + c in their postponable set. The latest placements of the expressions are thus the entries of blocks B4 and B5, since
1. b + c is in the postponable set of _B4 but not B7, and
2. I V s earliest set includes b + c and it uses b + c.
The expression is stored into the temporary variable t in blocks B4 and B5, and t is used in place of b + c everywhere else, as shown in the figure.
U s e d Expressions
Finally, a backward pass is used to determine if the temporary variables in-troduced are used beyond the block they are in. We say that an expression is used at point p if there exists a path leading from p that uses the expression before the value is reevaluated. This analysis is essentially liveness analysis (for expressions, rather than for variables).
The data-flow equations for the used expressions problem are shown in Fig 9.34(d). The analysis is a backward pass. A used expression at the exit of a block B is a used expression on entry only if it is not in the latest set.
A block generates, as new uses, the set of expressions in ejuse^ . At the exit of the program, none of the expressions are used. Since we are interested in finding expressions that are used by any subsequent path, the meet operator is set union. Thus, the interior points must be initialized with the top element of the semilattice — the empty set.
Putting it All Together
All the steps of the algorithm are summarized in Algorithm 9.36.
Algorithm 9.36 : Lazy code motion.
INPUT: A flow graph for which ejuses and eJtillB have been computed for each block B.
OUTPUT: A modified flow graph satisfying the four lazy code motion conditions in Section 9.5.3.
Insert an empty block along all edges entering a block with more than one predecessor.
2. Find anticipated[B].in for all blocks B, as defined in Fig. 9.34(a).
3. Find available[B].in for all blocks B as defined in Fig. 9.34(b).
4. Compute the earliest placements for all blocks B:
earliest[B] = anticipated[B].in - available[B].in
5. Find postponable[B).in for all blocks B as defined in Fig. 9.34(c).
6. Compute the latest placements for all blocks B:
Note that -i denotes complementation with respect to the set of all ex-pressions computed by the program.
7. Find used[B].out for all blocks B, as defined in Fig. 9.34(d).
8. For each expression, say x+y, computed by the program, do the following:
(a) Create a new temporary, say t, for x + y.
(b) For all blocks B such that x + y is in latest[B] n used[B].out, add
t = x+y at the beginning of B.
For all blocks B such that x + y is in
ejuses H (->latest[B] U used.out[B])
replace every original x + y by t.
S u m m a r y
Partial-redundancy elimination finds many different forms of redundant opera-tions in one unified algorithm. This algorithm illustrates how multiple data-flow problems can be used to find optimal expression placement.
The placement constraints are provided by the anticipated-expressions analysis, which is a backwards data-flow analysis with a set-intersection meet operator, as it determines if expressions are used subsequent to each program point on all paths.
The earliest placement of an expression is given by program points where the expression is anticipated but is not available. Available expressions are found with a forwards data-flow analysis with a set-intersection meet operator that computes if an expression has been anticipated before each program point along all paths.
The latest placement of an expression is given by program points where an expression can no longer be postponed. Expressions are postponable at a program point if for all paths reaching the program point, no use of the expression has been encountered. Postponable expressions are found
with a forwards data-flow analysis with a set-intersection meet operator.
Temporary assignments are eliminated unless they are used by some path subsequently. We find used expressions with a backwards data-flow anal-ysis, this time with a set-union meet operator.
6. Exercises for Section 9.5
Exercise 9.5.1: For the flow graph in Fig. 9.37:
a) Compute anticipated for the beginning and end of each block.
b) Compute available for the beginning and end of each block.
Compute earliest for each block.
Compute postponable for the beginning and end of each block.
e) Compute used for the beginning and end of each block.
f) Compute latest for each block.
Introduce temporary variable t; show where it is computed and where it is used.
Exercise 9.5.2: Repeat Exercise 9.5.1 for the flow graph of Fig. 9.10 (see the exercises to Section 9.1). You may limit your analysis to the expressions a + 6, c — a, and b * d.
Exercise 9.5.3: The concepts discussed in this section can also be applied to eliminate partially dead code. A definition of a variable is partially dead if the variable is live on some paths and not others. We can optimize the program execution by only performing the definition along paths where the variable is live. Unlike partial-redundancy elimination, where expressions are moved before the original, the new definitions are placed after the original. Develop an algorithm to move partially dead code, so expressions are evaluated only where they will eventually be used.