Context-Insensitive Interprocedural Analysis
1 Effects of a Method Invocation
2 Call Graph Discovery in Datalog
3 Dynamic Loading and Reflection
4 Exercises for Section 12.5 945
We now consider method invocations. We first explain how points-to
analysis can be used to compute a precise call graph, which is useful in
computing precise points-to results. We then formalize on-the-fly call-graph
discovery and show how Datalog can be used to describe the analysis succinctly.
1. Effects of a Method Invocation
The effects of a
method call such as x = y . n ( z ) in Java on the points-to rela-tions can be
computed as follows:
1. Determine the type of the receiver object,
which is the object that y points to. Suppose its type is t. Let m
be the method named n in the narrowest superclass of t that has a method
named n. Note that, in general, which method is invoked can only be
2. The formal parameters of m are assigned the
objects pointed to by the ac-tual parameters. The actual parameters include not
just the parameters passed in directly, but also the receiver object itself.
Every method invocation assigns the receiver object to the t h i s variable.3 We refer to the this variables as the Oth
formal parameters of methods. In x = y . n ( z ) , there are two formal
parameters: the object pointed to by y is assigned to variable t h i s , and
the object pointed to by z is assigned to the first declared formal parameter
3. The returned
object of m is assigned to the left-hand-side variable of the assignment
context-insensitive analysis, parameters and returned values are modeled by
copy statements. The interesting question that remains is how to determine the
type of the receiver object. We can conservatively determine the type
ac-cording to the declaration of the variable; for example, if the declared
variable has type then only methods named n in subtypes of t can be invoked.
Unfortunately, if the declared variable has type Object, then all methods with
name n are all potential targets. In real-life programs that use object
hierarchies ex-tensively and include many large libraries, such an approach can
result in many spurious call targets, making the analysis both slow and
We need to know what
the variables can point to in order to compute the call targets; but unless we
know the call targets, we cannot find out what all the variables can point to.
This recursive relationship requires that we discover the call targets on the
fly as we compute the points-to set. The analysis continues until no new call
targets and no new points-to relations are found.
Example 1 2 . 2 4 : In the code in Fig. 12.26, r
is a subtype of s, which itself is a subtype of t. Using only the declared type
information, a . n ( ) may invoke any of the three declared methods with name n
since s and r are both subtypes of a's declared type, t. Furthermore, it
appears that a may point to objects g, h, and i after line (5).
By analyzing the points-to relationships, we first
determine that a can point to j, an object of type t. Thus, the method declared
in line (1) is a call target. Analyzing line (1), we determine that a also can
point to g, an object of type r. Thus, the method declared in line (3) may also
be a call target, and a can now also point to i, another object of type r.
Since there are no more new call targets, the analysis terminates without
analyzing the method declared in line (2) and without concluding that a can
point to h.
2. Call Graph Discovery in Datalog
To formulate the
Datalog rules for context-insensitive interprocedural analysis, we introduce
three EDB predicates, each of which is obtainable easily from the source code:
1. actual(S,I,V) says V
is the Jth actual parameter used in call site
2. formal(M: I, V)
says that V is I t h formal
parameter declared in method
cha(T, N, M) says that M is the method called when N is
invoked on a receiver object of type T. (cha stands for class
Each edge of the call graph is represented by an IDB predicate invokes.
As we discover more call-graph edges, more points-to relations are created as
the parameters are passed in and returned values are passed out. This effect is
summarized by the rules shown in Figure 12.27.
The first rule computes the call target of the call site. That is,
"5 : V.N (...)" says that there is a call site labeled S
that invokes method named N on the receiver object pointed to by V.
The subgoals say that if V can point to heap object H, which is
allocated as type T, and M is the method used when N is invoked
on objects of type T, then call site S may invoke method M.
The second rule says that if site S can call method M,
then each formal parameter of M can point to whatever the corresponding
actual parameter of the call can point to. The rule for handling returned
values is left as an exercise.
Combining these two
rules with those explained in Section 12.4 create a context-insensitive points-to
analysis that uses a call graph that is computed on the fly. This analysis has
the side effect of creating a call graph using a
context-insensitive and flow-insensitive points-to analysis. This
call graph is significantly more accurate than one computed based only on type
declarations and syntactic analysis.
3. Dynamic Loading
and Reflect ion
Languages like Java allow dynamic loading of classes. It is
impossible to an-alyze all the possible code executed by a program, and hence
impossible to provide any conservative approximation of call graphs or pointer
aliases stat-ically. Static analysis can only provide an approximation based on
the code analyzed. Remember that all the analyses described here can be applied
at the Java bytecode level, and thus it is not necessary to examine the source
code. This option is especially significant because Java programs tend to use
Even if we assume
that all the code to be executed is analyzed, there is one more complication
that makes conservative analysis impossible: reflection. Reflection allows a
program to determine dynamically the types of objects to be created, the names
of methods invoked, as well as the names of the fields accessed. The type,
method, and field names can be computed or derived from user input, so in
general the only possible approximation is to assume the universe.
The f orName method in the Class library takes a string containing the class name
and returns the class. The method newlnstance returns an instance of that class. Instead of
leaving the object o with type Object, this object is cast to a superclass T of
all the expected classes.
While many large Java applications use reflection, they tend to
use common idioms, such as the one shown in Example 12.25. As long as the
application does not redefine the class loader, we can tell the class of the
object if we know the value of className. If the value of className is defined
in the program, because strings are immutable in Java, knowing what className
points to will provide the name of the class. This technique is another use of
points-to analysis. If the value of className is based on user input, then the
points-to analysis can help locate where the value is entered, and the
developer may be able to limit the scope of its value.
Similarly, we can exploit the typecast statement, line (4) in
Example 12.25, to approximate the type of dynamically created objects. Assuming
that the typecast exception handler has not been redefined, the object must
belong to a subclass of the class T.
Exercises for Section 12.5
E x e r c i s e 1
2 . 5 . 1 : For the code of Fig.
a) Construct the EDB
relations actual, formal, and cha.
b) Make all possible
inferences of pts and hpts facts.
Exercise 1 2 . 5 . 2
: How would you add to the EDB predicates and rules of Section 12.5.2
additional predicates and rules to take into account the fact that if a method
call returns an object, then the variable to which the result of the call is
assigned can point to whatever the variable holding the return value can point