Chapter: Compilers : Principles, Techniques, & Tools : Interprocedural Analysis

Context-Insensitive Interprocedural Analysis

1 Effects of a Method Invocation 2 Call Graph Discovery in Datalog 3 Dynamic Loading and Reflection 4 Exercises for Section 12.5 945

Context-Insensitive Interprocedural Analysis

1 Effects of a Method Invocation

2 Call Graph Discovery in Datalog

3 Dynamic Loading and Reflection

4 Exercises for Section 12.5 945

We now consider method invocations. We first explain how points-to analysis can be used to compute a precise call graph, which is useful in computing precise points-to results. We then formalize on-the-fly call-graph discovery and show how Datalog can be used to describe the analysis succinctly.

1. Effects of a Method Invocation

The effects of a method call such as x = y . n ( z ) in Java on the points-to rela-tions can be computed as follows:

1. Determine the type of the receiver object, which is the object that y points to. Suppose its type is t. Let m be the method named n in the narrowest superclass of t that has a method named n. Note that, in general, which method is invoked can only be determined dynamically.

2. The formal parameters of m are assigned the objects pointed to by the ac-tual parameters. The actual parameters include not just the parameters passed in directly, but also the receiver object itself. Every method invocation assigns the receiver object to the t h i s variable.3 We refer to the this variables as the Oth formal parameters of methods. In x = y . n ( z ) , there are two formal parameters: the object pointed to by y is assigned to variable t h i s , and the object pointed to by z is assigned to the first declared formal parameter of m.

3. The returned object of m is assigned to the left-hand-side variable of the assignment statement.

In context-insensitive analysis, parameters and returned values are modeled by copy statements. The interesting question that remains is how to determine the type of the receiver object. We can conservatively determine the type ac-cording to the declaration of the variable; for example, if the declared variable has type then only methods named n in subtypes of t can be invoked. Unfortunately, if the declared variable has type Object, then all methods with name n are all potential targets. In real-life programs that use object hierarchies ex-tensively and include many large libraries, such an approach can result in many spurious call targets, making the analysis both slow and imprecise.

We need to know what the variables can point to in order to compute the call targets; but unless we know the call targets, we cannot find out what all the variables can point to. This recursive relationship requires that we discover the call targets on the fly as we compute the points-to set. The analysis continues until no new call targets and no new points-to relations are found.

Example 1 2 . 2 4 : In the code in Fig. 12.26, r is a subtype of s, which itself is a subtype of t. Using only the declared type information, a . n ( ) may invoke any of the three declared methods with name n since s and r are both subtypes of a's declared type, t. Furthermore, it appears that a may point to objects g, h, and i after line (5).

By analyzing the points-to relationships, we first determine that a can point to j, an object of type t. Thus, the method declared in line (1) is a call target. Analyzing line (1), we determine that a also can point to g, an object of type r. Thus, the method declared in line (3) may also be a call target, and a can now also point to i, another object of type r. Since there are no more new call targets, the analysis terminates without analyzing the method declared in line (2) and without concluding that a can point to h.

2. Call Graph Discovery in Datalog

To formulate the Datalog rules for context-insensitive interprocedural analysis, we introduce three EDB predicates, each of which is obtainable easily from the source code:

1. actual(S,I,V) says V is the Jth actual parameter used in call site S.

2. formal(M: I, V) says that V is I t h formal parameter declared in method M.

3. cha(T, N, M) says that M is the method called when N is invoked on a receiver object of type T. (cha stands for class hierarchy analysis).

Each edge of the call graph is represented by an IDB predicate invokes. As we discover more call-graph edges, more points-to relations are created as the parameters are passed in and returned values are passed out. This effect is summarized by the rules shown in Figure 12.27.

The first rule computes the call target of the call site. That is, "5 : V.N (...)" says that there is a call site labeled S that invokes method named N on the receiver object pointed to by V. The subgoals say that if V can point to heap object H, which is allocated as type T, and M is the method used when N is invoked on objects of type T, then call site S may invoke method M.

The second rule says that if site S can call method M, then each formal parameter of M can point to whatever the corresponding actual parameter of the call can point to. The rule for handling returned values is left as an exercise.

Combining these two rules with those explained in Section 12.4 create a context-insensitive points-to analysis that uses a call graph that is computed on the fly. This analysis has the side effect of creating a call graph using a

context-insensitive and flow-insensitive points-to analysis. This call graph is significantly more accurate than one computed based only on type declarations and syntactic analysis.

3. Dynamic Loading and Reflect ion

Languages like Java allow dynamic loading of classes. It is impossible to an-alyze all the possible code executed by a program, and hence impossible to provide any conservative approximation of call graphs or pointer aliases stat-ically. Static analysis can only provide an approximation based on the code analyzed. Remember that all the analyses described here can be applied at the Java bytecode level, and thus it is not necessary to examine the source code. This option is especially significant because Java programs tend to use many libraries.

Even if we assume that all the code to be executed is analyzed, there is one more complication that makes conservative analysis impossible: reflection. Reflection allows a program to determine dynamically the types of objects to be created, the names of methods invoked, as well as the names of the fields accessed. The type, method, and field names can be computed or derived from user input, so in general the only possible approximation is to assume the universe.

The f orName method in the Class library takes a string containing the class name and returns the class. The method newlnstance returns an instance of that class. Instead of leaving the object o with type Object, this object is cast to a superclass T of all the expected classes.

While many large Java applications use reflection, they tend to use common idioms, such as the one shown in Example 12.25. As long as the application does not redefine the class loader, we can tell the class of the object if we know the value of className. If the value of className is defined in the program, because strings are immutable in Java, knowing what className points to will provide the name of the class. This technique is another use of points-to analysis. If the value of className is based on user input, then the points-to analysis can help locate where the value is entered, and the developer may be able to limit the scope of its value.

Similarly, we can exploit the typecast statement, line (4) in Example 12.25, to approximate the type of dynamically created objects. Assuming that the typecast exception handler has not been redefined, the object must belong to a subclass of the class T.

4. Exercises for Section 12.5

E x e r c i s e 1 2 . 5 . 1 : For the code of Fig. 12.26

a) Construct the EDB relations actual, formal, and cha.

b) Make all possible inferences of pts and hpts facts.

Exercise 1 2 . 5 . 2 : How would you add to the EDB predicates and rules of Section 12.5.2 additional predicates and rules to take into account the fact that if a method call returns an object, then the variable to which the result of the call is assigned can point to whatever the variable holding the return value can point to?

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

Compilers : Principles, Techniques, & Tools : Interprocedural Analysis : Context-Insensitive Interprocedural Analysis |