Why Interprocedural Analysis?
1 Virtual Method Invocation
2 Pointer Alias Analysis
3 Parallelization
4 Detection of Software Errors
and Vulnerabilities
5 SQL Injection
6 Buffer Overflow
Given how hard interprocedural analysis is, let us now address the
important problem of why and when we wish to use interprocedural analysis.
Although we used constant propagation to illustrate interprocedural analysis,
this inter-procedural optimization is neither readily applicable nor
particularly beneficial when it does occur. Most of the benefits of constant
propagation can be ob-tained simply by performing intraprocedural analysis and
inlining procedure calls of the most frequently executed sections of code.
However, there are many reasons why interprocedural analysis is
essential. Below, we describe several important applications of interprocedural
analysis.
1. Virtual Method Invocation
As mentioned above, object-oriented programs have many small
methods. If we only optimize one method at a time, then there are few
opportunities for optimization. Resolving method invocation enables
optimization. A language like Java dynamically loads its classes. As a result,
we do not know at compile-time to which of (perhaps) many methods named m
a use of "m" refers in an invocation such as x.mQ.
Many Java implementations use a just-in-time compiler to compile
its byte-codes at run time. One common optimization is to profile the execution
and determine which are the common receiver types. We can then inline the
meth-ods that are most frequently invoked. The code includes a dynamic check on
the type and executes the inlined methods if the run-time object has the
expected type.
Another approach to resolving uses of a method name m is possible
as long as all the source code is available at compile time. Then, it is
possible to perform an interprocedural analysis to determine the object types.
If the type for a variable x turns out to be unique, then a use of x.mQ
can be resolved.
We know exactly what method m refers to in this context. In that
case, we can in-line the code for this m, and the compiler does not even have
to include a test for the type of x.
2. Pointer Alias Analysis
Even if we do not wish to perform interprocedural versions of the
common data-flow analyses like reaching definitions, these analyses can in fact
benefit from interprocedural pointer analysis. All the analyses presented in
Chapter 9 apply only to local scalar variables that cannot have aliases.
However, use of pointers is common, especially in languages like C. By knowing
whether pointers can be aliases (can point to the same location), we can
improve the accuracy of the techniques from Chapter 9.
Example 1 2 . 9 : Consider the following sequence of three
statements, which might form a basic block:
Without knowing if p and q can point to the same
location — that is, whether they can be aliases — we cannot conclude that x
is equal to 1 at the end of the block. •
3. Parallelization
As discussed in Chapter 11, the most effective way to parallelize
an application is to find the coarsest granularity of parallelism, such as that
found in the outermost loops of a program. For this task, interprocedural
analysis is of great importance. There is a significant difference between
scalar optimiza-tions (those based on values of simple variables, as discussed
in Chapter 9) and parallelization. In parallelization, just one spurious data
dependence can render an entire loop not parallelizable, and greatly reduce the
effectiveness of the optimization. Such amplification of inaccuracies is not
seen in scalar optimizations. In scalar optimization, we only need to find the
majority of the optimization opportunities. Missing one opportunity or two
seldom makes much of a difference.
4. Detection of Software Errors and Vulnerabilities
Interprocedural analysis is not only important for optimizing
code. The same techniques can be used to analyze existing software for many
kinds of coding errors. These errors can render software unreliable; coding
errors that hackers can exploit to take control of, or otherwise damage, a
computer system can pose significant security vulnerability risks.
Static analysis is useful in detecting occurrences of many common
error patterns. For example, a data item must be guarded by a lock. As another
example, disabling an interrupt in the operating system must be followed by a
re-enabling of the interrupt. Since a significant source of errors is the
incon-sistencies that span procedure boundaries, interprocedural analysis is of
great importance. PREfix and Metal are two practical tools that use
interprocedural analysis effectively to find many programming errors in large
programs. Such tools find errors statically and can improve software
reliability greatly. How-ever, these tools are both incomplete and unsound, in
the sense that they may not find all errors, and not all reported warnings are
real errors. Unfortunately, the interprocedural analysis used is sufficiently
imprecise that, were the tools to report all potential errors, the large number
of false warnings would render the tools unusable. Nevertheless, even though
these tools are not perfect, their systematic use has been shown to greatly
improve software reliability.
When it comes to security vulnerabilities, it is highly desirable
that we find all the potential errors in a program. In 2006, two of the
"most popular" forms of intrusions used by hackers to compromise a
system were
1. Lack of input validation on Web applications: SQL injection is
one of the most popular forms of such vulnerability whereby hackers gain control
of a database by manipulating inputs accepted by web applications.
2. Buffer overflows in C and C + + programs. Because C and C + +
do not check if accesses to arrays are in bounds, hackers can write
well-crafted strings into unintended areas and hence gain control of the
program's execution.
In the next section, we shall discuss how we can use
interprocedural analysis to protect programs against such vulnerabilities.
5. SQL Injection
SQL injection refers to the vulnerability where hackers can manipulate
user input to a Web application and gain unintended access to a database. For
example, banks want their users to be able to make transactions online,
provided they supply their correct password. A common architecture for such a
system is to have the user enter strings into a Web form, and then to have
those strings form part of a database query written in the SQL language. If
systems developers are not careful, the strings provided by the user can alter
the meaning of the SQL statement in unexpected ways.
Example 1 2 . 1 0
: Suppose a bank offers its customers
access to a relation
AcctData(name, password,
balance)
That is, this relation is a table of triples, each consisting of
the name of a customer, the password, and the balance of the account. The
intent is that cus-tomers can see their account balance only if they provide
both their name and their correct password. Having a hacker see an account
balance is not the worst thing that could occur, but this simple example is
typical of more complicated situations where the hacker could execute payments
from the account.
The system might
implement a balance inquiry as follows:
1.
Users invoke a
Web form where they enter their name and password.
2.
The name is
copied to a variable n and the password to a variable p.
3.
Later, perhaps
in some other procedure, the following SQL query is exe-cuted:
SELECT b a l a n c e
FROM AcctData
WHERE name = 5 : n ' and password = ' : p '
For readers not familiar with SQL, this query says: "Find in
the table AcctData a row with the first component (name) equal to the string
currently in variable n and the second component (password) equal to the
string currently in variable p; print the third component (balance) of
that row." Note that SQL uses single
quotes, not double
quotes, to delimit strings, and the colons in front of n and
p indicate that they are variables of the
surrounding language.
Suppose the hacker,
who wants to find Charles Dickens' account balance,
supplies the following values for the strings n and p:
n = Charles Dickens' — p = who
cares
The effect of these strange strings is to convert the query into
SELECT balance FROM
AcctData
WHERE name = '
Charles Dickens ' —' and password
= 'who
cares '
In many database systems — is a comment-introducing token and has
the effect of making whatever follows on that line a comment. As a result, the query now asks the database
system to print the balance for every person whose name is
' C h a r l e s D i c k e n
s ' , regardless of the password that appears with
that name in a name-password-balance triple.
That is, with comments eliminated, the query is:
SELECT b a l a n c
e FROM
AcctData
WHERE name = ' C
h a r l e s Dickens'
•
In Example 12.10, the
"bad" strings were kept in two variables, which might be passed
between procedures. However, in more realistic cases, these strings might be
copied several times, or combined with others to form the full query. We cannot
hope to detect coding errors that create SQL-injection vulnerabilities without
doing a full interprocedural analysis of the entire program.
6. Buffer Overflow
A buffer overflow attack occurs when carefully
crafted data supplied by the user writes beyond the intended buffer and
manipulates the program execution. For example, a C program may read a string s
from the user, and then copy it into a buffer b using the
function call:
strcpy(b,s);
If the string s is actually longer than the buffer b,
then locations that are not part of b will have their values
changed. That in itself will probably cause the program to malfunction or at
least to produce the wrong answer, since some data used by the program will
have been changed.
But worse, the hacker who chose the string s can
pick a value that will do more than cause an error. For example, if the buffer
is on the run-time stack, then it is near the return address for its function.
An insidiously chosen value of s may overwrite the return
address, and when the function returns, it goes to a place chosen by the
hacker. If hackers have detailed knowledge of the surrounding operating system
and hardware, they may be able to execute a command that will give them control
of the machine itself. In some situations, they may even have the ability to
have the false return address transfer control to code that is part of the
string s, thus allowing any sort of program to be inserted into the executing code.
To prevent buffer overflows, every array-write operation must be
statically proven to be within bounds, or a proper array-bounds check must be
performed dynamically. Because these bounds checks need to be inserted by hand
in C and C + + programs, it is easy to forget to insert the test or to get the
test wrong. Heuristic tools have been developed that will check if at least
some test, though not necessarily a correct test, has been performed before a strcpy is called.
Dynamic bounds checking is unavoidable because it is impossible to
deter-mine statically the size of users' input. All a static analysis can do is
assure that the dynamic checks have been inserted properly. Thus, a reasonable
strategy is to have the compiler insert dynamic bounds checking on every write,
and use static analysis as a means to optimize away as many bounds check as
possible. It is no longer necessary to catch every potential violation;
moreover, we only need to optimize only those code regions that execute
frequently.
Inserting bounds checking into C programs is nontrivial, even if we do not mind the cost. A pointer may point into the middle of some array, and we do not know the extent of that array. Techniques have been developed to keep track of the extent of the buffer pointed to by each pointer dynamically. This information allows the compiler to insert array bounds checks for all accesses. Interestingly enough, it is not advisable to halt a program whenever a buffer overflow is detected. In fact, buffer overflows do occur in practice, and a pro-gram would likely fail if we disable all buffer overflows. The solution is to extend the size of the array dynamically to accommodate for the buffer overruns.
Interprocedural analysis can be used to speed up the cost of dynamic ar ray bounds checks. For example, suppose we are interested only in catching buffer overflows involving user-input strings, we can use static analysis to de-termine which variables may hold contents provided by the user. Like SQL injection, being able to track an input as it is copied across procedures is useful in eliminating unnecessary bounds checks.
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2024 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.