Home | | **Compiler Design** | | **Compilers Principles, Techniques, & Tools** | | **Compiler Design** | Type Checking

1 Rules for Type Checking
2 Type Conversions
3 Overloading of Functions and Operators
4 Type Inference and Polymorphic Functions
5 An Algorithm for Unification
6 Exercises for Section 6.5

**Type Checking**

*1 Rules for Type Checking*

* 2 Type Conversions*

* 3 Overloading of Functions and Operators*

* 4 Type Inference and Polymorphic Functions*

* 5 An Algorithm for Unification*

* 6 Exercises for Section 6.5*

To do *type
checking* a compiler needs to assign a type expression to each com-ponent of
the source program. The compiler must then determine that these type
expressions conform to a collection of logical rules that is called the *type* *system
*for the source language.

Type checking has the potential for catching errors in programs. In
principle, any check can be done dynamically, if the target code carries the
type of an element along with the value of the element. A *sound* type system eliminates the need for dynamic checking for type
errors, because it allows us to determine statically that these errors cannot
occur when the target program runs. An implementation of a language is *strongly typed* if a compiler guarantees
that the programs it accepts will run without type errors.

Besides their use for compiling,
ideas from type checking have been used to improve the security of systems that
allow software modules to be imported and executed. Java programs compile into
machine-independent bytecodes that include detailed type information about the
operations in the bytecodes. Im-ported code is checked before it is allowed to
execute, to guard against both inadvertent errors and malicious misbehavior.

**1. Rules for Type Checking**

Type checking can take on two
forms: synthesis and inference. *Type
synthesis* builds up the type of an expression from the types of its
subexpressions. It requires names to be declared before they are used. The type
of *E1 + E** _{2}* is defined in terms of the types of

Here, f and x denote expressions, and s ->• t denotes a function from
s to t. This rule for functions with one argument carries over to functions
with several arguments. The rule (6.8) can be adapted for E1 +E2 by viewing it as a function application add(Ei,E2).Q

Type inference determines the type of a language construct from the way
it is used. Looking ahead to the
examples in Section 6.5.4, let null be a function that tests whether a list is empty. Then, from
the usage null(x), we can tell
that x must be a list. The type of the
elements of x is not known; all we know is that x must be a list of
elements of some type that is presently unknown. Variables representing type
expressions allow us to talk about unknown types. We shall use Greek letters a,
(3, • • • for type variables in
type expressions.

A typical rule for type inference has the form

Type inference is needed for languages like ML, which check types, but
do not require names to be declared.

We shall use the term "synthesis" even if some context information is used to determine types. With overloaded functions, where the same name is given to more than one function, the context of E1 + E2 may also need to be considered in some languages.

In this section, we consider type
checking of expressions. The rules for checking statements are similar to those
for expressions. For example, we treat the conditional statement "if *(E)* S;" as if it were the
application of a function *if *to* E *and* S. *Let the special type* void
*denote the absence of a value. Then* *function
*if* expects to be applied to a *boolean* and a *void;* the result of the application is a *void.*

**2. Type Conversions**

Consider expressions like *x + i,* where *x* is of type float and *i*
is of type inte-ger. Since the representation of integers and floating-point
numbers is different within a computer and different machine instructions are
used for operations on integers and floats, the compiler may need to convert
one of the operands of + to ensure that both operands are of the same type when
the addition occurs.

Suppose that integers are converted to floats when necessary, using a
unary operator ( f l o a t ) . For example, the integer **2** is
converted to a float in the code for the expression **2*3.14:**

We can extend such examples to consider integer and float versions of
the operators; for example, i n t * for integer operands and float * for
floats.

Type synthesis will be illustrated by extending the scheme in Section
6.4.2 for translating expressions. We introduce another attribute E.type, whose
value

is either integer or float. The rule associated with E -- > E1 + E2 builds on
the pseudocode

As the number of types subject to conversion increases, the number of
cases increases rapidly. Therefore with large numbers of types, careful
organization of the semantic actions becomes important.

Type conversion rules vary from language to
language. The rules for Java in Fig. 6.25 distinguish between *widening* conversions, which are intended
to preserve information, and *narrowing*
conversions, which can lose information. The widening rules are given by the
hierarchy in Fig. 6.25(a): any type lower in the hierarchy can be widened to a
higher type. Thus, a *char* can be widened
to an *int* or to a *float,* but a *char* cannot be widened to a *short.*
The narrowing rules are illustrated by the graph in Fig. 6.25(b): a type *s* can be narrowed to a type *t* if there is a path from *s* to *t.*
Note that *char, short,* and *byte* are pairwise convertible to each
other.

Conversion from one type to another is said to be *implicit* if it is done automatically by the compiler. Implicit type
conversions, also called *coercions,*

are limited in many languages to widening conversions. Conversion is
said to be explicit if the programmer must write something to cause the
conversion. Explicit conversions are also called casts.

The semantic action for checking E -» E1 + E2 uses two functions:

max(t1,t2) takes two
types ti and t2 and returns the maximum (or least upper bound) of the two types
in the widening hierarchy. It declares an error if either t1 or t2 is not in the hierarchy; e.g., if either type is an array or a
pointer type.

2. widen(a, t, w) generates
type conversions if needed
to widen an address a of type t
into a value of type w. It returns a itself if t and w are the same type.
Otherwise, it generates an instruction to do the conversion and place the
result in a temporary t, which is returned as the result. Pseudocode for widen,
assuming that the only types are integer and float, appears in Fig. 6.26.

The semantic action for E -» E1
+ E2 in Fig.
6.27 illustrates how type
conversions can be added to the scheme in Fig. 6.20 for translating
expressions.

In the semantic action, temporary variable a1 is either Ei.addr, if the type of Ei does not
need to be converted to the type of E, or a new temporary variable returned by
widen if this conversion is necessary. Similarly, a2 is either E2.addr or a new temporary holding
the type-converted value of E2. Neither conversion is needed if both types are
integer or both are float. In general, however, we could find that the only way
to add values of two different types is to convert them both to a third type.

**3. Overloading of Functions and
Operators**

An *overloaded* symbol has
different meanings depending on its context. Over-loading is *resolved* when a unique meaning is
determined for each occurrence of a name. In this section, we restrict
attention to overloading that can be resolved by looking only at the arguments
of a function, as in Java.

Example 6.13 : The + operator in Java denotes either string
concatenation or addition, depending on the types of its operands. User-defined
functions can be overloaded as well, as in

Note that we can choose between these two versions of a function e r r
by looking at their arguments.

The following is a type-synthesis rule for overloaded functions:

The value-number method of Section 6.1.2 can be applied to type
expressions to resolve overloading based on argument types, efficiently. In a
DAG representing a type expression, we assign an integer index, called a value
num-ber, to each node. Using Algorithm 6.3, we construct a signature for a
node, consisting of its label and the value numbers of its children, in order
from left to right. The signature for a function consists of the function name
and the types of its arguments. The assumption that we can resolve overloading
based on the types of arguments is equivalent to saying that we can resolve
overloading based on signatures.

It is not always possible to resolve overloading by
looking only at the argu-ments of a function. In Ada, instead of a single type,
a subexpression standing alone may have a set of possible types for which the
context must provide suffi-cient information to narrow the choice down to a
single type (see Exercise 6.5.2).

**4. Type Inference and Polymorphic
Functions**

Type inference is useful for a language like ML,
which is strongly typed, but does not require names to be declared before they
are used. Type inference ensures that names are used consistently.

The term "polymorphic" refers to any code fragment that can be
executed with arguments of different types. In this section, we consider *parametric poly-morphism, *where the
polymorphism is characterized by parameters or type* *variables. The running example is the ML program in Fig. 6.28,
which defines a function length. The type of length can be described as,
"for any type a, length maps a list of elements of type a to an
integer."

E x a m p l e 6 . 1 4 : In Fig. 6.28, the keyword fun introduces a
function definition;

functions can be recursive. The program fragment defines function *length* with one parameter *x.* The body of the function consists of
a conditional expression. The predefined function *null* tests whether a list is empty, and the predefined function *tl* (short for "tail") returns
the remainder of a list after the first element is removed.

The function length determines
the length or number of elements of a list x.
All elements of a list must have the same type, but length can be
applied to lists whose elements are of any one type. In the following expression, length is
applied to two different types of lists (list elements are enclosed within
"[" and "]"):

*length{["
sun", "mon", "tue"]) + length([10,9,8, 7]) (6.11)*

The list of strings has length 3 and the list of integers has length 4,
so expres-sion (6.11) evaluates to 7. •

Using the symbol " (read as "for any type") and the type
constructor *list,* the type of *length* can be written as

The " symbol is the universal quantifier, and the type variable to which it is applied is said to be bound by it.
Bound variables can be renamed at will, provided all occurrences of the
variable are renamed. Thus, the type expression

is equivalent to (6.12). A type expression with a V symbol in it will be
referred to informally as a "polymorphic type."

Each time a polymorphic function is applied, its
bound type variables can denote a different type. During type checking, at each
use of a polymorphic type we replace the bound variables by fresh variables and
remove the universal quantifiers.

The next example informally infers a type for *length,* implicitly using type inference
rules like (6.9), which is repeated here:

E x a m p l e 6.15 : The abstract syntax tree in Fig. 6.29 represents
the definition of *length* in Fig.
6.28. The root of the tree, labeled fun, represents the function definition.
The remaining nonleaf nodes can be viewed as function applications. The node
labeled -+- represents the application of the operator + to a pair of children.
Similarly, the node labeled if represents the application of an operator if to
a triple formed by its children (for type checking, it does not matter that
either the then or the else part will be evaluated, but not both).

From the body of function length, we can infer its type. Consider the children of the node labeled if,
from left to right. Since null expects
to be applied to lists, x must be a list. Let us use variable a as a
placeholder for the type of the list elements; that is, x has type "list
of a."

**Substitutions, Instances, and
Unification**

If t is a type expression and 5 is a substitution (a mapping from type
vari-ables to type expressions), then we write S(i) for the result of
consistently replacing all
occurrences of each type
variable a in
t by S(a). S(t)
is called an instance of t. For example,
list(integer) is an instance of list(a), since it is the result of
substituting integer for a in list(a).
Note, however, that integer —> float is not an instance of a —> a,
since a substitution must replace all occurrences of a by the same type
expression.

Substitution 5 is a unifier of
type expressions t1 and t2 if S(t1) = S(t2). 5 is the most general unifier of t1 and t2 if
for any other unifier of ri and t2, say
S", it is the case that for any t, S'(t) is an instance of S(t).

In words, S' imposes more constraints on t than S does.

If null(x) is true, then
length(x) is 0. Thus, the type of length
must be "function from list of a to
integer." This inferred type is consistent with the usage of length in the
else part, length(tl{x)) + 1. •

Since variables can appear in type expressions, we have to re-examine
the notion of equivalence of types.
Suppose E1 of type s s' is
applied to E2 of type t. Instead of
simply determining the equality of s and t, we must "unify" them. Informally, we
determine whether s and t can be made structurally equivalent by replacing the
type variables in s and t by type expressions.

A substitution is a mapping from type variables to type expressions. We
write S(t) for the result of applying the substitution S to the variables in
type expression t; see the box on "Substitutions, Instances, and
Unification." Two type expressions t1
and t2 unify if there exists
some substitution 5 such that S(ti) = S(t2).
In practice, we are interested in the most general unifier, which is a
substitution that imposes the fewest constraints on the variables in the
expressions. See Section 6.5.5 for a unification algorithm.

Algorithm 6.16 : Type inference for polymorphic functions.

**INPUT :** A program consisting of a sequence of function definitions followed by
an expression to be evaluated. An expression is made up of function
applications and names, where names can have predefined polymorphic types.

**OUTPUT :** Inferred types for the names in the program.

**METHOD :** For simplicity, we shall deal with unary functions only. The type of a
function f(xi,x2) with two parameters can be represented by a
type expression si x s_{2}
->• t, where si and s2 are the
types of x1 and x2, respectively, and t is the type of the result
f(xi,x2). An expression / ( a , b) can be checked by matching the type of a with
si and the type of b with s_{2}.

Check the function definitions and the expression in the input sequence.
Use the inferred type of a function if it is subsequently used in an
expression.

• For a function definition fun i d i ( i d 2 ) = E, create fresh type variables a and (3. Associate the type a -> (3 with the
function idi, and the type a with the parameter i d 2 . Then, infer a type for
expression E. Suppose a denotes type s and j3 denotes type
t after type inference for E. The
inferred type of function idi is s ->
t. Bind any type variables that remain
unconstrained in s -» t by V quantifiers.

• For a function application
Ei(E2), infer types for E1 and E2. Since
Ei is used as a function, its type must
have the form s -»• s'. (Technically,
the type of E1 must unify with j3 7,
where j3 and 7 are new type variables). Let t be the inferred type of E1. Unify s and t. If unification fails, the expression has a
type error. Otherwise, the inferred type of E1(E2) is s'.

For each occurrence of a polymorphic function, replace the bound
vari-ables in its type by distinct fresh variables and remove the V quantifiers.

The resulting type expression is the inferred type
of this occurrence.

For a name that is encountered for the first time, introduce a fresh
variable for its type.

E x a m p l e 6 . 1 7 : In Fig. 6.30, we infer a
type for function *length.* The root of
the syntax tree in Fig. 6.29 is for a function definition, so we introduce
variables *13 *and 7, associate the
type* j3 *7 with function* length, *and the type* (3 *with* x; *see lines 1-2 of Fig. 6.30.

At the right child of the root, we view if as a polymorphic function
that is applied to a triple, consisting of a boolean and two expressions that
represent the t h e n and else parts. Its type is **VCK.** *boolean* x *a* x *a
—> a.*

Each application of a polymorphic function can be to a different type,
so we make up a fresh variable ai (where i is from "if") and remove
the V; see line 3 of Fig. 6.30. The type of the left child of if must unify
with boolean, and the types of its other two children must unify with on.

The predefined function null has type Vet. list(a) boolean. We use a
fresh type variable an (where n is for "null") in place of the bound
variable a; see line 4. From the application of null to x, we infer that the
type f3 of x must match list(an); see line 5.

At the first child of if, the
type boolean for null(x) matches the type expected by if. At the second child,
the type a* unifies with integer; see line 6.

Now, consider the subexpression
length(tl(x)) + 1. We make up a fresh variable at (where t is for
"tail") for the bound variable a in the type of tl; see line 8. From
the application tl(x), we infer list(at) = (3 = list(an); see line 9.

Since length(tl(x)) is an operand
of +, its type 7 must unify with integer; see line 10. It follows that the type
of length is list(an) ->• integer. After the

function definition is checked, the type variable *a** _{n}* remains
in the type of

for the type of *length.* *•*

**5. An Algorithm for Unification**

Informally, unification is the problem of determining whether two
expressions s and t can be made identical by substituting expressions for the
variables in s and t. Testing equality of expressions is a special case of
unification; if s and t have constants but no variables, then s and t unify if
and only if they are identical. The unification algorithm in this section
extends to graphs with cycles, so it can be used to test structural equivalence
of circular types . ^{7}

We shall implement a graph-theoretic formulation of
unification, where types are represented by graphs. Type variables are
represented by leaves and type constructors are represented by interior nodes.
Nodes are grouped into equiv-alence classes; if two nodes are in the same
equivalence class, then the type expressions they represent must unify. Thus,
all interior nodes in the same class must be for the same type constructor, and
their corresponding children must be equivalent.

Example 6.18 : Consider the two
type expressions

The two expressions are represented by the two nodes labeled - »: 1 in
Fig. 6.31. The integers at the nodes indicate the equivalence classes that the
nodes belong to after the nodes numbered 1 are unified. •

Algorithm 6 . 1 9 : Unification
of a pair of nodes in a type graph.

**INPUT : **A graph representing a type and a pair of nodes** ***m*** **and** ***n***
**to be unified.

**OUTPUT : **Boolean value true if the expressions represented by the nodes** ***m*** **and *n* unify; false, otherwise.

**METHOD : **A node is implemented by a record with fields for a binary operator** **and pointers to the left and right
children. The sets of equivalent nodes are maintained using the *set* field. One node in each equivalence
class is chosen to be the unique representative of the equivalence class by
making its *set* field contain a null
pointer. The *set* fields of the
remaining nodes in the equivalence class will point (possibly indirectly
through other nodes in the set) to the representative. Initially, each node n
is in an equivalence class by itself, with *n*
as its own representative node.

The unification algorithm, shown in Fig. 6.32, uses the following two
operations on nodes:

* **find{n) *returns the representative node of the equivalence class currently* *containing node *n.*

*• union(m, n) *merges the equivalence classes
containing nodes* m *and n. If* *one of the representatives for the
equivalence classes of m and n is a non-variable node, *union* makes that nonvariable node be the representative for the
merged equivalence class; otherwise, *union*
makes one or the other of the original representatives be the new
representative. This asymme-try in the specification of *union* is important because a variable cannot be used as the
representative for an equivalence class for an expression containing a type
constructor or basic type. Otherwise, two inequivalent expressions may be
unified through that variable.

The *union*
operation on sets is implemented by simply changing the *set* field of the representative of one equivalence class so that it
points to the represen-tative of the other. To find the equivalence class that
a node belongs to, we follow the *set*
pointers of nodes until the representative (the node with a null pointer in the
set field) is reached.

Note that the algorithm in Fig. 6.32 uses *s = find(m)* and *t = find(n)*
rather than *m* and *n,* respectively. The representative
nodes *s* and *t* are equal if m and n are in the same equivalence class. If *s* and *t* represent the same basic type, the call *unify(m, n)* returns true. If *s*
and *t* are both interior nodes for a
binary type constructor, we merge their equivalence classes on speculation and
recursively check that their respective children are equivalent. By merging
first, we decrease the number of equivalence classes before recursively
checking the children, so the algorithm terminates.

The substitution of an expression for a variable is implemented by
adding the leaf for the variable to the equivalence class containing the node
for that expression. Suppose either *m*
or *n* is a leaf for a variable.
Suppose also that this leaf has been put into an equivalence class with a node
representing an expression with a type constructor or a basic type. Then *find* will return a representative that
reflects that type constructor or basic type, so that a variable cannot be
unified with two different expressions.

Example 6.20 : Suppose that the two expressions in Example
6.18 are represented by the initial
graph in Fig. 6.33, where each node is in its own equivalence class. When
Algorithm 6.19 is applied to compute
unify(l,9), it notes that nodes 1 and 9 both represent the
same operator. It therefore merges 1 and 9
into the same
equivalence class and
calls unify(2,10) and
unify(8,14). The result of
computing unify(l, 9) is the graph
previously shown in Fig. 6.31.

If Algorithm 6.19 returns true, we can construct a substitution S that
acts as the unifier, as follows. For
each variable a, find(a) gives the node
n that is the representative of the equivalence class of a. The expression
represented by n is S(a). For example, in Fig. 6.31, we see that the
representative for «3 is node 4, which represents a1. The representative for a§
is node 8, which represents list(a.2)- The resulting substitution S is as in
Example 6.18.

**6. Exercises for Section 6.5**

Exercise 6 . 5 . 1 : Assuming that function widen in Fig. 6.26 can
handle any of the types in the hierarchy of Fig. 6.25(a), translate the
expressions below. Assume that c and d are characters, s and t are short
integers, i and j are integers, and x is a float.

Exercise 6.5.2 : As in Ada, suppose that each expression must have a
unique type, but that from a subexpression, by itself, all we can deduce is a
set of possible types. That is, the application of function Ei to argument E2,
represented by E —> Ei (Ei), has the associated rule

Describe an SDD that determines a
unique type for each subexpression by using an attribute *type* to synthesize a set of possible types bottom-up, and, once the
unique type of the overall expression is determined, proceeds top-down to determine
attribute *unique* for the type of each
subexpression.

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

**Related Topics **

Copyright © 2018-2021 BrainKart.com; All Rights Reserved. (BS) Developed by Therithal info, Chennai.