
# Learning Decision Trees

To learn a decision tree:

- Come up with a set of attributes to describe the object or situation.
- Collect a complete set of examples (training set) from which the decision tree can derive a hypothesis that defines (answers) the goal predicate.

## Decision Tree Example

Problem: decide whether to wait for a table at a restaurant, based on the following attributes:

1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)

## Logical Representation of a Path

∀r  Patrons(r, Full) ∧ WaitEstimate(r, 10-30) ∧ Hungry(r, Yes) ⇒ WillWait(r)
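Read as code, the path is just a conjunction of attribute tests; a minimal sketch, assuming examples are encoded as dicts of attribute values (this encoding is illustrative, not from the source):

```python
# One example is a dict of attribute values; the path above is a conjunction
# of attribute tests that implies WillWait.
def will_wait(r):
    # Patrons(r, Full) AND WaitEstimate(r, 10-30) AND Hungry(r, Yes)
    return (r["Patrons"] == "Full"
            and r["WaitEstimate"] == "10-30"
            and r["Hungry"] == "Yes")

example = {"Patrons": "Full", "WaitEstimate": "10-30", "Hungry": "Yes"}
print(will_wait(example))  # True: every test on the path is satisfied
```

Each internal node of the tree tests one attribute, so every root-to-leaf path corresponds to one such conjunction.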

## Expressiveness of Decision Trees

- Any Boolean function can be written as a decision tree.
- E.g., for Boolean functions, each truth-table row maps to a path to a leaf.
- Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples.
- Prefer to find more compact decision trees.

## Limitations

- Decision trees can only describe one object at a time.
- Some functions require an exponentially large decision tree, e.g. the parity function and the majority function.
- Decision trees are good for some kinds of functions and bad for others.
- There is no single representation that is efficient for all kinds of functions.
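The parity function is a useful illustration: on the full truth table, splitting on any single attribute leaves each subset exactly half positive and half negative, so no root test reduces entropy at all. A small sketch of this check (binary attributes encoded as 0/1 tuples; helper names are mine):

```python
from itertools import product
from math import log2

def entropy(p, n):
    """Entropy (in bits) of a set with p positive and n negative examples."""
    h = 0.0
    for c in (p, n):
        if c:
            h -= (c / (p + n)) * log2(c / (p + n))
    return h

def root_gains(n_bits):
    """Information gain of each single-bit test on the full parity truth table."""
    rows = list(product([0, 1], repeat=n_bits))
    labels = [sum(row) % 2 for row in rows]          # parity label of each row
    pos = sum(labels)
    base = entropy(pos, len(rows) - pos)             # 1 bit at the root
    gains = []
    for i in range(n_bits):
        g = base
        for v in (0, 1):
            sub = [lab for row, lab in zip(rows, labels) if row[i] == v]
            g -= (len(sub) / len(rows)) * entropy(sum(sub), len(sub) - sum(sub))
        gains.append(g)
    return gains

print(root_gains(3))  # prints [0.0, 0.0, 0.0]: no single attribute helps
```

A consistent tree for n-bit parity therefore ends up enumerating the truth table, with 2^n leaves.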

## Principle Behind the Decision-Tree-Learning Algorithm

- Uses a general principle of inductive learning often called Ockham's razor: "The most likely hypothesis is the simplest one that is consistent with all observations."
- Decision trees can express any function of the input attributes.

Decision-tree learning algorithm:

- Aim: find a small tree consistent with the training examples.
- Idea: (recursively) choose the "most significant" attribute as the root of the (sub)tree.

Choosing an attribute to test:

- Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative".
- In the restaurant example, Patrons is a better choice than Type: it leaves subsets that are nearly all positive or all negative.
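The greedy idea above — recursively split on the most significant attribute — can be sketched as follows. This is a minimal illustration, not the textbook's pseudocode; the dict-of-dicts tree encoding and helper names are my own:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    h, n = 0.0, len(labels)
    for count in Counter(labels).values():
        h -= (count / n) * log2(count / n)
    return h

def gain(examples, labels, attr):
    """Reduction in entropy from splitting the examples on attr."""
    h = entropy(labels)
    for v in set(ex[attr] for ex in examples):
        sub = [lab for ex, lab in zip(examples, labels) if ex[attr] == v]
        h -= (len(sub) / len(labels)) * entropy(sub)
    return h

def dtl(examples, labels, attrs):
    """Greedy decision-tree learning: recursively split on the highest-gain attribute."""
    if len(set(labels)) == 1:               # pure node: return the class
        return labels[0]
    if not attrs:                           # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(examples, labels, a))
    tree = {best: {}}
    for v in sorted(set(ex[best] for ex in examples)):
        idx = [i for i, ex in enumerate(examples) if ex[best] == v]
        tree[best][v] = dtl([examples[i] for i in idx],
                            [labels[i] for i in idx],
                            [a for a in attrs if a != best])
    return tree

toy = [{"Patrons": "None"}, {"Patrons": "Some"}, {"Patrons": "Some"}]
print(dtl(toy, ["F", "T", "T"], ["Patrons"]))  # {'Patrons': {'None': 'F', 'Some': 'T'}}
```

Branches for attribute values never seen in the training subset simply don't appear; a fuller implementation would also handle an empty example set (e.g., by returning the parent's majority class).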

## Attribute-based representations

- Examples are described by attribute values (Boolean, discrete, continuous).
- E.g., situations where I will/won't wait for a table.
- Classification of examples is positive (T) or negative (F).

Using information theory:

- To implement Choose-Attribute in the DTL algorithm, we measure information content (entropy):

I(P(v1), ..., P(vn)) = Σ_{i=1}^{n} -P(vi) log2 P(vi)
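The entropy formula can be checked numerically; a minimal sketch (function name is mine):

```python
from math import log2

def information_content(probs):
    """I(P(v1), ..., P(vn)) = sum over i of -P(vi) * log2 P(vi), in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(information_content([0.5, 0.5]))                # 1.0 bit: a fair coin flip
print(information_content([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits: a fair 4-way choice
```

The more uneven the distribution, the less information an answer carries, which is why nearly-pure subsets are cheap to finish classifying.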

- For a training set containing p positive examples and n negative examples:

I(p/(p+n), n/(p+n)) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))

- A chosen attribute A with v distinct values divides the training set E into subsets E1, ..., Ev according to their values for A. The expected entropy remaining after the test is:

remainder(A) = Σ_{i=1}^{v} ((pi + ni)/(p + n)) I(pi/(pi + ni), ni/(pi + ni))

- The information gain (IG), i.e. the reduction in entropy from the attribute test, is:

IG(A) = I(p/(p+n), n/(p+n)) - remainder(A)
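A sketch of remainder(A) and IG in code, using the positive/negative counts from the standard 12-example restaurant training set (the per-subset counts below are the textbook's; the helper names are mine):

```python
from math import log2

def I(p, n):
    """Entropy (bits) of a set with p positive and n negative examples."""
    h = 0.0
    for c in (p, n):
        if c:
            h -= (c / (p + n)) * log2(c / (p + n))
    return h

def gain(p, n, subsets):
    """IG(A) = I(p, n) - remainder(A), with subsets given as (pi, ni) counts."""
    remainder = sum((pi + ni) / (p + n) * I(pi, ni) for pi, ni in subsets)
    return I(p, n) - remainder

# Patrons splits the 12 examples into None (0+, 2-), Some (4+, 0-), Full (2+, 4-):
print(round(gain(6, 6, [(0, 2), (4, 0), (2, 4)]), 3))        # 0.541 bits
# Type splits them into four subsets, each half positive, so it gains nothing:
print(round(gain(6, 6, [(1, 1), (1, 1), (2, 2), (2, 2)]), 3))  # 0.0
```

This is the numeric version of the earlier observation that Patrons is a better attribute than Type.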

- Choose the attribute with the largest IG.
- For the restaurant training set, p = n = 6, so I(6/12, 6/12) = 1 bit.
- Patrons has the highest IG of all attributes and so is chosen by the DTL algorithm as the root.

## Assessing the performance of the learning algorithm

- A learning algorithm is good if it produces hypotheses that do a good job of predicting the classifications of unseen examples.
- Test the algorithm's prediction performance on a set of new examples, called a test set.

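A minimal sketch of this holdout evaluation, assuming we already have some learned hypothesis wrapped in a `predict` function (the function names and the labeled-pair encoding are illustrative):

```python
import random

def holdout_split(examples, test_fraction=0.25, seed=0):
    """Shuffle the labeled examples and hold out a fraction as the test set."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)   # fixed seed for reproducibility
    k = int(len(shuffled) * test_fraction)
    return shuffled[k:], shuffled[:k]       # (training set, test set)

def accuracy(predict, test_set):
    """Fraction of (example, label) pairs whose label is predicted correctly."""
    correct = sum(1 for x, y in test_set if predict(x) == y)
    return correct / len(test_set)
```

The key point is that the test examples are never shown to the learner, so accuracy on them estimates performance on genuinely unseen data.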
