
Chapter: Data Warehousing and Data Mining : Association Rule Mining and Classification

Other Classification Methods


 

Genetic Algorithms

o     Genetic Algorithm: based on an analogy to biological evolution

 

o     An initial population is created consisting of randomly generated rules

 

·         Each rule is represented by a string of bits

 

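·         E.g., the rule "if A1 and ¬A2 then C2" can be encoded as 100 (the two leftmost bits represent A1 and A2, and the rightmost bit represents the class)

·         If an attribute has k > 2 values, k bits can be used to encode it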

 

o     Based on the notion of survival of the fittest, a new population is formed that consists of the fittest rules and their offspring

 

o     The fitness of a rule is represented by its classification accuracy on a set of training examples

 

o     Offspring are generated by crossover and mutation

 

o     The process continues until a population P evolves in which each rule satisfies a prespecified fitness threshold

 

o     Slow but easily parallelizable
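The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the training set, the population size, the 0.2 mutation rate, and the function names (`fitness`, `crossover`, `mutate`, `evolve`) are all hypothetical, and fitness is taken here as accuracy on the training examples that a rule's antecedent covers.

```python
import random

# Hypothetical training set: (A1, A2, class_bit), where class_bit 1 = C1 and 0 = C2.
TRAINING = [
    (1, 0, 0),
    (1, 0, 0),
    (0, 0, 1),
    (1, 1, 1),
    (0, 1, 1),
]

def fitness(rule, examples=TRAINING):
    """Accuracy of a 3-bit rule on the examples its antecedent covers (assumed measure)."""
    a1, a2, cls = (int(bit) for bit in rule)
    covered = [e for e in examples if e[0] == a1 and e[1] == a2]
    if not covered:
        return 0.0
    return sum(1 for e in covered if e[2] == cls) / len(covered)

def crossover(parent_a, parent_b, point):
    """Swap the tails of two bit strings at the given cut point."""
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(rule, position):
    """Flip the bit at the given position."""
    flipped = "1" if rule[position] == "0" else "0"
    return rule[:position] + flipped + rule[position + 1:]

def evolve(size=6, threshold=1.0, max_generations=50, seed=0):
    """Evolve a population of 3-bit rules until all meet the threshold or time runs out."""
    rng = random.Random(seed)
    population = ["".join(rng.choice("01") for _ in range(3)) for _ in range(size)]
    for _ in range(max_generations):
        if all(fitness(rule) >= threshold for rule in population):
            break
        # Survival of the fittest: keep the fitter half, then refill with offspring.
        survivors = sorted(population, key=fitness, reverse=True)[: size // 2]
        offspring = []
        while len(survivors) + len(offspring) < size:
            parent_a, parent_b = rng.sample(survivors, 2)
            child, _ = crossover(parent_a, parent_b, rng.randint(1, 2))
            if rng.random() < 0.2:  # assumed mutation rate
                child = mutate(child, rng.randrange(3))
            offspring.append(child)
        population = survivors + offspring
    return population

final_population = evolve()
```

Because the fitness of each rule is computed independently of the others, the evaluation step is the natural place to parallelize.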

 

Rough Set Approach

 

o     Rough sets are used to approximately or "roughly" define equivalence classes

 

o     A rough set for a given class C is approximated by two sets: a lower approximation (certain to be in C) and an upper approximation (cannot be described as not belonging to C)

 

o   Finding the minimal subsets (reducts) of attributes for feature reduction is NP-hard, but a discernibility matrix (which stores the differences between attribute values for each pair of data tuples) can be used to reduce the computational cost
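As a rough illustration of the discernibility matrix described above, the sketch below records, for each pair of tuples, the set of attributes on which the two tuples differ. The data set and the name `discernibility_matrix` are hypothetical, and this simple version compares every pair of tuples regardless of class.

```python
from itertools import combinations

# Hypothetical data tuples keyed by tuple id.
TUPLES = {
    "t1": {"income": "high", "student": "no"},
    "t2": {"income": "low", "student": "no"},
    "t3": {"income": "low", "student": "yes"},
}

def discernibility_matrix(tuples, attributes):
    """Map each pair of tuple ids to the set of attributes that tell them apart."""
    matrix = {}
    for id_a, id_b in combinations(sorted(tuples), 2):
        matrix[(id_a, id_b)] = {
            attr for attr in attributes
            if tuples[id_a][attr] != tuples[id_b][attr]
        }
    return matrix

matrix = discernibility_matrix(TUPLES, ["income", "student"])
```

An attribute that appears alone in some entry is the only way to distinguish that pair, so it must be kept in every reduct; this is one way the matrix narrows the search.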


Figure: A rough set approximation of the set of tuples of the class C using lower and upper approximation sets of C. The rectangular regions represent equivalence classes
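The lower and upper approximations can be computed directly from the equivalence classes. The following is a minimal sketch, assuming a hypothetical data set and function name (`approximations`): tuples that agree on every chosen attribute fall into the same equivalence class; a class lying entirely inside C goes into the lower approximation, and a class that overlaps C at all goes into the upper approximation.

```python
from collections import defaultdict

# Hypothetical data tuples keyed by tuple id; "class" holds the class label.
TUPLES = {
    "t1": {"income": "high", "student": "no", "class": "C"},
    "t2": {"income": "high", "student": "no", "class": "C"},
    "t3": {"income": "low", "student": "yes", "class": "C"},
    "t4": {"income": "low", "student": "yes", "class": "notC"},
    "t5": {"income": "medium", "student": "no", "class": "notC"},
}

def approximations(tuples, attributes, target_class):
    """Return (lower, upper) approximations of target_class as sets of tuple ids."""
    # Group tuple ids into equivalence classes of indiscernible tuples.
    blocks = defaultdict(set)
    for tid, row in tuples.items():
        key = tuple(row[attr] for attr in attributes)
        blocks[key].add(tid)

    members = {tid for tid, row in tuples.items() if row["class"] == target_class}
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= members:   # every tuple in the block is in C: certain
            lower |= block
        if block & members:    # at least one tuple in the block is in C: possible
            upper |= block
    return lower, upper

lower, upper = approximations(TUPLES, ["income", "student"], "C")
```

In this example t3 and t4 are indiscernible but carry different class labels, so they fall in the upper approximation only; they form the boundary region between the two sets.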

 

Fuzzy Set Approaches

 

 

o     Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (e.g., as read from a fuzzy membership graph)

o     Attribute values are converted to fuzzy values

 

·         E.g., income is mapped into the discrete categories {low, medium, high}, with a fuzzy value calculated for each category

o     For a given new sample, more than one fuzzy value may apply

 

o     Each applicable rule contributes a vote for membership in the categories

 

o   Typically, the truth values for each predicted category are summed, and these sums are combined
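The fuzzification and voting steps above can be sketched as follows. Everything specific here is an assumption made for illustration: the income breakpoints (in thousands), the three rules, the class labels, and the function names. The sums are combined into a final answer by simply taking the category with the largest total, which is one simple choice among several.

```python
def fuzzify_income(income):
    """Map an income (in thousands, assumed scale) to degrees of membership."""
    def falling(x, a, b):
        # 1.0 at or below a, 0.0 at or above b, linear in between.
        return max(0.0, min(1.0, (b - x) / (b - a)))

    def rising(x, a, b):
        # 0.0 at or below a, 1.0 at or above b, linear in between.
        return max(0.0, min(1.0, (x - a) / (b - a)))

    return {
        "low": falling(income, 20, 40),
        "medium": min(rising(income, 20, 40), falling(income, 60, 80)),
        "high": rising(income, 60, 80),
    }

# Hypothetical rules: (fuzzy income category, predicted class).
RULES = [
    ("low", "reject"),
    ("medium", "approve"),
    ("high", "approve"),
]

def classify(income, rules=RULES):
    """Sum the truth values voted for each class and return the winner with the totals."""
    degrees = fuzzify_income(income)
    votes = {}
    for category, label in rules:
        votes[label] = votes.get(label, 0.0) + degrees[category]
    return max(votes, key=votes.get), votes

label, votes = classify(70)
```

An income of 70 belongs partly to "medium" and partly to "high", so two rules apply at once and both contribute to the same class total.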


