Chapter: Data Warehousing and Data Mining : Association Rule Mining and Classification

Important Short Questions and Answers : Association Rule Mining and Classification

1. What is an association rule?

Association rule mining finds interesting association or correlation relationships among a large set of data items, which can support decision-making. In market basket analysis, for example, association rules describe buying patterns: items that are frequently purchased together.

2. What are the applications of association rule mining?

Basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.

3. Define support and confidence in association rule mining.

Support s is the percentage of transactions in D that contain A ∪ B, that is, both A and B. Confidence c is the percentage of transactions in D containing A that also contain B.

Support(A => B) = P(A ∪ B)

Confidence(A => B) = P(B | A)
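The two measures can be computed directly by counting transactions. The following is a minimal sketch in Python; the transaction database D and the helper names (support_count, support, confidence) are hypothetical and chosen only for illustration.

```python
# Toy transaction database D (hypothetical data for illustration only).
D = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(antecedent, consequent, transactions):
    """Support(A => B): fraction of transactions containing both A and B."""
    both = set(antecedent) | set(consequent)
    return support_count(both, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence(A => B): fraction of transactions with A that also contain B."""
    both = set(antecedent) | set(consequent)
    return support_count(both, transactions) / support_count(antecedent, transactions)

rule_support = support({"bread"}, {"milk"}, D)
rule_confidence = confidence({"bread"}, {"milk"}, D)
print(f"bread => milk: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

In this toy data, bread and milk appear together in 3 of 5 transactions, and bread appears in 4, so the rule bread => milk has support 3/5 and confidence 3/4.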

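4. How are association rules mined from large databases?

Association rule mining is a two-step process:

i) Find all frequent itemsets, that is, itemsets whose support is at least a minimum support threshold.

ii) Generate strong association rules from the frequent itemsets, that is, rules that satisfy both minimum support and minimum confidence.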
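The two steps can be sketched on a toy example. This sketch enumerates every candidate itemset by brute force, which is only practical for tiny data and is not how efficient algorithms such as Apriori work; the transactions and both thresholds are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy data and thresholds, chosen only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
MIN_SUPPORT = 0.6
MIN_CONFIDENCE = 0.7

def count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# Step 1: find all frequent itemsets (brute force over every candidate).
items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        c = count(frozenset(combo))
        if c / len(transactions) >= MIN_SUPPORT:
            frequent[frozenset(combo)] = c

# Step 2: generate strong rules A => B from each frequent itemset.
rules = []
for itemset, c in frequent.items():
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            a = frozenset(antecedent)
            # Every subset of a frequent itemset is frequent, so a is in `frequent`.
            conf = c / frequent[a]
            if conf >= MIN_CONFIDENCE:
                rules.append((set(a), set(itemset - a), conf))

for a, b, conf in rules:
    print(sorted(a), "=>", sorted(b), f"confidence={conf:.2f}")
```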

5. Define data classification.

It is a two-step process. In the first step, a model is built describing a predetermined set of data classes or concepts. The model is constructed by analyzing database tuples described by attributes. In the second step, the model is used to classify new data.

6. Describe the two common approaches to tree pruning.

In the prepruning approach, a tree is "pruned" by halting its construction early. The second approach, postpruning, removes branches from a "fully grown" tree. A tree node is pruned by removing its branches.

7. What is a "decision tree"?

It is a flowchart-like tree structure in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and leaf nodes represent classes or class distributions. A decision tree is a predictive model: each branch of the tree is a classification question, and the leaves of the tree are partitions of the dataset with their classification.

8. What do you mean by concept hierarchies?

A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts. Concept hierarchies allow specialization, or drilling down, whereby concept values are replaced by lower-level concepts.

9. Where are decision trees mainly used?

Exploration of datasets and business problems

Data preprocessing for other predictive analysis

Exploratory analysis by statisticians

10. What is decision tree pruning?

Once the tree is constructed, some modification to the tree might be needed to improve its performance during the classification phase. The pruning phase might remove redundant comparisons or remove subtrees to achieve better performance.

11. Explain ID3.

ID3 is an algorithm used to build a decision tree. It follows these steps:

At each node, choose the splitting attribute with the highest information gain.

The split should reduce the amount of information needed to classify the tuples by a large amount.
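The attribute selection step can be sketched as follows, using the standard entropy-based definition of information gain. The function names and the toy training rows are hypothetical, and the sketch covers only the choice of one splitting attribute, not the full recursive tree construction.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Expected information (in bits) needed to classify a tuple."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in entropy obtained by splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def choose_splitting_attribute(rows, attributes, target):
    """Pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical training tuples for illustration.
rows = [
    {"student": "yes", "credit": "fair", "buys": "yes"},
    {"student": "yes", "credit": "excellent", "buys": "yes"},
    {"student": "no", "credit": "fair", "buys": "no"},
    {"student": "no", "credit": "excellent", "buys": "no"},
]
best = choose_splitting_attribute(rows, ["student", "credit"], "buys")
print(best)
```

Here splitting on student separates the two classes completely, while splitting on credit leaves each partition as mixed as before, so student has the higher gain.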

12. What is classification?

It predicts categorical class labels.

It classifies data by constructing a model from the training set and the values (class labels) of a classifying attribute, and then uses that model to classify new data.

13. What is prediction?

It models continuous-valued functions, i.e., predicts unknown or missing values.

14. What is supervised learning (classification)?

Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations.

New data is classified based on the training set.

15. What is unsupervised learning (clustering)?

The class labels of the training data are unknown.

Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.

16. Define decision tree.

A flow-chart-like tree structure

Internal node denotes a test on an attribute

Branch represents an outcome of the test

Leaf nodes represent class labels or class distribution

17. What is the use of a decision tree?

Classifying an unknown sample

Test the attribute values of the sample against the decision tree
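Testing a sample against a tree amounts to following one branch per attribute test until a leaf is reached. The sketch below assumes a tree stored as nested dictionaries; that representation, the attribute names, and the class labels are illustrative choices, not something the notes prescribe.

```python
# A decision tree as nested dicts (hypothetical representation):
# internal nodes name the attribute to test, branches are attribute
# values, and leaves are plain class-label strings.
tree = {
    "attribute": "age",
    "branches": {
        "youth": {
            "attribute": "student",
            "branches": {"yes": "buys", "no": "does_not_buy"},
        },
        "middle_aged": "buys",
        "senior": "does_not_buy",
    },
}

def classify(node, sample):
    """Walk from the root to a leaf, testing one attribute per internal node."""
    while isinstance(node, dict):
        value = sample[node["attribute"]]
        node = node["branches"][value]
    return node

label = classify(tree, {"age": "youth", "student": "yes"})
print(label)
```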

18. What are the other classification methods?

k-nearest neighbor classifier

Rough set approach

Case-based reasoning

Fuzzy set approaches

Genetic algorithm

19. What is linear regression?

In linear regression, data are modeled using a straight line. Linear regression is the simplest form of regression. Bivariate linear regression models a random variable Y, called the response variable, as a linear function of another random variable X, called the predictor variable:

Y = a + b X

where a and b are regression coefficients giving the Y-intercept and the slope of the line.
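One common way to estimate a and b is the method of least squares; the notes do not name an estimation method, so treat that choice as an assumption. A minimal sketch with hypothetical data points:

```python
def fit_line(xs, ys):
    """Least-squares estimates of a (intercept) and b (slope) in Y = a + b X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical points lying exactly on the line Y = 1 + 2 X.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_line(xs, ys)
print(a, b)

# Predict the response for a new predictor value.
predicted = a + b * 6
```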

20. What is the classification of association rules based on various criteria?

Based on the types of values handled in the rule: Boolean association rules and quantitative association rules.

Based on the dimensions of data involved in the rule: single-dimensional and multidimensional association rules.

Based on the levels of abstraction involved in the rule: single-level and multilevel association rules.

Based on various extensions to association mining: max-patterns and frequent closed itemsets.

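21. What is the purpose of the Apriori algorithm?

Apriori is an algorithm for mining frequent itemsets for Boolean association rules. Its name is based on the fact that the algorithm uses prior knowledge of frequent itemset properties to find frequent itemsets.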

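22. What are the two steps of the Apriori algorithm?

Join step: candidate k-itemsets are generated by joining the frequent (k-1)-itemsets with themselves.

Prune step: any candidate that has an infrequent (k-1)-subset is removed, since it cannot be frequent.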
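The join and prune steps can be sketched as a single candidate-generation function. The name apriori_gen and the sorted-tuple representation of itemsets are assumptions made for this illustration; the sketch also omits the database scan that counts the surviving candidates.

```python
from itertools import combinations

def apriori_gen(frequent_prev, k):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets."""
    prev = sorted(tuple(sorted(s)) for s in frequent_prev)
    prev_set = set(prev)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            # Join step: merge two itemsets that share their first k-2 items.
            if a[:k - 2] == b[:k - 2]:
                candidate = a + (b[-1],)
                # Prune step: every (k-1)-subset must itself be frequent.
                if all(sub in prev_set for sub in combinations(candidate, k - 1)):
                    candidates.add(candidate)
    return candidates

# Hypothetical frequent 2-itemsets.
L2 = [("a", "b"), ("a", "c"), ("a", "e"), ("b", "c"), ("b", "d"), ("b", "e")]
C3 = apriori_gen(L2, 3)
print(sorted(C3))
```

In this example the join step proposes six 3-itemsets, and the prune step discards four of them because each contains a 2-itemset, such as (c, e), that is not frequent.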

23. Give a few techniques to improve the efficiency of the Apriori algorithm.

Hash-based technique

Transaction reduction

Partitioning

Sampling

Dynamic item set counting

24. Define FP-growth (frequent itemset mining without candidate generation).

Frequent-pattern growth, or simply FP-growth, mines frequent itemsets without candidate generation by adopting a divide-and-conquer strategy. First, it compresses the database representing frequent items into a frequent-pattern tree, or FP-tree, which retains the itemset association information. It then divides the compressed database into a set of conditional databases, each associated with one frequent item, and mines each such database separately.

25. What are Bayesian classifiers?

Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class.
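As one concrete instance, a naive Bayesian classifier estimates these probabilities under the assumption that attributes are independent given the class. That independence assumption, the function names, and the toy data below are all illustrative choices; the sketch also applies no smoothing, so an attribute value never seen with a class gives that class probability zero.

```python
from collections import Counter, defaultdict

def train(rows, target):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)  # (class, attribute) -> value counts
    for r in rows:
        for attr, value in r.items():
            if attr != target:
                value_counts[(r[target], attr)][value] += 1
    return class_counts, value_counts

def class_probabilities(sample, class_counts, value_counts):
    """Posterior P(class | sample), assuming attribute independence."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for attr, value in sample.items():
            score *= value_counts[(label, attr)][value] / n  # P(value | class)
        scores[label] = score
    norm = sum(scores.values())  # assumes at least one class has a nonzero score
    return {label: s / norm for label, s in scores.items()}

# Hypothetical training tuples for illustration.
rows = [
    {"student": "yes", "credit": "fair", "buys": "yes"},
    {"student": "yes", "credit": "excellent", "buys": "yes"},
    {"student": "no", "credit": "fair", "buys": "yes"},
    {"student": "no", "credit": "excellent", "buys": "no"},
    {"student": "no", "credit": "fair", "buys": "no"},
    {"student": "yes", "credit": "excellent", "buys": "no"},
]
class_counts, value_counts = train(rows, "buys")
probs = class_probabilities({"student": "yes", "credit": "fair"}, class_counts, value_counts)
print(probs)
```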

26. What is a rule-based classifier?

It uses a set of IF-THEN rules for classification. Rules can be extracted from a decision tree, or they may be generated directly from the training data using a sequential covering algorithm or an associative classification algorithm.

27. What is a rule?

Rules are a good way of representing information or bits of knowledge. A rule-based classifier uses a set of IF-THEN rules for classification. An IF-THEN rule is an expression of the form

IF condition THEN conclusion.

An example is rule R1:

R1: IF age = youth AND student = yes THEN buys_computer = yes.
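Rule R1 can be encoded as a condition paired with a conclusion and checked against a tuple. The data layout, the apply_rules helper, and the first-match strategy are assumptions made for this sketch; the notes do not say how conflicts between several matching rules are resolved.

```python
# Each rule pairs a condition (attribute tests joined by AND) with a conclusion.
rules = [
    ({"age": "youth", "student": "yes"}, ("buys_computer", "yes")),  # R1
]

def apply_rules(sample, rules, default=None):
    """Return the conclusion of the first rule whose condition the sample satisfies."""
    for condition, conclusion in rules:
        if all(sample.get(attr) == value for attr, value in condition.items()):
            return conclusion
    return default  # no rule covers the sample

result = apply_rules({"age": "youth", "student": "yes"}, rules)
print(result)
```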

28. What is backpropagation?

It is a neural network learning algorithm for classification that employs a method of gradient descent. It searches for a set of weights that can model the data so as to minimize the mean squared distance between the network's class prediction and the actual class label of the tuples.

29. Define support vector machine.

A support vector machine (SVM) is a promising method for the classification of both linear and nonlinear data. It uses a nonlinear mapping to transform the original training data into a higher dimension. Within this new dimension, it searches for the linear optimal separating hyperplane, which it finds using essential training tuples called support vectors.

30. What is associative classification?

It uses association mining techniques that search for frequently occurring patterns in large databases. The patterns may generate rules, which can be analyzed for use in classification.

31. Define prediction with classification.

i) Prediction is similar to classification:

First, construct a model.

Second, use the model to predict unknown values.

ii) The major method for prediction is regression:

Linear and multiple regression

Non-linear regression

iii) Prediction is different from classification:

Classification predicts categorical class labels.

Prediction models continuous-valued functions.
