What is an association rule?
Association rule mining finds interesting associations or correlations among a large set of data items, and the results are used in decision-making processes. For example, association rules can describe buying patterns, that is, items that are frequently purchased together.
What are the applications of association rule mining?
Basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
Define support and confidence in Association rule mining.
Support s is the percentage of transactions in the transaction set D that contain A ∪ B, that is, both A and B. Confidence c is the percentage of transactions in D containing A that also contain B. Support(A => B) = P(A ∪ B). Confidence(A => B) = P(B | A).
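A minimal sketch of these two measures, assuming each transaction is a Python set of item names. The function names and the small transaction list are illustrative only and do not come from the notes.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing antecedent that also contain consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical transaction set D.
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

s = support(D, {"bread", "milk"})        # 2 of 4 transactions
c = confidence(D, {"bread"}, {"milk"})   # 2 of the 3 transactions with bread
```

Here the rule bread => milk has support 0.5 and confidence 2/3.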
How are association rules mined from large databases?
Association rule mining is a two-step process.
Find all frequent itemsets.
Generate strong association rules from the frequent itemsets.
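The two steps above can be sketched as follows. This is a simplified level-wise search under assumed thresholds (`min_support`, `min_confidence`), not an optimized miner; all names and the sample data are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Step 1: find all itemsets whose support meets min_support."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        level = {}
        for c in candidates:
            sup = sum(1 for t in transactions if c <= t) / n
            if sup >= min_support:
                level[c] = sup
        frequent.update(level)
        k += 1
        # Build k-item candidates by joining frequent (k-1)-itemsets.
        candidates = list({a | b for a in level for b in level if len(a | b) == k})
    return frequent

def strong_rules(frequent, min_confidence):
    """Step 2: keep rules A => B whose confidence meets min_confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                conf = sup / frequent[a]   # every subset of a frequent itemset is frequent
                if conf >= min_confidence:
                    rules.append((a, itemset - a, sup, conf))
    return rules

D = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
frequent = frequent_itemsets(D, min_support=0.5)
rules = strong_rules(frequent, min_confidence=0.7)
```

With these thresholds, five itemsets are frequent and the only strong rule is butter => bread.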
Define Data Classification.
It is a two-step process. In the first step, a model is built describing a pre-determined set of data classes or concepts. The model is constructed by analyzing database tuples described by attributes. In the second step the model is used for classification.
Describe the two common approaches to tree pruning.
In the prepruning approach, a tree is "pruned" by halting its construction early. The second approach, postpruning, removes branches from a "fully grown" tree. A tree node is pruned by removing its branches.
What is a "decision tree"?
It is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and leaf nodes represent classes or class distributions. A decision tree is a predictive model. Each branch of the tree is a classification question, and the leaves of the tree are partitions of the dataset with their classification.
What do you mean by concept hierarchies?
A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts. Concept hierarchies allow specialization, or drilling down, whereby concept values are replaced by lower-level concepts.
Where are decision trees mainly used?
Exploration of datasets and business problems.
Data preprocessing for other predictive analysis.
Exploratory analysis by statisticians.
What is decision tree pruning?
Once the tree is constructed, some modification to the tree might be needed to improve its performance during the classification phase. The pruning phase might remove redundant comparisons or remove subtrees to achieve better performance.
ID3 is an algorithm used to build a decision tree. The following steps are followed to build a decision tree.
Choose the splitting attribute with the highest information gain.
The split should reduce the amount of information needed to classify a tuple by as much as possible.
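A minimal sketch of the attribute-selection step, assuming information gain is computed as the entropy of the class labels minus the weighted entropy after the split. The rows, attribute names, and function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Information (in bits) needed to identify the class of a tuple."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in entropy obtained by splitting rows on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical training tuples.
rows = [
    {"student": "yes", "credit": "fair", "buys": "yes"},
    {"student": "yes", "credit": "good", "buys": "yes"},
    {"student": "no", "credit": "fair", "buys": "no"},
    {"student": "no", "credit": "good", "buys": "no"},
]

gains = {a: information_gain(rows, a, "buys") for a in ("student", "credit")}
best = max(gains, key=gains.get)   # attribute chosen for the split
```

In this toy data, `student` separates the classes perfectly (gain of 1 bit) while `credit` tells nothing (gain of 0), so `student` is chosen.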
What is Classification?
Classification predicts categorical class labels.
It constructs a model from the training set and the values (class labels) of a classifying attribute, then uses that model to classify new data.
What is Prediction?
Prediction models continuous-valued functions, i.e., it predicts unknown or missing values.
What is supervised learning (classification)?
Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
New data is classified based on the training set
What is unsupervised learning (clustering)?
The class labels of the training data are unknown.
Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.
16. Define decision tree.
A flowchart-like tree structure.
Internal node denotes a test on an attribute.
Branch represents an outcome of the test.
Leaf nodes represent class labels or class distribution.
17. What is the use of a decision tree?
Classifying an unknown sample.
Test the attribute values of the sample against the decision tree
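A minimal sketch of classifying an unknown sample, assuming the tree is stored as nested dictionaries: an internal node holds the attribute it tests and one branch per outcome, and a leaf is simply a class label. The tree, attribute values, and labels are hypothetical.

```python
# Hypothetical decision tree: internal nodes are dicts, leaves are class labels.
tree = {
    "attribute": "age",
    "branches": {
        "youth": {
            "attribute": "student",
            "branches": {"yes": "buys", "no": "does_not_buy"},
        },
        "middle_aged": "buys",
        "senior": "does_not_buy",
    },
}

def classify(node, sample):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while isinstance(node, dict):
        outcome = sample[node["attribute"]]
        node = node["branches"][outcome]
    return node

label = classify(tree, {"age": "youth", "student": "yes"})
```

The sample is routed through the `age` test, then the `student` test, and ends at the leaf `buys`.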
18. What are the other classification methods?
k-nearest neighbor classifier
Rough set approach
Fuzzy set approaches
19. What is linear regression?
In linear regression, data are modeled using a straight line. Linear regression is the simplest form of regression. Bivariate linear regression models a random variable Y, called the response variable, as a linear function of another random variable X, called the predictor variable: Y = a + bX, where a and b are the regression coefficients.
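A minimal sketch of fitting Y = a + bX, assuming the coefficients are estimated by ordinary least squares (the notes do not say which estimation method is meant). The function name and sample points are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares estimates of intercept a and slope b in y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical points lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = fit_line(xs, ys)
predicted = a + b * 5   # predict the response for a new predictor value
```

Because the sample points lie on a line, the fit recovers a = 1 and b = 2, and the prediction for x = 5 is 11.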
20. What is the classification of association rules based on various criteria?
Based on the types of values handled in the rule.
Boolean Association rule.
Quantitative Association rule.
Based on the dimensions of data involved in the rule.
Single Dimensional Association rule.
Multi Dimensional Association rule.
Based on the levels of abstractions involved in the rule.
Single level Association rule.
Multi level Association rule.
Based on various extensions to association mining.
Frequent closed item sets.
21. What is the purpose of the Apriori algorithm?
Apriori mines frequent itemsets for Boolean association rules. Its name comes from the fact that the algorithm uses prior knowledge of frequent itemset properties to find frequent itemsets.
22. What are the two steps of the Apriori algorithm?
The join step.
The prune step.
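Apriori's candidate generation is usually described as a join followed by a prune. A minimal sketch, assuming itemsets are represented as frozensets; the function name `apriori_gen` and the example itemsets are illustrative, and the join here is a simple pairwise union rather than the sorted-prefix join used in efficient implementations.

```python
from itertools import combinations

def apriori_gen(prev_frequent, k):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets."""
    prev = set(prev_frequent)
    # Join: union pairs of (k-1)-itemsets that together contain exactly k items.
    joined = {a | b for a in prev for b in prev if len(a | b) == k}
    # Prune: drop any candidate that has an infrequent (k-1)-subset.
    return {
        c for c in joined
        if all(frozenset(s) in prev for s in combinations(c, k - 1))
    }

# Hypothetical frequent 2-itemsets.
L2 = [frozenset("AB"), frozenset("AC"), frozenset("BC"), frozenset("BD")]
C3 = apriori_gen(L2, 3)
```

The join produces ABC, ABD, and BCD. The prune removes ABD (AD is not frequent) and BCD (CD is not frequent), leaving only ABC to be counted against the database.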
23. Give a few techniques to improve the efficiency of the Apriori algorithm.
Dynamic item set counting
24. Define FP-growth (mining without candidate generation).
Frequent-pattern growth, or simply FP-growth, mines frequent itemsets without candidate generation by adopting a divide-and-conquer strategy. First, it compresses the database representing frequent items into a frequent-pattern tree, or FP-tree, which retains the itemset association information. It then divides the compressed database into a set of conditional databases, each associated with one frequent item, and mines each such database separately.
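A minimal sketch of the first step only, compressing transactions into an FP-tree. It assumes items within a transaction are ordered by descending frequency with ties broken alphabetically; the class and function names and the data are hypothetical, and the mining of conditional databases is not shown.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    """Insert each transaction's frequent items into a shared prefix tree."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = FPNode(None)
    for t in transactions:
        # Keep frequent items only, most frequent first.
        ordered = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1   # shared prefixes accumulate counts
    return root

D = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
root = build_fp_tree(D, min_count=2)
```

Three of the four transactions share the prefix `bread`, so they are stored along a single path whose node carries a count of 3.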
25. What are Bayesian classifiers?
Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class.
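A minimal sketch of how class membership probabilities follow from Bayes' theorem, assuming the class priors P(C) and the likelihoods P(X | C) for one tuple X are already known. The numbers and names are hypothetical.

```python
def posterior(priors, likelihoods):
    """P(C | X) for each class C, from P(C) and P(X | C)."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())   # P(X)
    return {c: j / evidence for c, j in joint.items()}

# Hypothetical values for one tuple X.
priors = {"yes": 0.6, "no": 0.4}        # P(C)
likelihoods = {"yes": 0.2, "no": 0.1}   # P(X | C)

probs = posterior(priors, likelihoods)
predicted = max(probs, key=probs.get)   # class with the highest posterior
```

The tuple is assigned to the class with the highest posterior probability, here `yes` with probability 0.75.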
26. What is a rule-based classifier?
It uses a set of IF-THEN rules for classification. Rules can be extracted from a decision tree, or generated directly from the training data using a sequential covering algorithm or an associative classification algorithm.
27. What is a rule?
Rules are a good way of representing information or bits of knowledge. A rule-based classifier uses a set of IF-THEN rules for classification. An IF-THEN rule is an expression of the form
IF condition THEN conclusion. An example is rule R1,
R1: IF age = youth AND student = yes THEN buys computer = yes.
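A minimal sketch of applying rule R1, assuming each rule is stored as a pair of a condition (attribute tests joined by AND) and a conclusion. The representation and the function name are hypothetical.

```python
# R1: IF age = youth AND student = yes THEN buys computer = yes
rules = [
    ({"age": "youth", "student": "yes"}, ("buys computer", "yes")),
]

def apply_rules(rules, record, default=None):
    """Return the conclusion of the first rule whose condition the record satisfies."""
    for condition, conclusion in rules:
        if all(record.get(attr) == value for attr, value in condition.items()):
            return conclusion
    return default   # no rule fired

covered = apply_rules(rules, {"age": "youth", "student": "yes", "income": "low"})
uncovered = apply_rules(rules, {"age": "senior", "student": "yes"})
```

A tuple that satisfies every test in the condition is covered by the rule and receives its conclusion. A tuple covered by no rule falls back to the default.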
28. What is backpropagation?
It is a neural network algorithm for classification that employs gradient descent. It searches for a set of weights that can model the data so as to minimize the mean squared distance between the network's class prediction and the actual class label of the tuples.
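A minimal sketch of one backpropagation update, assuming a tiny network with two inputs, two sigmoid hidden units, one sigmoid output, no bias terms, and squared error on a single training tuple. The weights, learning rate, and names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Compute hidden activations and the network output for input x."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def backprop_step(x, target, w_hidden, w_out, lr=0.5):
    """One gradient-descent update of all weights for a single tuple."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error term at the output unit (squared error through a sigmoid).
    delta_out = (output - target) * output * (1 - output)
    # Propagate the error backwards to each hidden unit.
    delta_hidden = [
        delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
        for j in range(len(hidden))
    ]
    new_w_out = [w - lr * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [
        [w - lr * delta_hidden[j] * xi for w, xi in zip(w_hidden[j], x)]
        for j in range(len(hidden))
    ]
    return new_w_hidden, new_w_out

x, target = [1.0, 0.0], 1.0
w_hidden = [[0.1, -0.2], [0.3, 0.4]]
w_out = [0.2, -0.1]

_, before = forward(x, w_hidden, w_out)
w_hidden2, w_out2 = backprop_step(x, target, w_hidden, w_out)
_, after = forward(x, w_hidden2, w_out2)
```

After the update, the output moves closer to the target, so the squared error for this tuple shrinks. Training repeats such updates over all tuples for many passes.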
29. Define support vector machine.
A support vector machine (SVM) is a promising method for the classification of both linear and nonlinear data. It uses a nonlinear mapping to transform the original training data into a higher dimension. Within this new dimension, it searches for the optimal linear separating hyperplane, which it finds using essential training tuples called support vectors.
30. What is associative classification?
It uses an association mining technique that searches for frequently occurring patterns in large databases. The patterns may generate rules, which can be analyzed for use in classification.
31. Define Prediction with classification.
Prediction is similar to classification:
First, construct a model.
Second, use the model to predict unknown values.
The major method for prediction is regression:
Linear and multiple regression.
Prediction is different from classification:
Classification predicts categorical class labels.
Prediction models continuous-valued functions.