
Chapter: Data Warehousing and Data Mining : Association Rule Mining and Classification

Frequent Itemsets, Closed Itemsets, and Association Rules

Association rule mining: Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.


 

·         A set of items is referred to as an itemset.

 

·         An itemset that contains k items is a k-itemset.

 

·         The set {computer, antivirus software} is a 2-itemset.

 

·         The occurrence frequency of an itemset is the number of transactions that contain the itemset.

 

·        This is also known, simply, as the frequency, support count, or count of the itemset.
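As a minimal sketch of these definitions, the snippet below counts how many transactions contain a given itemset. The transaction list and the helper name `support_count` are assumptions made for illustration; they do not come from the notes.

```python
# Hypothetical toy transactions; item names are illustrative only.
transactions = [
    {"computer", "antivirus software", "printer"},
    {"computer", "antivirus software"},
    {"computer", "printer"},
    {"printer"},
]

def support_count(itemset, transactions):
    """Return the number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))

pair = {"computer", "antivirus software"}   # a 2-itemset (k = 2)
print(len(pair), support_count(pair, transactions))
```

Here the 2-itemset appears in the first two transactions, so its support count under this assumed data is 2.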


Rules that satisfy both a minimum support threshold (min_sup) and a minimum confidence threshold (min_conf) are called strong association rules.

 

In general, association rule mining can be viewed as a two-step process:

1.  Find all frequent itemsets: By definition, each of these itemsets will occur at least as frequently as a predetermined minimum support count, min_sup.

2.  Generate strong association rules from the frequent itemsets: By definition, these rules must satisfy minimum support and minimum confidence.
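The two steps can be sketched as follows. This is a brute-force illustration under stated assumptions, not an efficient miner such as Apriori: the function names, the toy transactions, and the thresholds are all hypothetical.

```python
from itertools import combinations

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(transactions, min_sup_count):
    """Step 1: enumerate candidates by size; return {itemset: support count}."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = support_count(candidate, transactions)
            if count >= min_sup_count:
                frequent[candidate] = count
                found = True
        if not found:
            break  # no frequent k-itemset, so no larger itemset can be frequent
    return frequent

def strong_rules(frequent, min_conf):
    """Step 2: split each frequent itemset into body => head and keep
    the rules whose confidence reaches min_conf."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for body in combinations(sorted(itemset), r):
                body = frozenset(body)
                # Every subset of a frequent itemset is frequent, so the
                # lookup below always succeeds.
                confidence = count / frequent[body]
                if confidence >= min_conf:
                    rules.append((body, itemset - body, confidence))
    return rules

# Hypothetical data and thresholds.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
frequent = frequent_itemsets(transactions, min_sup_count=2)
for body, head, conf in strong_rules(frequent, min_conf=0.7):
    print(sorted(body), "=>", sorted(head), f"confidence {conf:.0%}")
```

Rules are generated only from itemsets that are already frequent, so every reported rule meets the minimum support by construction and only the confidence needs checking in step 2.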

 

Association Mining

 


 

Applications: Basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.

Examples.

 

Rule form: "Body → Head [support, confidence]".

buys(x, "diapers") → buys(x, "beers") [0.5%, 60%]

major(x, "CS") ∧ takes(x, "DB") → grade(x, "A") [1%, 75%]

 

Association Rule: Basic Concepts

 

Given: (1) a database of transactions, where (2) each transaction is a list of items (purchased by a customer in a single visit).

 

Find: all rules that correlate the presence of one set of items with that of another set of items. E.g., 98% of people who purchase tires and auto accessories also get automotive services done.

Applications

 

·         * ⇒ Maintenance Agreement (What should the store do to boost Maintenance Agreement sales?)

·         Home Electronics ⇒ * (What other products should the store stock up on?)

·         Attached mailing in direct marketing

·         Detecting "ping-pong"ing of patients, faulty "collisions"

 

Rule Measures: Support and Confidence

 

Find all the rules X ∧ Y ⇒ Z with minimum confidence and support.

·         Support, s: the probability that a transaction contains {X ∪ Y ∪ Z}.

·         Confidence, c: the conditional probability that a transaction containing {X ∪ Y} also contains Z.

With a minimum support of 50% and a minimum confidence of 50%, we have:

A ⇒ C (50%, 66.6%)

C ⇒ A (50%, 100%)
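The figures quoted above for the rules between A and C can be checked with a short calculation. The four-transaction table below is an assumption: it is not given in the notes and was chosen only because it reproduces those figures. The helper names `support` and `confidence` are likewise illustrative.

```python
# Assumed transaction table (not in the notes); chosen so that the
# quoted support and confidence values can be reproduced.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(body, head, transactions):
    """Estimate of P(head | body): support(body ∪ head) / support(body)."""
    return support(body | head, transactions) / support(body, transactions)

for body, head in [({"A"}, {"C"}), ({"C"}, {"A"})]:
    s = support(body | head, transactions)
    c = confidence(body, head, transactions)
    print(sorted(body), "=>", sorted(head), f"support {s:.1%}, confidence {c:.1%}")
```

Under this assumed table, A and C occur together in 2 of 4 transactions (support 50%). A occurs in 3 transactions, so the confidence of A ⇒ C is 2/3, while C occurs in 2, so the confidence of C ⇒ A is 2/2. Both rules clear the 50% thresholds.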


Association Rule Mining: A Road Map

·         Boolean vs. quantitative associations (based on the types of values handled)

          o   buys(x, "SQLServer") ∧ buys(x, "DMBook") → buys(x, "DBMiner") [0.2%, 60%]

          o   age(x, "30..39") ∧ income(x, "42..48K") → buys(x, "PC") [1%, 75%]

·         Single-dimensional vs. multidimensional associations (see the examples above)

·         Single-level vs. multiple-level analysis

          o   What brands of beers are associated with what brands of diapers?

·         Various extensions

          o   Correlation, causality analysis: association does not necessarily imply correlation or causality

          o   Maxpatterns and closed itemsets

          o   Constraints enforced, e.g., do small sales (sum < 100) trigger big buys (sum > 1,000)?
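The "maxpatterns and closed itemsets" extension can be illustrated with a small sketch. It relies on the usual definitions, stated here as background rather than taken from the notes: a frequent itemset is closed if no proper superset has the same support count, and maximal (a max-pattern) if no proper superset is frequent. The function names and data are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_sup_count):
    """Brute-force enumeration; returns {itemset: support count}."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_sup_count:
                result[candidate] = count
    return result

def closed_itemsets(frequent):
    """Frequent itemsets with no proper superset of equal support count."""
    return {s for s, n in frequent.items()
            if not any(s < t and frequent[t] == n for t in frequent)}

def maximal_itemsets(frequent):
    """Frequent itemsets with no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}

# Hypothetical data: every non-empty subset of {a, b, c} is frequent here.
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a"}]
frequent = frequent_itemsets(transactions, min_sup_count=2)
print(len(frequent), "frequent itemsets")
print("closed: ", sorted(sorted(s) for s in closed_itemsets(frequent)))
print("maximal:", sorted(sorted(s) for s in maximal_itemsets(frequent)))
```

With this data, seven itemsets are frequent but only {a}, {a, b}, and {a, b, c} are closed, and only {a, b, c} is maximal, which shows how both notions give a more compact summary of the frequent itemsets.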

 

Market Basket Analysis

 

A market basket is a collection of items purchased by a customer in a single transaction, which is a well-defined business activity. For example, a customer's visit to a grocery store or an online purchase from a virtual store on the Web is a typical customer transaction. Retailers accumulate huge collections of transactions by recording business activities over time. One common analysis run against a transaction database is to find sets of items, or itemsets, that appear together in many transactions. A business can use knowledge of these patterns to improve the placement of these items in the store or the layout of mail-order catalog pages and Web pages. An itemset containing i items is called an i-itemset. The percentage of transactions that contain an itemset is called the itemset's support. For an itemset to be interesting, its support must be at least a user-specified minimum. Such itemsets are said to be frequent.

 

Figure: Market basket analysis.


 

Rule support and confidence are two measures of rule interestingness. They respectively reflect the usefulness and certainty of discovered rules. Consider the association rule computer ⇒ financial_management_software [support = 2%, confidence = 60%]. A support of 2% means that computer and financial management software are purchased together in 2% of all the transactions under analysis. A confidence of 60% means that 60% of the customers who purchased a computer also bought the software. Typically, association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.


