Frequent Itemsets, Closed Itemsets, and Association Rules
· A set of items is referred to as an itemset.
· An itemset that contains k items is a k-itemset.
· The set {computer, antivirus software} is a 2-itemset.
· The occurrence frequency of an itemset is the number of transactions that contain the itemset.
· This is also known, simply, as the frequency, support count, or count of the itemset.
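As a minimal sketch of the support count defined above, the function below counts the transactions that contain every item of an itemset. The function name `support_count` and the four toy transactions are assumptions made up for illustration; they do not come from the original notes.

```python
from typing import Iterable, List, Set


def support_count(itemset: Iterable[str], transactions: List[Set[str]]) -> int:
    """Return the number of transactions that contain every item of `itemset`."""
    target = set(itemset)
    # `target <= t` is the subset test: all items of the itemset occur in t.
    return sum(1 for t in transactions if target <= t)


# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"computer", "antivirus software"},
    {"computer", "printer"},
    {"computer", "antivirus software", "printer"},
    {"printer"},
]

# The 2-itemset {computer, antivirus software} occurs in the first and third transactions.
count = support_count({"computer", "antivirus software"}, transactions)
print(count)  # 2
```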
Rules that satisfy both a minimum support threshold (min_sup) and a minimum confidence threshold (min_conf) are called strong association rules.
In general, association rule mining can be viewed as a two-step process:
1. Find all frequent itemsets: By definition, each of these itemsets will occur at least as frequently as a predetermined minimum support count, min_sup.
2. Generate strong association rules from the frequent itemsets: By definition, these rules must satisfy minimum support and minimum confidence.
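The two steps can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions, not an efficient miner such as Apriori: the function names `frequent_itemsets` and `strong_rules`, the grocery transactions, and the thresholds are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_sup_count):
    """Step 1: enumerate every itemset whose support count is >= min_sup_count."""
    items = sorted(set().union(*transactions))
    counts = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            c = sum(1 for t in transactions if set(combo) <= t)
            if c >= min_sup_count:
                counts[frozenset(combo)] = c
                found = True
        if not found:
            # No frequent k-itemset means no frequent (k+1)-itemset can exist.
            break
    return counts


def strong_rules(freq, min_conf):
    """Step 2: split each frequent itemset into body => head and keep confident rules."""
    rules = []
    for itemset, count in freq.items():
        for r in range(1, len(itemset)):
            for body_items in combinations(sorted(itemset), r):
                body = frozenset(body_items)
                # Every subset of a frequent itemset is frequent, so freq[body] exists.
                conf = count / freq[body]
                if conf >= min_conf:
                    rules.append((body, itemset - body, conf))
    return rules


# Hypothetical toy data and thresholds, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
freq = frequent_itemsets(transactions, min_sup_count=2)
rules = strong_rules(freq, min_conf=0.6)
for body, head, conf in rules:
    print(sorted(body), "=>", sorted(head), f"confidence={conf:.2f}")
```

Every rule returned already satisfies minimum support, because it is built from a frequent itemset; step 2 only has to check confidence.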
Association Mining
Association rule mining: Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.
Applications: Basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
Examples:
Rule form: "Body → Head [support, confidence]".
buys(x, "diapers") → buys(x, "beers") [0.5%, 60%]
major(x, "CS") ^ takes(x, "DB") → grade(x, "A") [1%, 75%]
Association Rule: Basic Concepts
Given: (1) a database of transactions, (2) each transaction is a list of items (purchased by a customer in a visit).
Find: all rules that correlate the presence of one set of items with that of another set of items.
E.g., 98% of people who purchase tires and auto accessories also get automotive services done.
Applications
· * ⇒ Maintenance Agreement (What should the store do to boost Maintenance Agreement sales?)
· Home Electronics ⇒ * (What other products should the store stock up on?)
· Attached mailing in direct marketing
· Detecting "ping-pong"ing of patients, faulty "collisions"
Rule Measures: Support and Confidence
Find all the rules X ∧ Y ⇒ Z with minimum confidence and support.
· Support, s: probability that a transaction contains {X ∪ Y ∪ Z}
· Confidence, c: conditional probability that a transaction having {X ∪ Y} also contains Z
With minimum support 50% and minimum confidence 50%, we have:
A ⇒ C (50%, 66.6%)
C ⇒ A (50%, 100%)
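The transaction table behind these two figures is not reproduced here, so the sketch below uses an assumed four-transaction toy database chosen only because it is consistent with them. The helper names `support` and `confidence` are likewise hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(body, head, transactions):
    """Conditional probability that a transaction containing `body` also contains `head`."""
    return support(set(body) | set(head), transactions) / support(body, transactions)


# Assumed toy database (not taken from the source), consistent with the figures above.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]

for body, head in [({"A"}, {"C"}), ({"C"}, {"A"})]:
    s = support(body | head, transactions)
    c = confidence(body, head, transactions)
    print(f"{sorted(body)} => {sorted(head)}: support={s:.1%}, confidence={c:.1%}")
```

On this data, A and C occur together in 2 of 4 transactions (support 50%). A occurs in 3 transactions, so the confidence of A ⇒ C is 2/3, which the figures above truncate to 66.6%. C occurs in 2 transactions, both of which contain A, so the confidence of C ⇒ A is 100%.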
Association Rule Mining: A Road Map
· Boolean vs. quantitative associations (based on the types of values handled)
    buys(x, "SQLServer") ^ buys(x, "DMBook") → buys(x, "DBMiner") [0.2%, 60%]
    age(x, "30..39") ^ income(x, "42..48K") → buys(x, "PC") [1%, 75%]
· Single-dimensional vs. multidimensional associations (see the examples above)
· Single-level vs. multiple-level analysis
    What brands of beers are associated with what brands of diapers?
· Various extensions
    Correlation, causality analysis: association does not necessarily imply correlation or causality
    Max-patterns and closed itemsets
    Constraints enforced, e.g., do small sales (sum < 100) trigger big buys (sum > 1,000)?
Market Basket Analysis
A market basket is a collection of items purchased by a customer in a single transaction, which is a well-defined business activity. For example, a customer's visit to a grocery store or an online purchase from a virtual store on the Web is a typical customer transaction. Retailers accumulate huge collections of transactions by recording business activities over time. One common analysis run against a transaction database is to find sets of items, or itemsets, that appear together in many transactions. A business can use knowledge of these patterns to improve the placement of these items in the store or the layout of mail-order catalog pages and Web pages. An itemset containing i items is called an i-itemset. The percentage of transactions that contain an itemset is called the itemset's support. For an itemset to be interesting, its support must be higher than a user-specified minimum. Such itemsets are said to be frequent.
Figure: Market basket analysis.
Rule support and confidence are two measures of rule interestingness. They respectively reflect the usefulness and certainty of discovered rules. A support of 2% for the association rule computer ⇒ financial management software means that 2% of all the transactions under analysis show that a computer and financial management software are purchased together. A confidence of 60% means that 60% of the customers who purchased a computer also bought the software. Typically, association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.
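The two measures can be written as formulas. In the block below, N is the number of transactions and count(·) is the support count of an itemset. The concrete counts in the worked lines (15,000 transactions, 500 computer purchases, 300 joint purchases) are hypothetical values chosen only to reproduce the 2% and 60% quoted above.

```latex
\[
\operatorname{support}(A \Rightarrow B) = P(A \cup B) = \frac{\operatorname{count}(A \cup B)}{N},
\qquad
\operatorname{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\operatorname{count}(A \cup B)}{\operatorname{count}(A)}
\]

% Hypothetical counts: N = 15000, count(computer) = 500,
% count(computer and software together) = 300.
\[
\operatorname{support} = \frac{300}{15000} = 2\%,
\qquad
\operatorname{confidence} = \frac{300}{500} = 60\%
\]
```

Here P(A ∪ B) denotes the probability that a transaction contains every item of both A and B, not the probability that it contains either one.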