The method that mines the complete set of frequent itemsets with candidate generation. Apriori property & The Apriori Algorithm.
All nonempty subsets of a frequent item set most also be frequent.
o An item set I does not satisfy the minimum support threshold, min-sup, then I is not frequent, i.e., support(I) < min-sup
o If an item A is added to the item set I then the resulting item set (I U A) can not occur more frequently than I.
Monotonic functions are functions that move in only one direction.
This property is called anti-monotonic.
If a set can not pass a test, all its supersets will fail the same test as well.
This property is monotonic in failing the test.
The Apriori Algorithm
Join Step: Ck is generated by joining Lk-1with itself
Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset
The method that mines the complete set of frequent itemsets without generation.
Compress a large database into a compact, Frequent-Pattern tree (FP-tree) structure
highly condensed, but complete for frequent pattern mining
avoid costly database scans
Develop an efficient, FP-tree-based frequent pattern mining method
A divide-and-conquer methodology: decompose mining tasks into smaller ones
Avoid candidate generation: sub-database test only!
Example 5.5 FP-growth (finding frequent itemsets without candidate generation). We re-examine the mining of transaction database, D, of Table 5.1 in Example 5.3 using the frequent pattern growth approach.
Benefits of the FP-tree Structure
o never breaks a long pattern of any transaction
o preserves complete information for frequent pattern mining
o reduce irrelevant information—infrequent items are gone
o frequency descending ordering: more frequent items are more likely to be shared
o never be larger than the original database (if not count node-links and counts)
o Example: For Connect-4 DB, compression ratio could be over 100
Mining Frequent Patterns Using FP-tree
General idea (divide-and-conquer)
o Recursively grow frequent pattern path using the FP-tree
o For each item, construct its conditional pattern-base, and then its conditional FP-tree
o Repeat the process on each newly created conditional FP-tree
o Until the resulting FP-tree is empty, or it contains only one path (single path will generate all the combinations of its sub-paths, each of which is a frequent pattern)
Major Steps to Mine FP-tree
1. Construct conditional pattern base for each node in the FP-tree
2. Construct conditional FP-tree from each conditional pattern-base
3. Recursively mine conditional FP-trees and grow frequent patterns obtained so far
o If the conditional FP-tree contains a single path, simply enumerate all the patterns
Principles of Frequent Pattern Growth
Pattern growth property
Let a be a frequent itemset in DB, B be a's conditional pattern base, and b be an itemset in B. Then a È b is a frequent itemset in DB iff b is frequent in B.
―abcdef ‖ is a frequent pattern, if and only if
―abcde ‖ is a frequent pattern, and
―f ‖ is frequent in the set of transactions containing ―abcde ‖
Why Is Frequent Pattern Growth Fast?
Our performance study shows
FP-growth is an order of magnitude faster than Apriori, and is also faster than tree-projection
No candidate generation, no candidate test
Use compact data structure
Eliminate repeated database scan
Basic operation is counting and FP-tree building