Interestingness of Patterns
A data mining system has the potential to generate thousands or even millions of patterns, or rules. You may then ask, “Are all of the patterns interesting?” Typically not: only a small fraction of the patterns potentially generated would actually be of interest to any given user.
This raises some serious questions for data mining. You may wonder, “What makes a pattern interesting? Can a data mining system generate all of the interesting patterns? Can a data mining system generate only interesting patterns?”
To answer the first question, a pattern is interesting if it is (1) easily understood by humans, (2) valid on new or test data with some degree of certainty, (3) potentially useful, and (4) novel. A pattern is also interesting if it validates a hypothesis that the user sought to confirm. An interesting pattern represents knowledge.
Several objective measures of pattern interestingness exist. These are based on the structure of discovered patterns and the statistics underlying them. An objective measure for association rules of the form X ⇒ Y is rule support, representing the percentage of transactions from a transaction database that the given rule satisfies. This is taken to be the probability P(X ∪ Y), where X ∪ Y indicates that a transaction contains both X and Y, that is, the union of itemsets X and Y. Another objective measure for association rules is confidence, which assesses the degree of certainty of the detected association. This is taken to be the conditional probability P(Y | X), that is, the probability that a transaction containing X also contains Y. More formally, support and confidence are defined as
support(X ⇒ Y) = P(X ∪ Y)
confidence(X ⇒ Y) = P(Y | X)
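The two definitions above can be estimated directly from transaction counts. The following is a minimal Python sketch, not a reference implementation: the function names `support` and `confidence`, the toy transactions, and the item names are all assumptions made for illustration. Probabilities are approximated by relative frequencies in the database.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(transactions: List[Transaction], x: Iterable[str], y: Iterable[str]) -> float:
    """Estimate P(X ∪ Y): fraction of transactions containing every item of X and Y."""
    both = set(x) | set(y)  # union of the two itemsets
    hits = sum(1 for t in transactions if both <= t)  # subset test per transaction
    return hits / len(transactions)


def confidence(transactions: List[Transaction], x: Iterable[str], y: Iterable[str]) -> float:
    """Estimate P(Y | X): among transactions containing X, the fraction that also contain Y."""
    x_items = set(x)
    both = x_items | set(y)
    n_x = sum(1 for t in transactions if x_items <= t)
    if n_x == 0:
        return 0.0  # assumption: treat the undefined case (X never occurs) as 0
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_x


# Hypothetical toy database of four transactions.
transactions = [
    frozenset({"computer", "software"}),
    frozenset({"computer", "printer"}),
    frozenset({"computer", "software", "printer"}),
    frozenset({"printer"}),
]

s = support(transactions, {"computer"}, {"software"})     # 2 of 4 transactions
c = confidence(transactions, {"computer"}, {"software"})  # 2 of the 3 containing "computer"
print(f"support={s:.2f} confidence={c:.2f}")
```

For the rule computer ⇒ software in this toy data, two of the four transactions contain both items, and two of the three transactions containing "computer" also contain "software".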
In general, each interestingness measure is associated with a threshold, which may be controlled by the user. For example, rules that do not satisfy a confidence threshold of, say, 50% can be considered uninteresting. Rules below the threshold likely reflect noise, exceptions, or minority cases and are probably of less value.
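Applying such thresholds amounts to a simple filter over the mined rules. The sketch below assumes each rule already carries its measured support and confidence. The `Rule` type, the `filter_rules` function, and the numeric values are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    antecedent: frozenset
    consequent: frozenset
    support: float
    confidence: float


def filter_rules(rules: List[Rule], min_support: float = 0.0,
                 min_confidence: float = 0.5) -> List[Rule]:
    """Keep only rules that meet both user-controlled thresholds."""
    return [
        r for r in rules
        if r.support >= min_support and r.confidence >= min_confidence
    ]


# Illustrative (made-up) rules with precomputed measures.
rules = [
    Rule(frozenset({"computer"}), frozenset({"software"}), 0.50, 0.67),
    Rule(frozenset({"printer"}), frozenset({"software"}), 0.25, 0.33),
    Rule(frozenset({"software"}), frozenset({"computer"}), 0.50, 1.00),
]

# With a 50% confidence threshold, the printer => software rule is dropped.
kept = filter_rules(rules, min_confidence=0.5)
for r in kept:
    print(sorted(r.antecedent), "=>", sorted(r.consequent), r.confidence)
```

Keeping the thresholds as parameters reflects the point that they are under the user's control rather than fixed by the system.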
The second question, “Can a data mining system generate all of the interesting patterns?”, refers to the completeness of a data mining algorithm. It is often unrealistic and inefficient for data mining systems to generate all of the possible patterns. Instead, user-provided constraints and interestingness measures should be used to focus the search.
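To give a rough sense of why exhaustive generation is unrealistic, the sketch below enumerates every candidate association rule over a small item set and then applies one user constraint. It assumes a rule is a pair of nonempty, disjoint itemsets. The function `enumerate_rules`, the item names, and the constraint itself are hypothetical. The enumeration count is checked against the standard closed form 3^d − 2^(d+1) + 1 for d items.

```python
from itertools import combinations
from typing import Iterator, List, Tuple

RulePair = Tuple[Tuple[str, ...], Tuple[str, ...]]


def enumerate_rules(items: List[str]) -> Iterator[RulePair]:
    """Yield every rule X => Y with X and Y nonempty and disjoint."""
    items = sorted(items)
    for k in range(2, len(items) + 1):
        for itemset in combinations(items, k):
            # Split the itemset into an antecedent and the remaining consequent.
            for j in range(1, k):
                for antecedent in combinations(itemset, j):
                    consequent = tuple(i for i in itemset if i not in antecedent)
                    yield antecedent, consequent


items = ["a", "b", "c", "d", "e", "f"]
all_rules = list(enumerate_rules(items))

# Hypothetical user constraint: only rules that predict item "f" alone.
focused = [rule for rule in all_rules if rule[1] == ("f",)]

d = len(items)
print(len(all_rules), 3**d - 2**(d + 1) + 1, len(focused))
```

Even with only six items there are several hundred candidate rules, and the count grows exponentially with the number of items, whereas a single constraint on the consequent cuts the candidates to a few dozen.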
Finally, the third question, “Can a data mining system generate only interesting patterns?”, is an optimization problem in data mining. It is highly desirable for data mining systems to generate only interesting patterns. This would be much more efficient for users and data mining systems, because neither would have to search through the patterns generated in order to identify the truly interesting ones. Progress has been made in this direction; however, such optimization remains a challenging issue in data mining.
Copyright © 2018-2024 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.