__Interestingness of Patterns__

A data mining system has the potential to generate thousands or even millions of patterns, or rules. You may then ask, *“Are all of the patterns interesting?”* Typically not: only a small fraction of the patterns potentially generated would actually be of interest to any given user.

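This raises some serious questions for data mining. You may wonder, *“What makes a pattern interesting? Can a data mining system generate all of the interesting patterns? Can a data mining system generate only interesting patterns?”*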

**To answer the first question**, a pattern is interesting if it is

(1) *easily understood* by humans,

(2) *valid* on new or test data with some degree of *certainty*,

(3) potentially *useful*, and

(4) *novel*.

A pattern is also interesting if it validates a hypothesis that the user *sought to confirm*. An interesting pattern represents **knowledge**.

Several objective measures of pattern interestingness exist. These are based on the structure of discovered patterns and the statistics underlying them. An objective measure for association rules of the form *X* ⇒ *Y* is rule **support**, representing the percentage of transactions from a transaction database that the given rule satisfies.

This is taken to be the probability *P*(*X* ∪ *Y*), where *X* ∪ *Y* indicates that a transaction contains both *X* and *Y*, that is, the union of itemsets *X* and *Y*. Another objective measure for association rules is **confidence**, which assesses the degree of certainty of the detected association. This is taken to be the conditional probability *P*(*Y* | *X*), that is, the probability that a transaction containing *X* also contains *Y*. More formally, support and confidence are defined as

$$\mathit{support}(X \Rightarrow Y) = P(X \cup Y)$$

$$\mathit{confidence}(X \Rightarrow Y) = P(Y \mid X)$$
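The two definitions above can be estimated directly by counting transactions. The following is a minimal sketch, assuming a tiny in-memory transaction database; the item names and the helper names `support` and `confidence` are invented for illustration and do not come from any particular library.

```python
# Hypothetical toy transaction database, invented for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def support(x, y, db):
    """Estimate P(X u Y): fraction of transactions containing all items of X and Y."""
    both = x | y
    return sum(1 for t in db if both <= t) / len(db)


def confidence(x, y, db):
    """Estimate P(Y | X): among transactions containing X, the fraction also containing Y."""
    with_x = [t for t in db if x <= t]
    if not with_x:
        raise ValueError("X does not occur in any transaction")
    return sum(1 for t in with_x if y <= t) / len(with_x)


x = frozenset({"bread"})
y = frozenset({"milk"})
rule_support = support(x, y, transactions)        # 2 of 4 transactions
rule_confidence = confidence(x, y, transactions)  # 2 of the 3 containing bread
print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

In this toy data the rule bread ⇒ milk holds in 2 of 4 transactions (support 0.5), and 2 of the 3 transactions containing bread also contain milk (confidence of about 0.67).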

In general, each interestingness measure is associated with a threshold, which may be controlled by the user. For example, rules that do not satisfy a confidence threshold of, say, 50% can be considered uninteresting. Rules below the threshold likely reflect noise, exceptions, or minority cases and are probably of less value.
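As a rough illustration of user-controlled thresholds, the sketch below keeps only rules whose measures meet the chosen minimums. The `Rule` class, the `filter_rules` helper, and all numeric values are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical container for a rule X => Y and its measures."""
    antecedent: str
    consequent: str
    support: float
    confidence: float


# Invented rules and values, for illustration only.
rules = [
    Rule("bread", "milk", support=0.50, confidence=0.67),
    Rule("milk", "butter", support=0.25, confidence=0.33),
    Rule("butter", "bread", support=0.50, confidence=1.00),
]


def filter_rules(candidates, min_support=0.0, min_confidence=0.5):
    """Keep rules that meet both user-supplied thresholds."""
    return [
        r for r in candidates
        if r.support >= min_support and r.confidence >= min_confidence
    ]


kept = filter_rules(rules, min_confidence=0.5)
for r in kept:
    print(f"{r.antecedent} => {r.consequent} (confidence {r.confidence:.2f})")
```

With a 50% confidence threshold, the rule milk ⇒ butter (confidence 0.33) is discarded as uninteresting, while the other two are kept.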

The second question, *“Can a data mining system generate all of the interesting patterns?”*, refers to the completeness of a data mining algorithm. It is often unrealistic and inefficient for data mining systems to generate all of the possible patterns. Instead, user-provided constraints and interestingness measures should be used to focus the search.
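One way to picture a focused search is a level-wise enumeration of itemsets that stops extending any itemset falling below a minimum support and that considers only items the user cares about. The sketch below is a simplified Apriori-style illustration and is not a method given in the text; the function name `frequent_itemsets`, its parameters, and the toy data are all assumptions.

```python
from itertools import combinations

# Hypothetical toy transaction database, invented for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def frequent_itemsets(db, min_support, allowed_items=None, max_size=2):
    """Level-wise search pruned by a support threshold and user constraints.

    allowed_items and max_size stand in for user-provided constraints.
    """
    n = len(db)
    items = sorted({item for t in db for item in t})
    if allowed_items is not None:
        items = [item for item in items if item in allowed_items]

    result = {}
    current = [frozenset({item}) for item in items]
    size = 1
    while current and size <= max_size:
        survivors = []
        for candidate in current:
            s = sum(1 for t in db if candidate <= t) / n
            if s >= min_support:
                result[candidate] = s
                survivors.append(candidate)
        # Only itemsets that met the threshold are extended to the next level.
        size += 1
        current = sorted(
            {a | b for a, b in combinations(survivors, 2) if len(a | b) == size},
            key=sorted,
        )
    return result


found = frequent_itemsets(transactions, min_support=0.5)
focused = frequent_itemsets(
    transactions, min_support=0.5, allowed_items={"bread", "milk"}
)
print(len(found), len(focused))
```

Restricting `allowed_items` shrinks the candidate space before any counting happens, which is the sense in which constraints focus the search rather than filter results afterwards.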

Finally, the third question, *“Can a data mining system generate only interesting patterns?”*, is an optimization problem in data mining. It is highly desirable for data mining systems to generate only interesting patterns. This would be much more efficient for users and data mining systems, because neither would have to search through the patterns generated in order to identify the truly interesting ones. Progress has been made in this direction; however, such optimization remains a challenging issue in data mining.

Copyright © 2018-2024 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.