DATA MINING
Define data mining
Data mining is the process of extracting, or "mining", knowledge from large amounts of data.
Define pattern evaluation
Pattern evaluation identifies the truly interesting patterns representing knowledge, based on interestingness measures.
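As one concrete illustration, support and confidence are two widely used interestingness measures for association rules. The minimal sketch below computes them over a small, hypothetical set of market-basket transactions and keeps a rule only when both values reach a threshold; the threshold values and the helper names (`support`, `confidence`, `is_interesting`) are illustrative assumptions.

```python
# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def is_interesting(antecedent, consequent, transactions, min_sup=0.5, min_conf=0.6):
    """Keep a rule only if it meets both thresholds (threshold values are assumptions)."""
    return (support(antecedent | consequent, transactions) >= min_sup
            and confidence(antecedent, consequent, transactions) >= min_conf)

print(is_interesting({"milk"}, {"bread"}, transactions))  # True
print(is_interesting({"milk"}, {"eggs"}, transactions))   # False
```

Here milk => bread has support 0.5 and confidence about 0.67, so it passes; milk => eggs has support 0.25 and is filtered out as uninteresting.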
Define knowledge representation
Knowledge representation techniques are used to present the mined knowledge to the user.
List the five primitives for specification of a data mining task.
Task-relevant data
Kind of knowledge to be mined
Background knowledge
Interestingness measures
Knowledge presentation and visualization techniques to be used for displaying the discovered patterns
What is Visualization?
Visualization is the depiction of data in order to gain intuition about the data being observed. It assists analysts in selecting display formats, viewer perspectives, and data representation schemas.
Mention some of the application areas of data mining.
DNA analysis
Market analysis
Financial data analysis
Banking industry
Retail industry
Health care analysis
Telecommunication industry
Define data cleaning
Data cleaning means removing noise and inconsistent data and collecting the necessary information.
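A minimal sketch of two common cleaning steps, assuming records held as Python dictionaries: inconsistent codes are resolved through a mapping, and a missing value is filled with the attribute mean. The field names, the `GENDER_CODES` mapping, and the `clean` helper are hypothetical, and mean imputation is only one of several ways to handle missing values.

```python
from statistics import mean

# Hypothetical raw records: one missing income, inconsistent gender codes.
raw = [
    {"id": 1, "gender": "M", "income": 52000},
    {"id": 2, "gender": "male", "income": None},
    {"id": 3, "gender": "F", "income": 61000},
    {"id": 4, "gender": "female", "income": 47000},
]

# Assumed mapping that resolves inconsistent codes to one canonical form.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

def clean(records):
    """Return new records with canonical codes and missing incomes filled in."""
    known = [r["income"] for r in records if r["income"] is not None]
    fill = mean(known)  # fill missing values with the mean of the known values
    cleaned = []
    for r in records:
        cleaned.append({
            "id": r["id"],
            "gender": GENDER_CODES[r["gender"].lower()],
            "income": r["income"] if r["income"] is not None else fill,
        })
    return cleaned

for row in clean(raw):
    print(row)
```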
Define Data integration.
Data integration combines data from multiple databases, data cubes, or files.
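The sketch below illustrates integration under simple assumptions: two hypothetical sources describe the same customers but name the key differently (`cust_id` versus `customer_number`), and the records are joined on that key. The source names, fields, and the inner-join policy are illustrative choices, not a prescribed method.

```python
# Two hypothetical sources describing the same customers under different key names.
crm = [
    {"cust_id": 101, "name": "Asha"},
    {"cust_id": 102, "name": "Ravi"},
]
billing = [
    {"customer_number": 101, "balance": 250.0},
    {"customer_number": 102, "balance": 0.0},
    {"customer_number": 103, "balance": 75.5},
]

def integrate(crm_rows, billing_rows):
    """Join the two sources on the shared customer key (inner join)."""
    by_key = {row["customer_number"]: row for row in billing_rows}
    merged = []
    for row in crm_rows:
        match = by_key.get(row["cust_id"])
        if match is not None:  # keep only customers present in both sources
            merged.append({"cust_id": row["cust_id"],
                           "name": row["name"],
                           "balance": match["balance"]})
    return merged

print(integrate(crm, billing))
```

Recognizing that `cust_id` and `customer_number` refer to the same entity is the part that usually needs metadata or domain knowledge; here it is simply assumed.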
What does data transformation involve?
Smoothing: remove noise from data
Aggregation: summarization, data cube construction
Generalization: concept hierarchy climbing
Normalization: scaled to fall within a small, specified range
  Min-max normalization
  Z-score normalization
  Normalization by decimal scaling
Attribute/feature construction: new attributes constructed from the given ones
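The three normalization methods can be sketched in a few lines. This is a minimal version assuming plain lists of numbers: `min_max` assumes at least two distinct values, and `z_score` uses the population standard deviation, which is one common choice (some texts use the sample standard deviation).

```python
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map values onto [new_min, new_max]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def decimal_scaling(values):
    """Divide by 10**j, where j is the smallest integer such that max(|v'|) < 1."""
    j = 0
    largest = max(abs(v) for v in values)
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]

print(min_max([200, 300, 400, 600, 1000]))  # [0.0, 0.125, 0.25, 0.5, 1.0]
print(decimal_scaling([-986, 917]))         # [-0.986, 0.917]
```

For decimal scaling the largest absolute value is 986, so j = 3 and every value is divided by 1000.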
Define Data reduction.
Data reduction obtains a representation of the data that is much smaller in volume yet produces the same (or almost the same) analytical results.
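Aggregation is one simple form of data reduction. In the sketch below, eight hypothetical quarterly sales rows are reduced to two annual totals, which still answer any question about yearly sales; the figures and the `annual_totals` helper are illustrative only.

```python
from collections import defaultdict

# Hypothetical quarterly sales figures: (year, quarter, amount).
quarterly = [
    (2021, "Q1", 224), (2021, "Q2", 408), (2021, "Q3", 350), (2021, "Q4", 586),
    (2022, "Q1", 310), (2022, "Q2", 402), (2022, "Q3", 376), (2022, "Q4", 512),
]

def annual_totals(rows):
    """Aggregate quarterly rows into one total per year."""
    totals = defaultdict(int)
    for year, _quarter, amount in rows:
        totals[year] += amount
    return dict(totals)

print(annual_totals(quarterly))  # {2021: 1568, 2022: 1600}
```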
What is meant by data discretization?
Data discretization is part of data reduction but is of particular importance, especially for numerical data.
What are the discretization processes involved in data preprocessing?
Discretization reduces the number of values for a given continuous attribute by dividing the range of the attribute into intervals. Interval labels can then be used to replace actual data values.
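Equal-width binning is one simple way to carry out this process: the attribute's range is split into k intervals of equal width, and each value is replaced by the label of its interval. A minimal sketch follows; the function name, the label format, and the choice to put the maximum value in the last interval are assumptions for illustration.

```python
def equal_width_bins(values, k):
    """Replace each value with the label of one of k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    labels = []
    for v in values:
        # Interval index for v; the maximum value falls into the last interval.
        i = min(int((v - lo) / width), k - 1)
        start = lo + i * width
        labels.append(f"{start:g}-{start + width:g}")
    return labels

ages = [20, 25, 30, 35, 40, 45, 50]
print(equal_width_bins(ages, 3))
# ['20-30', '20-30', '30-40', '30-40', '40-50', '40-50', '40-50']
```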
Define concept hierarchy.
A concept hierarchy reduces the data by collecting low-level concepts (such as numeric values for the attribute age) and replacing them with higher-level concepts (such as young, middle-aged, or senior).
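The age example can be sketched as a one-level concept hierarchy. The cut-off ages below (35 and 60) are illustrative assumptions; in practice they come from domain experts or from the data.

```python
def age_to_concept(age):
    """Climb one level of an assumed concept hierarchy for the attribute age."""
    # Cut-off ages are illustrative assumptions, not fixed definitions.
    if age < 35:
        return "young"
    elif age < 60:
        return "middle-aged"
    return "senior"

ages = [23, 35, 47, 59, 60, 72]
print([age_to_concept(a) for a in ages])
# ['young', 'middle-aged', 'middle-aged', 'middle-aged', 'senior', 'senior']
```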
Why do we need data preprocessing?
Data in the real world is dirty:
incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
noisy: containing errors or outliers
inconsistent: containing discrepancies in codes or names
Give some data mining tools.
DBMiner
GeoMiner
Multimedia miner
WeblogMiner
Describe the use of DBMiner.
DBMiner is used to perform data mining functions, including characterization, association, classification, prediction, and clustering.
What are the applications of DBMiner?
The DBMiner system can be used as a general-purpose online analytical mining system for both OLAP and data mining in relational databases and data warehouses. It is used with medium to large relational databases and offers fast response times.
What are the types of knowledge to be mined?
Characterization
Discrimination
Association
Classification
Prediction
Clustering
Outlier analysis
Other data mining tasks
Define Relational databases.
A relational database is a collection of tables, each of which is assigned a unique name. Each table consists of a set of attributes (columns or fields) and usually stores a large set of tuples (records or rows). Each tuple in a relational table represents an object identified by a unique key and described by a set of attribute values.
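Python's built-in `sqlite3` module is enough to illustrate these terms. In the sketch below the table `customer` has four attributes, each inserted row is a tuple, and `cust_id` is the unique key used to retrieve one object; the table, columns, and data are hypothetical.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " cust_id INTEGER PRIMARY KEY,"  # unique key identifying each tuple
    " name TEXT, age INTEGER, income REAL)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(101, "Asha", 29, 52000.0), (102, "Ravi", 41, 61000.0), (103, "Meena", 35, 47000.0)],
)
# Retrieve one tuple through its unique key.
row = conn.execute("SELECT name, age FROM customer WHERE cust_id = ?", (102,)).fetchone()
print(row)  # ('Ravi', 41)
```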
Define Transactional Databases.
A transactional database consists of a file where each record represents a transaction. A transaction typically includes a unique transaction identity number (trans_ID) and a list of the items making up the transaction.
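A transactional database can be modeled minimally as a list of (trans_ID, items) pairs. The sketch below, using hypothetical transactions, counts in how many transactions each item appears and lists the transactions containing a given item, two basic queries behind market-basket analysis. The helper names are illustrative.

```python
from collections import Counter

# Hypothetical transactions: (trans_ID, list of items purchased).
transactions = [
    ("T100", ["bread", "milk"]),
    ("T200", ["bread", "butter", "milk"]),
    ("T300", ["butter", "jam"]),
]

def item_counts(db):
    """Number of transactions in which each item appears."""
    counts = Counter()
    for _trans_id, items in db:
        counts.update(set(items))  # count an item at most once per transaction
    return counts

def transactions_containing(db, item):
    """IDs of the transactions that include the given item."""
    return [trans_id for trans_id, items in db if item in items]

print(item_counts(transactions)["bread"])          # 2
print(transactions_containing(transactions, "milk"))  # ['T100', 'T200']
```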
Define spatial databases.
Spatial databases contain spatial-related information. Such databases include geographic (map) databases, VLSI chip design databases, and medical and satellite image databases. Spatial data may be represented in raster format, consisting of n-dimensional bit maps or pixel maps.
What is a temporal database?
A temporal database stores time-related data. It usually stores relational data that include time-related attributes. These attributes may involve several timestamps, each having different semantics.
What are time-series databases?
A time-series database stores sequences of values that change with time, such as data collected from the stock exchange.
What is a legacy database?
A legacy database is a group of heterogeneous databases that combines different kinds of data systems, such as relational or object-oriented databases, hierarchical databases, network databases, spreadsheets, multimedia databases, or file systems.
What are the steps in the data mining process?
Data cleaning
Data integration
Data selection
Data transformation
Data mining
Pattern evaluation
Knowledge representation
What is characterization?
Characterization is a summarization of the general characteristics or features of a target class of data.
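A minimal sketch of characterization, assuming hypothetical student records in which graduate students form the target class: the class is summarized by its size, average GPA, and most common major. Which features to summarize is a choice made for illustration only.

```python
from collections import Counter
from statistics import mean

# Hypothetical student records; "graduate" is the target class.
students = [
    {"status": "graduate", "major": "CS", "gpa": 3.6},
    {"status": "graduate", "major": "CS", "gpa": 3.8},
    {"status": "graduate", "major": "Physics", "gpa": 3.4},
    {"status": "undergraduate", "major": "History", "gpa": 3.0},
]

def characterize(rows, target):
    """Summarize the general features of the rows belonging to the target class."""
    target_rows = [r for r in rows if r["status"] == target]
    return {
        "count": len(target_rows),
        "avg_gpa": mean(r["gpa"] for r in target_rows),
        "top_major": Counter(r["major"] for r in target_rows).most_common(1)[0][0],
    }

print(characterize(students, "graduate"))
```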
What is discrimination?
Discrimination is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes.
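Discrimination can be sketched by computing the same feature summary for the target class and a contrasting class and placing the results side by side. The customer groups, features, and helper names below are hypothetical.

```python
from statistics import mean

# Hypothetical customers; "frequent" is the target class, "rare" the contrasting class.
customers = [
    {"group": "frequent", "age": 28, "online": True},
    {"group": "frequent", "age": 34, "online": True},
    {"group": "frequent", "age": 41, "online": False},
    {"group": "rare", "age": 52, "online": False},
    {"group": "rare", "age": 47, "online": False},
    {"group": "rare", "age": 30, "online": True},
]

def profile(rows):
    """General features of a set of objects."""
    return {
        "avg_age": mean(r["age"] for r in rows),
        "online_share": sum(r["online"] for r in rows) / len(rows),
    }

def discriminate(rows, target, contrast):
    """Map each feature to a (target value, contrasting value) pair."""
    t = profile([r for r in rows if r["group"] == target])
    c = profile([r for r in rows if r["group"] == contrast])
    return {feature: (t[feature], c[feature]) for feature in t}

print(discriminate(customers, "frequent", "rare"))
```

In this toy data, frequent buyers are younger on average and more likely to shop online than rare buyers.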
What is classification?
Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data.
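To make "derive a model from training data, then predict unknown labels" concrete, the sketch below uses a nearest-centroid classifier, chosen only because it is short; it is not the specific method the text refers to. The model is one centroid per class computed from hypothetical labelled training tuples, and a new object receives the label of the closest centroid. Features, labels, and function names are assumptions.

```python
import math
from collections import defaultdict

# Hypothetical training data: (age, income in thousands) -> class label.
training = [
    ((25, 30), "no"),
    ((30, 35), "no"),
    ((45, 90), "yes"),
    ((50, 110), "yes"),
]

def fit_centroids(data):
    """Derive the model: the mean feature vector of each class."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (x1, x2), label in data:
        sums[label][0] += x1
        sums[label][1] += x2
        counts[label] += 1
    return {label: (s[0] / counts[label], s[1] / counts[label]) for label, s in sums.items()}

def predict(model, point):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(point, model[label]))

model = fit_centroids(training)
print(predict(model, (28, 40)))  # no
print(predict(model, (48, 95)))  # yes
```

`math.dist` requires Python 3.8 or later.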
How are data mining systems classified?
Classification according to the kinds of databases mined
Classification according to the kinds of knowledge mined
Classification according to the kinds of techniques utilized
Classification according to the applications adapted
What are the schemes for integrating a data mining system with a data warehouse?
No coupling
Loose coupling
Semi-tight coupling
Tight coupling
What are the issues in data mining?
Mining methodology and user interaction issues:
  Mining different kinds of knowledge in databases
  Interactive mining of knowledge at multiple levels of abstraction
  Incorporation of background knowledge
  Data mining query languages and ad hoc data mining
  Presentation and visualization of data mining results
  Handling noisy or incomplete data
  Pattern evaluation
Performance issues:
  Efficiency and scalability of data mining algorithms
  Parallel, distributed, and incremental mining algorithms
Issues relating to the diversity of database types:
  Handling of relational and complex types of data
  Mining information from heterogeneous databases and global information systems
What is data preprocessing?
Real-world data are usually noisy, so the data must be preprocessed before the data warehouse is organized.
What are the data preprocessing techniques?
Data cleaning
Data integration
Data transformation
Data reduction
Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.