Chapter: Data Warehousing and Data Mining

Important Short Questions and Answers : Data Mining


DATA MINING

Define data mining

Data mining is the process of extracting, or "mining", knowledge from huge amounts of data.

Define pattern evaluation

Pattern evaluation identifies the truly interesting patterns representing knowledge, based on interestingness measures.
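As a minimal sketch, assume the interestingness measures are support and confidence for association rules, with hypothetical thresholds `min_support=0.5` and `min_confidence=0.6`. Pattern evaluation then amounts to filtering candidate rules and keeping only those that pass both thresholds. The transactions, rules, and function names below are illustrative only.

```python
# Hypothetical market-basket transactions used only for illustration.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"butter", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of the whole rule divided by support of its antecedent (assumed > 0)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def interesting_rules(rules, transactions, min_support=0.5, min_confidence=0.6):
    """Keep only the rules that meet both (assumed) thresholds."""
    kept = []
    for antecedent, consequent in rules:
        s = support(antecedent | consequent, transactions)
        c = confidence(antecedent, consequent, transactions)
        if s >= min_support and c >= min_confidence:
            kept.append((antecedent, consequent, s, c))
    return kept

candidates = [({"milk"}, {"bread"}), ({"butter"}, {"milk"}), ({"bread"}, {"butter"})]
for a, b, s, c in interesting_rules(candidates, transactions):
    # prints only the milk -> bread rule (support 0.75, confidence 1.00)
    print(sorted(a), "->", sorted(b), f"support={s:.2f}", f"confidence={c:.2f}")
```

The other two candidates are discarded because their support (0.25) falls below the assumed threshold.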

Define knowledge representation

Knowledge representation techniques are used to present the mined knowledge to the user.

List the five primitives for specification of a data mining task.

task-relevant data

kind of knowledge to be mined

background knowledge

interestingness measures

knowledge presentation and visualization techniques to be used for displaying the discovered patterns

What is Visualization?

Visualization is the depiction of data to gain intuition about the data being observed. It assists analysts in selecting display formats, viewer perspectives, and data representation schemas.

Mention some of the application areas of data mining

DNA analysis

Market analysis

Financial data analysis

Banking industry

Retail industry

Health care analysis

Telecommunication industry

Define data cleaning

Data cleaning removes noise and inconsistent data. Typical cleaning routines fill in missing values, smooth out noise while identifying outliers, and correct inconsistencies in the data.
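A minimal sketch of two cleaning routines, assuming hypothetical customer records: missing `income` values are filled with the attribute mean (one common strategy among several), and inconsistent gender codes are reconciled through an assumed mapping table `GENDER_CODES`.

```python
from statistics import mean

# Hypothetical customer records; None marks a missing income value.
records = [
    {"id": 1, "income": 42000, "gender": "F"},
    {"id": 2, "income": None, "gender": "M"},
    {"id": 3, "income": 58000, "gender": "female"},  # inconsistent code
    {"id": 4, "income": 50000, "gender": "M"},
]

# Assumed mapping that reconciles inconsistent category codes.
GENDER_CODES = {"F": "F", "female": "F", "M": "M", "male": "M"}

def clean(records):
    """Fill missing incomes with the attribute mean and standardize gender codes."""
    known = [r["income"] for r in records if r["income"] is not None]
    fill_value = mean(known)
    cleaned = []
    for r in records:
        r = dict(r)  # copy so the input records are not modified
        if r["income"] is None:
            r["income"] = fill_value
        r["gender"] = GENDER_CODES[r["gender"]]
        cleaned.append(r)
    return cleaned

cleaned = clean(records)
print(cleaned[1]["income"], [r["gender"] for r in cleaned])
```

Filling with the mean keeps the record usable; other options include ignoring the tuple or using a class-wise mean.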

Define Data integration.

Data integration combines data from multiple sources, such as databases, data cubes, or files, into a coherent data store.
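One concern during integration is recognizing that differently named attributes identify the same entity. The sketch below assumes two hypothetical sources where the customer key is called `customer_id` in one and `cust_number` in the other, and merges them into one record per customer.

```python
# Two hypothetical sources describing customers under different key names.
sales_db = [
    {"customer_id": 101, "total_spent": 250.0},
    {"customer_id": 102, "total_spent": 90.5},
]
crm_file = [
    {"cust_number": 101, "region": "North"},
    {"cust_number": 103, "region": "South"},
]

def integrate(sales, crm):
    """Combine both sources into one record per customer, keyed on a unified id."""
    merged = {}
    for row in sales:
        merged.setdefault(row["customer_id"], {"id": row["customer_id"]})["total_spent"] = row["total_spent"]
    for row in crm:
        # Assumption: cust_number identifies the same entity as customer_id.
        merged.setdefault(row["cust_number"], {"id": row["cust_number"]})["region"] = row["region"]
    return [merged[key] for key in sorted(merged)]

for record in integrate(sales_db, crm_file):
    print(record)
```

Customers present in only one source keep just the attributes that source provides.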

Why do we need data transformation?

Data transformation converts data into forms appropriate for mining. It may involve:

Smoothing: remove noise from data

Aggregation: summarization, data cube construction

Generalization: concept hierarchy climbing

Normalization: scaled to fall within a small, specified range, using min-max normalization, z-score normalization, or normalization by decimal scaling

Attribute/feature construction: new attributes constructed from the given ones
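The three normalization methods can be sketched as below. This is a minimal illustration on hypothetical income values; the z-score version assumes the population standard deviation, and min-max assumes the values are not all equal.

```python
import math

def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: map [min, max] linearly onto [new_min, new_max].

    Assumes the values are not all equal (otherwise the range is zero).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Z-score normalization using the mean and the population standard deviation."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [(v - mu) / sigma for v in values]

def decimal_scaling(values):
    """Divide by 10**j for the smallest j such that max(|v|) / 10**j < 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]

incomes = [12000, 54000, 73600, 98000]  # hypothetical sample values
print(min_max(incomes))          # 73600 maps to about 0.716
print(z_score(incomes))
print(decimal_scaling(incomes))  # every value divided by 10**5
```

Min-max preserves the relationships among the original values; z-score is useful when the actual minimum and maximum are unknown or outliers dominate.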

Define Data reduction.

Data reduction obtains a reduced representation of the data set that is much smaller in volume, yet produces the same (or almost the same) analytical results.
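Aggregation is one data reduction strategy (others include dimensionality reduction and sampling). As a sketch on hypothetical quarterly sales figures, rolling the data up to annual totals reduces eight rows to two while still answering questions about yearly sales.

```python
from collections import defaultdict

# Hypothetical quarterly sales: (year, quarter, amount).
quarterly_sales = [
    (2022, 1, 224), (2022, 2, 408), (2022, 3, 350), (2022, 4, 586),
    (2023, 1, 300), (2023, 2, 410), (2023, 3, 390), (2023, 4, 600),
]

def annual_totals(rows):
    """Aggregate quarterly amounts into one total per year."""
    totals = defaultdict(int)
    for year, _quarter, amount in rows:
        totals[year] += amount
    return dict(totals)

print(annual_totals(quarterly_sales))  # -> {2022: 1568, 2023: 1700}
```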

What is meant by data discretization?

Data discretization is part of data reduction, but it is of particular importance, especially for numerical data.

What is the discretization process involved in data preprocessing?

It reduces the number of values for a given continuous attribute by dividing the range of the attribute into intervals. Interval labels can then be used to replace actual data values.
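A minimal sketch, assuming equal-width intervals (one common choice; equal-frequency binning and other methods also exist): the range of a hypothetical `age` attribute is split into `k` intervals, and each value is replaced by its interval label.

```python
def equal_width_discretize(values, k):
    """Split [min, max] into k equal-width intervals and replace values by interval labels."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    labels = []
    for v in values:
        # The maximum value lies on the upper edge, so clamp it into the last interval.
        idx = min(int((v - lo) / width), k - 1)
        left = lo + idx * width
        labels.append(f"{left:g}-{left + width:g}")
    return labels

ages = [13, 22, 25, 33, 35, 45, 52, 70]  # hypothetical values
print(equal_width_discretize(ages, 3))
# -> ['13-32', '13-32', '13-32', '32-51', '32-51', '32-51', '51-70', '51-70']
```

Eight distinct values are reduced to three interval labels.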

Define Concept hierarchy.

A concept hierarchy reduces the data by collecting and replacing low-level concepts (such as numeric values for the attribute age) with higher-level concepts (such as young, middle-aged, or senior).
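Climbing one level of such a hierarchy for `age` can be sketched as a mapping function. The cut-off points 30 and 60 are hypothetical; real hierarchies are usually defined by domain experts or derived from the data.

```python
from collections import Counter

def generalize_age(age):
    """Replace a numeric age with a higher-level concept (hypothetical cut-offs)."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle-aged"
    return "senior"

ages = [23, 35, 41, 67, 19, 58]  # hypothetical values
groups = [generalize_age(a) for a in ages]
print(groups)  # -> ['young', 'middle-aged', 'middle-aged', 'senior', 'young', 'middle-aged']
print(Counter(groups))
```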

Why do we need data preprocessing?

Data in the real world is dirty:

incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data

noisy: containing errors or outliers

inconsistent: containing discrepancies in codes or names

Give some data mining tools.

DBMiner

GeoMiner

MultiMediaMiner

WeblogMiner

Describe the use of DBMiner.

DBMiner is used to perform data mining functions, including characterization, association, classification, prediction, and clustering.

Applications of DBMiner.

The DBMiner system can be used as a general-purpose online analytical mining system for both OLAP and data mining in relational databases and data warehouses. It is used in medium to large relational databases with fast response time.

What are the types of knowledge to be mined?

Characterization

Discrimination

Association

Classification

Prediction

Clustering

Outlier analysis

Other data mining tasks

Define Relational databases.

A relational database is a collection of tables, each of which is assigned a unique name. Each table consists of a set of attributes (columns or fields) and usually stores a large set of tuples (records or rows). Each tuple in a relational table represents an object identified by a unique key and described by a set of attribute values.
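The definition maps directly onto SQL. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `customer` table, its attributes, and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
# A uniquely named table with attributes (columns); cust_id is the unique key.
conn.execute(
    "CREATE TABLE customer ("
    "cust_id INTEGER PRIMARY KEY, name TEXT, age INTEGER, income REAL)"
)
# Each inserted tuple (row) is described by a set of attribute values.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(1, "Alice", 34, 52000.0), (2, "Bob", 27, 41000.0), (3, "Chitra", 45, 68000.0)],
)
conn.commit()

rows = conn.execute(
    "SELECT cust_id, name FROM customer WHERE age > 30 ORDER BY cust_id"
).fetchall()
print(rows)  # -> [(1, 'Alice'), (3, 'Chitra')]
```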

Define Transactional Databases.

A transactional database consists of a file where each record represents a transaction. A transaction typically includes a unique transaction identity number (trans_ID), and a list of the items making up the transaction.
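A transactional database can be sketched as a list of `(trans_ID, items)` records. The transactions below are hypothetical; the helper functions show two basic queries, counting how many transactions contain each item and finding the transactions that include a given item.

```python
from collections import Counter

# Hypothetical transactional database: each record is (trans_ID, items in the transaction).
trans_db = [
    ("T100", ["bread", "milk"]),
    ("T200", ["bread", "butter", "milk"]),
    ("T300", ["butter", "jam"]),
    ("T400", ["bread", "jam", "milk"]),
]

def item_counts(db):
    """Count how many transactions contain each item."""
    counts = Counter()
    for _trans_id, items in db:
        counts.update(set(items))  # set(): count an item at most once per transaction
    return counts

def transactions_containing(db, item):
    """Return the trans_IDs of the transactions that include the given item."""
    return [trans_id for trans_id, items in db if item in items]

print(item_counts(trans_db)["bread"])             # -> 3
print(transactions_containing(trans_db, "jam"))  # -> ['T300', 'T400']
```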

Define Spatial Databases.

Spatial databases contain spatial-related information. Such databases include geographic (map) databases, VLSI chip design databases, and medical and satellite image databases. Spatial data may be represented in raster format, consisting of n-dimensional bit maps or pixel maps.

What is a temporal database?

A temporal database stores time-related data. It usually stores relational data that include time-related attributes. These attributes may involve several timestamps, each having different semantics.

What are time-series databases?

A time-series database stores sequences of values that change with time, such as data collected regarding the stock exchange.

What is a legacy database?

A legacy database is a group of heterogeneous databases that combines different kinds of data systems, such as relational or object-oriented databases, hierarchical databases, network databases, spreadsheets, multimedia databases, or file systems.

What are the steps in the data mining process?

Data cleaning

Data integration

Data selection

Data transformation

Data mining

Pattern evaluation

Knowledge representation

What is Characterization?

It is a summarization of the general characteristics or features of a target class of data.

What is Discrimination?

It is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes.
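Characterization and discrimination can be sketched together: characterization summarizes one class, and discrimination places that summary beside the summary of a contrasting class. The student records, the chosen features (count, mean GPA, mean age), and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical student records: (class, gpa, age).
students = [
    ("graduate", 3.8, 26),
    ("graduate", 3.6, 29),
    ("undergraduate", 3.1, 20),
    ("undergraduate", 3.4, 21),
    ("undergraduate", 2.9, 19),
]

def characterize(records, cls):
    """Summarize general features of one target class."""
    rows = [r for r in records if r[0] == cls]
    return {
        "count": len(rows),
        "avg_gpa": mean(r[1] for r in rows),
        "avg_age": mean(r[2] for r in rows),
    }

def discriminate(records, target, contrasting):
    """Compare the target class summary with a contrasting class summary."""
    return {
        "target": characterize(records, target),
        "contrasting": characterize(records, contrasting),
    }

print(discriminate(students, "graduate", "undergraduate"))
```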

What is Classification?

Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data.
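To make the two phases concrete, here is a minimal sketch using a nearest-centroid classifier, chosen only for brevity (decision trees, Bayesian classifiers, and neural networks are other common choices). The model is derived from hypothetical labeled training data, then used to predict the label of new objects.

```python
import math
from collections import defaultdict

# Hypothetical training data: ((age, income in thousands), class label).
training_data = [
    ((22, 20), "no"),
    ((25, 25), "no"),
    ((45, 80), "yes"),
    ((50, 90), "yes"),
]

def train_nearest_centroid(data):
    """Derive the model from training data: one centroid (mean vector) per class."""
    grouped = defaultdict(list)
    for features, label in data:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(model, features):
    """Predict the class of an unlabeled object: the class with the nearest centroid."""
    return min(model, key=lambda label: math.dist(features, model[label]))

model = train_nearest_centroid(training_data)
print(predict(model, (48, 75)))  # -> yes
print(predict(model, (30, 28)))  # -> no
```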

How can data mining systems be classified?

Classification according to the kinds of databases mined

Classification according to the kinds of knowledge mined

Classification according to the kinds of techniques utilized

Classification according to the applications adapted

What are the schemes for integrating a data mining system with a data warehouse?

No coupling

Loose coupling

Semitight coupling

Tight coupling

What are the major issues in data mining?

Mining methodology and user interaction issues:

Mining different kinds of knowledge in databases

Interactive mining of knowledge at multiple levels of abstraction

Incorporation of background knowledge

Data mining query languages and ad hoc data mining

Presentation and visualization of data mining results

Handling noisy or incomplete data

Pattern evaluation

Performance issues:

Efficiency and scalability of data mining algorithms

Parallel, distributed, and incremental mining algorithms

Issues relating to the diversity of database types:

Handling of relational and complex types of data

Mining information from heterogeneous databases and global information systems

What is data preprocessing?

Real-world data are usually noisy, incomplete, and inconsistent, so the data must be preprocessed before the data warehouse is organized.

What are the data preprocessing techniques?

Data cleaning

Data integration

Data transformation

Data reduction
