
Chapter: Data Warehousing and Data Mining

Important Short Questions and Answers : Data Mining


DATA MINING

 

1. Define data mining.

Data mining is the process of extracting, or "mining", knowledge from large amounts of data.

 

2. Define pattern evaluation.

Pattern evaluation identifies the truly interesting patterns representing knowledge, based on interestingness measures.

 

3. Define knowledge representation.

 

Knowledge representation techniques are used to present the mined knowledge to the user.

 

4. List the five primitives for specifying a data mining task.

Task-relevant data

Kind of knowledge to be mined

Background knowledge

Interestingness measures

Knowledge presentation and visualization techniques to be used for displaying the discovered patterns

 

5. What is visualization?

Visualization depicts data so that analysts can gain intuition about the data being observed. It assists analysts in selecting display formats, viewer perspectives, and data representation schemas.

6. Mention some of the application areas of data mining.

DNA analysis

Market analysis

Financial data analysis

Banking industry

Retail industry

Health care analysis

Telecommunication industry

 

7. Define data cleaning.

Data cleaning means removing noise and inconsistent data (and, where possible, filling in missing values) so that the necessary information is retained.

 

8. Define data integration.

Data integration combines data from multiple sources, such as databases, data cubes, or files, into a coherent data store.

 

 

9. Why do we need data transformation?

Data transformation converts the data into forms appropriate for mining. It involves:

Smoothing: remove noise from data

Aggregation: summarization, data cube construction

Generalization: concept hierarchy climbing

Normalization: scaled to fall within a small, specified range

Min-max normalization

Z-score normalization

Normalization by decimal scaling

Attribute/feature construction: new attributes constructed from the given ones
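
The three normalization methods named above (min-max, z-score, and decimal scaling) can be sketched in a few lines of Python. This is only an illustration: the function names and the sample income values are assumptions made for the example, and the code assumes the values are not all identical.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: map [min, max] linearly onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Decimal scaling: divide by 10**j, where j is the smallest integer
    that brings every absolute value below 1."""
    j = 0
    while max(abs(v) for v in values) / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


incomes = [12000, 35000, 73600, 98000]  # hypothetical sample data

print(min_max(incomes))          # values scaled into [0, 1]
print(z_score(incomes))          # values centred on 0
print(decimal_scaling(incomes))  # values divided by 10**5
```

Min-max keeps the relative spacing of the values, z-score is useful when the minimum and maximum are unknown or dominated by outliers, and decimal scaling only moves the decimal point.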

 

10. Define data reduction.

Data reduction obtains a representation of the data set that is much smaller in volume, yet produces the same (or almost the same) analytical results.
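
As a small illustration of this idea, the sketch below uses aggregation, which is one possible reduction strategy (the notes do not prescribe a specific one). The quarterly sales figures are made-up sample data. Eight quarterly records are reduced to two annual records, and an analysis that only needs yearly or overall totals gets the same answer from the smaller representation.

```python
from collections import defaultdict

# Hypothetical detailed data: (year, quarter, sales amount)
quarterly_sales = [
    (2022, "Q1", 224), (2022, "Q2", 408), (2022, "Q3", 350), (2022, "Q4", 586),
    (2023, "Q1", 310), (2023, "Q2", 402), (2023, "Q3", 390), (2023, "Q4", 612),
]

# Reduced representation: one total per year
annual_sales = defaultdict(int)
for year, _quarter, amount in quarterly_sales:
    annual_sales[year] += amount

print(len(quarterly_sales), "records reduced to", len(annual_sales))
print(dict(annual_sales))
```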

 

11. What is meant by data discretization?

Data discretization is a part of data reduction that is of particular importance for numerical data.

 

12. What is the discretization process involved in data preprocessing?

 

It reduces the number of values for a given continuous attribute by dividing the range of the attribute into intervals. Interval labels can then be used to replace actual data values.
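
A minimal sketch of this process, assuming equal-width binning (one of several possible ways to choose the intervals). The function name `equal_width_bins` and the sample ages are assumptions made for the example.

```python
def equal_width_bins(values, n_bins):
    """Divide the range of `values` into n_bins equal-width intervals and
    return, for each value, the label of the interval it falls into."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        # The maximum value would fall just past the last bin, so clamp it.
        idx = min(int((v - lo) / width), n_bins - 1)
        left = lo + idx * width
        right = lo + (idx + 1) * width
        closing = "]" if idx == n_bins - 1 else ")"
        labels.append(f"[{left:g}, {right:g}{closing}")
    return labels


ages = [13, 15, 22, 25, 33, 35, 40, 46, 52, 70]  # hypothetical attribute values
print(equal_width_bins(ages, 3))
```

Ten distinct ages are replaced by three interval labels, which is the reduction in the number of values that the answer describes.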

 

13. Define concept hierarchy.

A concept hierarchy reduces the data by collecting and replacing low-level concepts (such as numeric values for the attribute age) with higher-level concepts (such as young, middle-aged, or senior).
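
The age example can be sketched as a simple mapping from the low-level numeric value to the higher-level concept. The cut-off ages used here (40 and 60) are illustrative assumptions; the notes do not specify where one concept ends and the next begins.

```python
from collections import Counter


def age_to_concept(age):
    """Climb one level of the concept hierarchy for the attribute age.
    The thresholds 40 and 60 are assumed for illustration only."""
    if age < 40:
        return "young"
    if age < 60:
        return "middle-aged"
    return "senior"


ages = [23, 37, 41, 58, 64, 71]  # hypothetical low-level values
generalized = [age_to_concept(a) for a in ages]

print(generalized)
print(Counter(generalized))  # six distinct ages collapse into three concepts
```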

 

14. Why do we need data preprocessing?

Data in the real world is dirty:

Incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data

Noisy: containing errors or outliers

Inconsistent: containing discrepancies in codes or names
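
The three kinds of dirty data listed above can be illustrated with a small check over sample records. Everything in this sketch is hypothetical: the records, the plausible age range of 0 to 120, and the table of canonical country codes are assumptions chosen to make the example concrete.

```python
records = [
    {"id": 1, "age": 34,   "country": "US"},
    {"id": 2, "age": None, "country": "USA"},   # missing age, non-canonical code
    {"id": 3, "age": 420,  "country": "US"},    # implausible age
    {"id": 4, "age": 51,   "country": "U.S."},  # non-canonical code
]

# Assumed mapping from the spellings seen in the data to one canonical code
CANONICAL_COUNTRY = {"US": "US", "USA": "US", "U.S.": "US"}


def find_problems(record):
    """Return the kinds of dirtiness found in one record."""
    problems = []
    if any(value is None for value in record.values()):
        problems.append("incomplete")
    age = record["age"]
    if age is not None and not 0 <= age <= 120:
        problems.append("noisy")
    if record["country"] != CANONICAL_COUNTRY.get(record["country"]):
        problems.append("inconsistent")
    return problems


for record in records:
    print(record["id"], find_problems(record))
```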

 

15. Give some data mining tools.

DBMiner

GeoMiner

Multimedia miner

WeblogMiner

 

16. Describe the use of DBMiner.

DBMiner is used to perform data mining functions, including characterization, association, classification, prediction, and clustering.

 

17. What are the applications of DBMiner?

The DBMiner system can be used as a general-purpose online analytical mining system for both OLAP and data mining in relational databases and data warehouses. It is used with medium to large relational databases and offers fast response times.

 

18. What are the types of knowledge to be mined?

Characterization

Discrimination

Association

Classification

Prediction

Clustering

Outlier analysis

Other data mining tasks

 

19. Define relational databases.

A relational database is a collection of tables, each of which is assigned a unique name. Each table consists of a set of attributes (columns or fields) and usually stores a large set of tuples (records or rows). Each tuple in a relational table represents an object identified by a unique key and described by a set of attribute values.

 

20. Define transactional databases.

 

A transactional database consists of a file where each record represents a transaction. A transaction typically includes a unique transaction identity number (trans_ID), and a list of the items making up the transaction.
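
A transactional database of this shape can be sketched as a mapping from trans_ID to the list of items in that transaction. The transaction IDs and items below are invented sample data. Counting how often each item occurs is shown only as an example of the kind of question such data supports.

```python
from collections import Counter

# Hypothetical transactions: trans_ID -> list of items
transactions = {
    "T100": ["bread", "milk", "eggs"],
    "T200": ["bread", "butter"],
    "T300": ["milk", "butter", "bread"],
}

# Number of transactions containing each item
item_counts = Counter(
    item for items in transactions.values() for item in set(items)
)

for item, count in item_counts.most_common():
    print(f"{item}: appears in {count} of {len(transactions)} transactions")
```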

 

21. Define Spatial Databases.

 

Spatial databases contain spatial-related information. Such databases include geographic (map) databases, VLSI chip design databases, and medical and satellite image databases. Spatial data may be represented in raster format, consisting of n-dimensional bit maps or pixel maps.

 

22. What is a temporal database?

A temporal database stores time-related data. It usually stores relational data that include time-related attributes. These attributes may involve several time stamps, each having different semantics.

 

23. What are Time-Series databases?

 

A time-series database stores sequences of values that change with time, such as data collected regarding the stock exchange.

 

24. What is a legacy database?

A legacy database is a group of heterogeneous databases that combines different kinds of data systems, such as relational or object-oriented databases, hierarchical databases, network databases, spreadsheets, multimedia databases, or file systems.

 

25. What are the steps in the data mining process?

Data cleaning

Data integration

Data selection

Data transformation

Data mining

Pattern evaluation

Knowledge representation

 

26. What is Characterization?

It is a summarization of the general characteristics or features of a target class of data.

 

27. What is Discrimination?

 

It is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes.
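
The difference between characterization and discrimination can be seen in a short sketch. The customer classes and spending figures are hypothetical, and the mean is used as the only summary statistic purely to keep the example small.

```python
from statistics import mean

# Hypothetical monthly spend per customer, grouped by class
spend = {
    "frequent": [120, 150, 90],   # target class
    "occasional": [30, 50, 40],   # contrasting class
}

# Characterization: summarize the general features of the target class
target_mean = mean(spend["frequent"])
print("frequent customers spend on average", target_mean)

# Discrimination: compare the target class with the contrasting class
contrast_mean = mean(spend["occasional"])
print("that is", target_mean / contrast_mean, "times the occasional average")
```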

 

28. What is Classification?

 

Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data (that is, data objects whose class label is known).
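
As a rough sketch of this definition, the example below uses a nearest-neighbour rule as the model. That technique is chosen here only because it is short; the notes do not name a particular classifier. The training tuples, the attributes (age, income in thousands), and the class labels are all invented for illustration.

```python
import math

# Hypothetical training data: ((age, income in thousands), class label)
training = [
    ((22, 20), "no"),
    ((25, 28), "no"),
    ((47, 65), "yes"),
    ((52, 80), "yes"),
]


def predict(features):
    """Predict the class of an unlabeled object by returning the label
    of the closest training object (1-nearest-neighbour rule)."""
    _, label = min(training, key=lambda pair: math.dist(pair[0], features))
    return label


print(predict((50, 70)))  # closest training object is (47, 65)
print(predict((24, 25)))  # closest training object is (25, 28)
```

The training data carry known class labels; the objects passed to `predict` do not, which is exactly the situation the definition describes.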

 

29. What are the classifications of data mining systems?

Classification according to the kinds of databases mined

Classification according to the kinds of knowledge mined

Classification according to the kinds of techniques utilized

Classification according to the applications adapted

 

30. What are the schemes for integrating a data mining system with a data warehouse?

No coupling

Loose coupling

Semi-tight coupling

Tight coupling

 

31. What are the issues in data mining?

Mining methodology and user interaction issues:

Mining different kinds of knowledge in databases

Interactive mining of knowledge at multiple levels of abstraction

Incorporation of background knowledge

Data mining query languages and ad hoc data mining

Presentation and visualization of data mining results

Handling noisy or incomplete data

Pattern evaluation

Performance issues:

Efficiency and scalability of data mining algorithms

Parallel, distributed, and incremental mining algorithms

Issues relating to the diversity of database types:

Handling of relational and complex types of data

Mining information from heterogeneous databases and global information systems

 

32. What is data preprocessing?

Real-world data is normally noisy, so the data must be preprocessed before the data warehouse is organized.

 

33. What are the preprocessing techniques?

Data cleaning

Data integration

Data transformation

Data reduction

 





Copyright © 2018-2024 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.