Data Mining—On What Kind of Data? ( Types of Data )
Relational Databases: A database system, also called a database management system (DBMS), consists of a collection of interrelated data, known as a database, and a set of software programs to manage and access the data.
A relational database is a collection of tables, each of which is assigned a unique name. Each table consists of a set of attributes (columns or fields) and usually stores a large set of tuples (records or rows). Each tuple in a relational table represents an object identified by a unique key and described by a set of attribute values. A semantic data model, such as an entity-relationship (ER) data model, is often constructed for relational databases. An ER data model represents the database as a set of entities and their relationships.
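As a minimal sketch of a relational table, the snippet below uses Python's standard-library sqlite3 module. The table name customer, its attributes, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the schema and rows below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " cust_id INTEGER PRIMARY KEY,"  # unique key identifying each tuple
    " name TEXT,"
    " city TEXT)"
)
conn.executemany(
    "INSERT INTO customer (cust_id, name, city) VALUES (?, ?, ?)",
    [(1, "Smith", "Chennai"), (2, "Lee", "Vancouver"), (3, "Rao", "Chennai")],
)

# Each row returned is one tuple described by its attribute values.
chennai_customers = conn.execute(
    "SELECT cust_id, name FROM customer WHERE city = ? ORDER BY cust_id",
    ("Chennai",),
).fetchall()
print(chennai_customers)
```

Here cust_id plays the role of the unique key, and each inserted row is one tuple of the table.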
Data Warehouses:
A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site. Data warehouses are constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing.
The data are stored to provide information from a historical perspective (such as from the past 5–10 years) and are typically summarized.
A data warehouse is usually modeled by a multidimensional database structure, where each dimension corresponds to an attribute or a set of attributes in the schema, and each cell stores the value of some aggregate measure, such as count or sales amount.
The actual physical structure of a data warehouse may be a relational data store or a multidimensional data cube. A data cube provides a multidimensional view of data and allows the precomputation and fast accessing of summarized data.
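The idea of precomputing summarized data in a cube can be sketched roughly as follows. This is a toy illustration, not how a production warehouse stores its cube: the dimensions (item, city, quarter), the sales figures, and the build_cube helper are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (item, city, quarter, sales_amount).
facts = [
    ("TV", "Chennai", "Q1", 400),
    ("TV", "Chennai", "Q2", 300),
    ("TV", "Delhi", "Q1", 250),
    ("Phone", "Chennai", "Q1", 150),
    ("Phone", "Delhi", "Q2", 200),
]
dimensions = ("item", "city", "quarter")


def build_cube(rows):
    """Precompute the sum of sales for every subset of dimensions."""
    cube = defaultdict(int)
    for *dim_values, amount in rows:
        for k in range(len(dimensions) + 1):
            for kept in combinations(range(len(dimensions)), k):
                # "*" marks a dimension that has been aggregated away.
                cell = tuple(
                    dim_values[i] if i in kept else "*"
                    for i in range(len(dimensions))
                )
                cube[cell] += amount
    return dict(cube)


cube = build_cube(facts)
print(cube[("TV", "*", "*")])        # total TV sales
print(cube[("*", "Chennai", "Q1")])  # Chennai sales in Q1
print(cube[("*", "*", "*")])         # grand total
```

Once the cube is built, each summarized value is a single dictionary lookup, which is the "fast accessing" benefit in miniature. The cost is that the number of cells grows quickly with the number of dimensions.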
“What is the difference between a data warehouse and a data mart?” you may ask.
A data warehouse collects information about subjects that span an entire organization, and thus its scope is enterprise-wide.
A data mart, on the other hand, is a departmental subset of a data warehouse. It focuses on selected subjects, and thus its scope is department-wide. Data warehouse systems are well suited for on-line analytical processing, or OLAP. OLAP operations use background knowledge regarding the domain of the data being studied in order to allow the presentation of data at different levels of abstraction. Such operations accommodate different user viewpoints.
Examples of OLAP operations include drill-down and roll-up, which allow the user to view the data at differing degrees of summarization. Roll-up moves to a more summarized view, while drill-down moves to a more detailed one.
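A rough sketch of these two operations is given below. It assumes a hypothetical concept hierarchy that maps cities to countries (the kind of background knowledge OLAP relies on) and made-up sales totals. The function names roll_up and drill_down are illustrative, not part of any particular OLAP product.

```python
from collections import Counter

# Hypothetical concept hierarchy (background knowledge): city -> country.
city_to_country = {"Chennai": "India", "Delhi": "India", "Vancouver": "Canada"}

# Hypothetical sales totals at the city level (the more detailed view).
sales_by_city = {"Chennai": 550, "Delhi": 450, "Vancouver": 300}


def roll_up(detail, hierarchy):
    """Aggregate detailed cells to the next higher level of the hierarchy."""
    summary = Counter()
    for member, value in detail.items():
        summary[hierarchy[member]] += value
    return dict(summary)


def drill_down(detail, hierarchy, parent):
    """Return the detailed cells that sit under one higher-level member."""
    return {m: v for m, v in detail.items() if hierarchy[m] == parent}


sales_by_country = roll_up(sales_by_city, city_to_country)
india_detail = drill_down(sales_by_city, city_to_country, "India")
print(sales_by_country)
print(india_detail)
```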
Transactional Databases: A transactional database consists of a file where each record represents a transaction. A transaction typically includes a unique transaction identity number (trans ID) and a list of the items making up the transaction (such as items purchased in a store).
The transactional database may have additional tables associated with it, which contain other information regarding the sale, such as the date of the transaction, the customer ID number, the ID number of the salesperson and of the branch at which the sale occurred, and so on.
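The structure described above can be sketched in a few lines of Python. The transaction IDs, items, and the associated sale_info table are hypothetical sample data. A dictionary keyed by trans ID is just one convenient in-memory stand-in for the file of records.

```python
from collections import Counter

# Hypothetical transactions: trans ID -> list of items purchased.
transactions = {
    "T100": ["bread", "milk", "eggs"],
    "T200": ["bread", "butter"],
    "T300": ["milk", "bread"],
}

# Hypothetical associated table holding other information about each sale.
sale_info = {
    "T100": {"date": "2024-01-05", "cust_id": "C1", "branch": "B1"},
    "T200": {"date": "2024-01-05", "cust_id": "C2", "branch": "B2"},
    "T300": {"date": "2024-01-06", "cust_id": "C1", "branch": "B1"},
}

# Count in how many transactions each item appears.
item_counts = Counter(
    item for items in transactions.values() for item in set(items)
)

# Join the two structures on trans ID: items sold at branch B1.
b1_items = sorted(
    {item for tid, items in transactions.items()
     if sale_info[tid]["branch"] == "B1" for item in items}
)
print(item_counts.most_common(1))
print(b1_items)
```

Counting how often items occur across transactions is the starting point for questions such as which items sell well together.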
Advanced Data and Information Systems and Advanced Applications
The new database applications include handling spatial data (such as maps), engineering design data (such as the design of buildings, system components, or integrated circuits), hypertext and multimedia data (including text, image, video, and audio data), time-related data (such as historical records or stock exchange data), stream data (such as video surveillance and sensor data, where data flow in and out like streams), and the World Wide Web (a huge, widely distributed information repository made available by the Internet).
These applications require efficient data structures and scalable methods for handling complex object structures; variable-length records; semistructured or unstructured data; text, spatiotemporal, and multimedia data; and database schemas with complex structures and dynamic changes.
Object-Relational Databases: Object-relational databases are constructed based on an object-relational data model. This model extends the relational model by providing a rich data type for handling complex objects and object orientation. Object-relational databases are becoming increasingly popular in industry and applications.
The object-relational data model inherits the essential concepts of object-oriented databases. Each object has associated with it the following:
A set of variables that describe the object. These correspond to attributes in the entity-relationship and relational models.
A set of messages that the object can use to communicate with other objects, or with the rest of the database system.
A set of methods, where each method holds the code to implement a message. Upon receiving a message, the method returns a value in response. For instance, the method for the message get_photo(employee) will retrieve and return a photo of the given employee object.
Objects that share a common set of properties can be grouped into an object class. Each object is an instance of its class. Object classes can be organized into class/subclass hierarchies so that each class represents properties that are common to objects in that class.
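These object concepts map loosely onto classes in an object-oriented programming language. The sketch below is only an analogy in Python, not the API of any object-relational database. The photo-retrieval message from the example above is modeled as a get_photo method, and the SalesPerson subclass and its commission field are hypothetical.

```python
class Employee:
    """Hypothetical object class; names and fields are illustrative."""

    def __init__(self, name, photo):
        # Variables describing the object (attributes in the ER model).
        self.name = name
        self.photo = photo

    def get_photo(self):
        # Method implementing the photo message: returns a value in response.
        return self.photo


class SalesPerson(Employee):
    """Subclass: inherits Employee properties and adds its own."""

    def __init__(self, name, photo, commission):
        super().__init__(name, photo)
        self.commission = commission


alice = SalesPerson("Alice", b"<photo bytes>", commission=0.05)
print(alice.get_photo())
print(isinstance(alice, Employee))
```

Because SalesPerson is a subclass of Employee, every SalesPerson object can answer the same messages as an Employee object.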
Temporal Databases, Sequence Databases, and Time-Series Databases
A temporal database typically stores relational data that include time-related attributes. These attributes may involve several timestamps, each having different semantics.
A sequence database stores sequences of ordered events, with or without a concrete notion of time. Examples include customer shopping sequences, Web click streams, and biological sequences. A time-series database stores sequences of values or events obtained over repeated measurements of time (e.g., hourly, daily, weekly). Examples include data collected from the stock exchange, inventory control, and the observation of natural phenomena (like temperature and wind).
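To make the contrast between time-series data and sequence data concrete, here is a small sketch using made-up values. The temperature readings, the page names, and the choice of a daily mean as the summary are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical timestamped temperature readings: (timestamp, value).
readings = [
    (datetime(2024, 1, 1, 9), 21.0),
    (datetime(2024, 1, 1, 15), 27.0),
    (datetime(2024, 1, 2, 9), 20.0),
    (datetime(2024, 1, 2, 15), 24.0),
]

# Summarize the series at a coarser (daily) granularity.
by_day = defaultdict(list)
for ts, value in readings:
    by_day[ts.date().isoformat()].append(value)
daily_mean = {day: mean(values) for day, values in by_day.items()}

# A sequence keeps only the order of events, with no timestamps.
click_sequence = ["home", "search", "product", "cart"]
next_page = dict(zip(click_sequence, click_sequence[1:]))

print(daily_mean)
print(next_page["search"])
```

The time series carries explicit measurement times, so it can be regrouped by hour, day, or week. The click sequence only records which event followed which.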
Spatial Databases and Spatiotemporal Databases
Spatial databases contain spatial-related information. Examples include geographic (map) databases, very large-scale integration (VLSI) or computer-aided design databases, and medical and satellite image databases.
Spatial data may be represented in raster format, consisting of n-dimensional bit maps or pixel maps. For example, a 2-D satellite image may be represented as raster data, where each pixel registers the rainfall in a given area. Maps can be represented in vector format, where roads, bridges, buildings, and lakes are represented as unions or overlays of basic geometric constructs, such as points, lines, polygons, and the partitions and networks formed by these components.
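The two representations can be sketched side by side as follows. The rainfall grid, the road and lake coordinates, and the polygon_area helper (which uses the standard shoelace formula) are illustrative assumptions, not taken from any real map data.

```python
# Raster format: a 2-D grid in which each pixel holds a measured value.
# The rainfall numbers (in mm) are hypothetical.
rainfall_raster = [
    [0, 2, 5],
    [1, 4, 9],
    [0, 3, 7],
]
# Find the (value, row, col) of the pixel with the most rainfall.
wettest = max(
    (value, row, col)
    for row, line in enumerate(rainfall_raster)
    for col, value in enumerate(line)
)

# Vector format: objects built from points, lines, and polygons.
road = [(0.0, 0.0), (3.0, 4.0)]  # a line between two points
lake = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]  # a polygon


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(wettest)
print(polygon_area(lake))
```

Raster data suit per-cell measurements such as rainfall, while vector data make geometric questions (length, area, containment) straightforward.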
“What kind of data mining can be performed on spatial databases?” you may ask. Data mining may uncover patterns describing the characteristics of houses located near a specified kind of location, such as a park, for instance. A spatial database that stores spatial objects that change with time is called a spatiotemporal database, from which interesting information can be mined.
Text Databases and Multimedia Databases
Text databases are databases that contain word descriptions for objects. These word descriptions are usually not simple keywords but rather long sentences or paragraphs, such as product specifications, error or bug reports, warning messages, summary reports, notes, or other documents.
Text databases may be highly unstructured (such as some Web pages on the World Wide Web). Some text databases may be somewhat structured, that is, semistructured (such as e-mail messages and many HTML/XML Web pages), whereas others are relatively well structured (such as library catalogue databases). Text databases with highly regular structures typically can be implemented using relational database systems.
“What can data mining on text databases uncover?” By mining text data, one may uncover general and concise descriptions of the text documents, keyword or content associations, as well as the clustering behavior of text objects.
Multimedia databases store image, audio, and video data. They are used in applications such as picture content-based retrieval, voice-mail systems, video-on-demand systems, the World Wide Web, and speech-based user interfaces that recognize spoken commands. Multimedia databases must support large objects, because data objects such as video can require gigabytes of storage. Specialized storage and search techniques are also required. Because video and audio data require real-time retrieval at a steady and predetermined rate in order to avoid picture or sound gaps and system buffer overflows, such data are referred to as continuous-media data.
Heterogeneous Databases and Legacy Databases
A heterogeneous database consists of a set of interconnected, autonomous component databases. The components communicate in order to exchange information and answer queries. Objects in one component database may differ greatly from objects in other component databases, making it difficult to assimilate their semantics into the overall heterogeneous database.
A legacy database is a group of heterogeneous databases that combines different kinds of data systems, such as relational or object-oriented databases, hierarchical databases, network databases, spreadsheets, multimedia databases, or file systems. The heterogeneous databases in a legacy database may be connected by intra- or inter-computer networks.
Data Streams
Many applications involve the generation and analysis of a new kind of data, called stream data, where data flow in and out of an observation platform (or window) dynamically. Such data streams have the following unique features: they are huge or possibly infinite in volume, change dynamically, flow in and out in a fixed order, allow only one or a small number of scans, and demand fast (often real-time) response time.
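The single-scan constraint shapes how stream algorithms are written: each item is seen once, and only a small amount of state can be kept. The sketch below shows one way to do that, maintaining a running mean and the maximum over a sliding window. The stream_summary function, the window size, and the sensor values are assumptions for illustration.

```python
from collections import deque


def stream_summary(stream, window_size=3):
    """Scan the stream once, keeping only O(window_size) state."""
    window = deque(maxlen=window_size)  # most recent items only
    count = 0
    running_mean = 0.0
    for value in stream:
        count += 1
        # Incremental mean update: no need to store past values.
        running_mean += (value - running_mean) / count
        window.append(value)
        yield count, running_mean, max(window)


# Hypothetical sensor readings; a generator stands in for an unbounded feed.
sensor_feed = (v for v in [10, 20, 60, 30, 5])
summaries = list(stream_summary(sensor_feed))
print(summaries[-1])
```

Because the function is a generator, it produces an updated summary after every item, which suits the fast response time that streams demand.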
Typical examples of data streams include various kinds of scientific and engineering data, time-series data, and data produced in other dynamic environments, such as power supply, network traffic, stock exchange, telecommunications, Web click streams, video surveillance, and weather or environment monitoring.
Mining data streams involves the efficient discovery of general patterns and dynamic changes within stream data.
The World Wide Web
The World Wide Web and its associated distributed information services, such as Yahoo!, Google, America Online, and AltaVista, provide rich, worldwide, on-line information services, where data objects are linked together to facilitate interactive access. Users seeking information of interest traverse from one object via links to another. Such systems provide ample opportunities and challenges for data mining.
For example, understanding user access patterns will not only help improve system design (by providing efficient access between highly correlated objects), but also lead to better marketing decisions (e.g., by placing advertisements in frequently visited documents, or by providing better customer/user classification and behavior analysis). Capturing user access patterns in such distributed information environments is called Web usage mining (or Weblog mining).
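A very small sketch of Web usage mining is shown below. It counts page visits and page-to-page traversals from a log. The log entries, user IDs, and page paths are hypothetical, and real Web logs would need parsing and session identification first.

```python
from collections import Counter

# Hypothetical Web log: (user, page) pairs in the order they were requested.
weblog = [
    ("u1", "/home"), ("u2", "/home"), ("u1", "/products"),
    ("u2", "/products"), ("u1", "/cart"), ("u2", "/home"),
]

# Frequently visited documents (candidates for advertisement placement).
page_visits = Counter(page for _, page in weblog)

# Count page-to-page traversals within each user's own click path.
transitions = Counter()
last_page = {}
for user, page in weblog:
    if user in last_page:
        transitions[(last_page[user], page)] += 1
    last_page[user] = page

print(page_visits.most_common(1))
print(transitions.most_common(1))
```

Frequent traversals point to highly correlated objects, which is the kind of access pattern that can inform both system design and marketing decisions.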