Characteristics of Data Warehouses
To discuss data warehouses and distinguish them from transactional
databases calls for an appropriate data model. The multidimensional data model
(explained in more detail in Section 29.3) is a good fit for OLAP and
decision-support technologies. In contrast to multidatabases, which provide
access to disjoint and usually heterogeneous databases, a data warehouse is
frequently a store of integrated data from multiple sources, processed for
storage in a multidimensional model. Unlike most transactional databases, data
warehouses typically support time-series and trend analysis, both of which
require more historical data than is generally maintained in transactional
databases.
Compared with transactional databases, data warehouses are nonvolatile.
This means that information in the data warehouse changes far less often and
may be regarded as non–real-time with periodic updating. In transactional
systems, transactions are the unit and are the agent of change to the
database; by contrast, data ware-house information is much more coarse-grained
and is refreshed according to a careful choice of refresh policy, usually
incremental. Warehouse updates are handled by the warehouse’s acquisition
component that provides all required preprocessing.
We can also describe data warehousing more generally as a collection of decision sup-port
technologies, aimed at enabling the knowledge worker (executive, manager,
analyst) to make better and faster decisions. Figure 29.1 gives an overview of
the conceptual structure of a data
warehouse. It shows the entire data warehousing process, which includes
possible cleaning and reformatting of data before loading it into the
warehouse. This process is handled by tools known as ETL (extraction,
transformation, and loading) tools. At the back end of the process, OLAP, data
mining, and DSS may generate new relevant information such as rules; this
information is shown in the figure going back into the warehouse. The figure
also shows that data sources may include files.
Data warehouses have the following distinctive characteristics:
Multidimensional conceptual view
Generic dimensionality
Unlimited dimensions and
aggregation levels
Unrestricted cross-dimensional
operations
Dynamic sparse matrix handling
Client-server architecture
Multiuser support
Accessibility
Transparency
Intuitive data manipulation
Consistent reporting performance
Flexible reporting
Because they encompass large volumes of data, data warehouses are
generally an order of magnitude (sometimes two orders of magnitude) larger than
the source databases. The sheer volume of data (likely to be in terabytes or
even petabytes) is an issue that has been dealt with through enterprise-wide
data warehouses, virtual data warehouses, and data marts:
Enterprise-wide data warehouses are huge
projects requiring massive investment
of time and resources.
Virtual data warehouses provide
views of operational databases that are materialized
for efficient access.
Data marts generally are targeted to a
subset of the organization, such as a department,
and are more tightly focused.
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2024 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.