Typical Functionality of a Data Warehouse
Data warehouses exist to facilitate complex, data-intensive, and
frequent ad hoc queries. Accordingly, data warehouses must provide far greater
and more efficient query support than is demanded of transactional databases.
The data warehouse access component supports enhanced spreadsheet
functionality, efficient query processing, structured queries, ad hoc queries,
data mining, and materialized views. In particular, enhanced spreadsheet
functionality includes support for state-of-the-art spreadsheet applications
(for example, MS Excel) as well as for OLAP application programs. These offer
preprogrammed functionalities such as the following:
Roll-up. Data is summarized with increasing generalization (for example, weekly to quarterly to annually).
Drill-down. Increasing levels of detail are revealed (the complement of roll-up).
Pivot. Cross tabulation (also referred to as rotation) is performed.
Slice and dice. Projection operations are performed on the dimensions.
Sorting. Data is sorted by ordinal value.
Selection. Data is available by value or range.
Derived (computed) attributes. Attributes are computed by operations on stored and derived values.
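As a rough illustration of several of these operations, the following minimal Python sketch applies roll-up, drill-down, selection, and pivot to a small in-memory fact table. The table contents, the column names, and the helper functions roll_up, select, and pivot are hypothetical and chosen only for this example. A real OLAP tool would run equivalent operations against the warehouse itself.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, week, sales).
COLUMNS = ("region", "product", "quarter", "week", "sales")
SALES = [
    ("East", "Widget", "Q1", 1, 100),
    ("East", "Widget", "Q1", 2, 150),
    ("East", "Gadget", "Q2", 14, 200),
    ("West", "Widget", "Q1", 1, 80),
    ("West", "Gadget", "Q2", 15, 120),
]


def roll_up(rows, keys):
    """Sum sales grouped by the given dimension columns."""
    idx = [COLUMNS.index(k) for k in keys]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def select(rows, column, value):
    """Selection by value: keep rows whose column equals value."""
    i = COLUMNS.index(column)
    return [row for row in rows if row[i] == value]


def pivot(rows, row_key, col_key):
    """Cross tabulation: row_key -> col_key -> summed sales."""
    r, c = COLUMNS.index(row_key), COLUMNS.index(col_key)
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[r]][row[c]] += row[-1]
    return {k: dict(v) for k, v in table.items()}


# Roll-up: weekly rows summarized to quarterly totals.
by_quarter = roll_up(SALES, ["quarter"])
# Drill-down: the same totals broken out again by week.
by_quarter_week = roll_up(SALES, ["quarter", "week"])
# Selection by value, then a region-by-product cross tabulation.
east_only = select(SALES, "region", "East")
region_by_product = pivot(SALES, "region", "product")
```

Roll-up and drill-down here are the same grouping function called with fewer or more dimension columns, which reflects the description of drill-down as the complement of roll-up.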
Because data warehouses are free from the restrictions of the
transactional environment, query processing can be made more efficient.
Among the tools and techniques used are query transformation; index
intersection and union; special ROLAP (relational OLAP) and MOLAP
(multidimensional OLAP) functions; SQL extensions; advanced join methods;
and intelligent scanning (as in piggybacking multiple queries).
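Index intersection and union can be pictured as set operations on the row identifiers returned by separate single-attribute indexes. The sketch below is only an illustration under that assumption. The two indexes, their contents, and the lookup helper are hypothetical, and real systems typically use bitmaps or sorted row-id lists rather than Python sets.

```python
# Hypothetical single-attribute indexes: attribute value -> set of row ids.
region_index = {"East": {1, 2, 3}, "West": {4, 5}}
product_index = {"Widget": {1, 2, 4}, "Gadget": {3, 5}}


def lookup(index, value):
    """Return the row ids for a value, or an empty set if it is absent."""
    return index.get(value, set())


# region = 'East' AND product = 'Widget': intersect the two row-id sets.
and_rows = lookup(region_index, "East") & lookup(product_index, "Widget")

# region = 'West' OR product = 'Gadget': take the union of the row-id sets.
or_rows = lookup(region_index, "West") | lookup(product_index, "Gadget")
```

Only the rows in the resulting set need to be fetched from the fact table, so a conjunctive or disjunctive predicate can be answered without scanning every row.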
Improved performance has also been attained with parallel processing.
Parallel server architectures include symmetric multiprocessor (SMP), cluster,
and massively parallel processing (MPP) architectures, as well as combinations
of these.
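The pattern that these architectures exploit for aggregate queries is to partition the data, aggregate each partition independently, and combine the partial results. The following is a minimal sketch of that pattern only, using a thread pool from the Python standard library as a stand-in for separate processors or nodes. The function names and the worker count are assumptions made for this example, and threads in CPython will not by themselves speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(partition):
    """Aggregate one partition independently of the others."""
    return sum(partition)


def parallel_total(values, workers=4):
    """Partition the values, aggregate each part, and combine the results."""
    size = max(1, -(-len(values) // workers))  # ceiling division
    partitions = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, partitions))


total = parallel_total(list(range(1, 101)))
```

The approach works for sums because partial sums can be combined by adding them. Other aggregates, such as averages, need the partial results to carry both a sum and a count.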
Knowledge workers and decision makers use tools ranging from parametric
queries to ad hoc queries to data mining. Thus, the access component of the
data warehouse must provide support for structured queries (both parametric and
ad hoc). Together, these make up a managed query environment. Data mining
itself uses techniques from statistical analysis and artificial intelligence.
Statistical analysis can be performed by advanced spreadsheets, by
sophisticated statistical analysis software, or by custom-written programs.
Techniques such as lagging, moving averages, and regression analysis are also
commonly employed. Artificial intelligence techniques, which may include
genetic algorithms and neural networks, are used for classification and are
employed to discover knowledge from the data warehouse that may be unexpected
or difficult to specify in queries. (We treat data mining in detail in Chapter
28.)
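The statistical techniques named above (lagging, moving averages, and regression analysis) can be sketched as a small custom-written program. The code below is a minimal illustration in plain Python. The monthly_sales series is invented, and the function names are hypothetical rather than taken from any particular statistics package.

```python
def lag(series, k=1):
    """Shift a series by k periods; the first k entries have no predecessor."""
    if k <= 0:
        return list(series)
    return [None] * min(k, len(series)) + list(series[:max(len(series) - k, 0)])


def moving_average(series, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(series[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def linear_regression(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented monthly sales figures, used only for illustration.
monthly_sales = [10, 12, 14, 16, 18]
previous_month = lag(monthly_sales, 1)
smoothed = moving_average(monthly_sales, 3)
slope, intercept = linear_regression(list(range(len(monthly_sales))), monthly_sales)
```

With these figures, the fitted line rises by about 2 units per month from a starting level of about 10, which matches the constant month-to-month increase in the invented data.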