Introduction, Definitions, and Terminology
A database as a collection of related data and a database system as a database and database software together. A data warehouse is also a collection of information as well as a supporting system. However, a clear distinction exists. Traditional databases are transactional (relational, object-oriented, network, or hierarchical). Data warehouses have the distinguishing characteristic that they are mainly intended for decision-support applications. They are optimized for data retrieval, not routine transaction processing.
Because data warehouses have been developed in numerous organizations to meet particular needs, there is no single, canonical definition of the term data warehouse. Professional magazine articles and books in the popular press have elaborated on the meaning in a variety of ways. Vendors have capitalized on the popularity of the term to help market a variety of related products, and consultants have provided a large variety of services, all under the data warehousing banner. However, data warehouses are quite distinct from traditional databases in their structure, functioning, performance, and purpose.
W. H. Inmon characterized a data warehouse as a subject-oriented, integrated, non-volatile, time-variant collection of data in support of management’s decisions. Data warehouses provide access to data for complex analysis, knowledge discovery, and decision making. They support high-performance demands on an organization’s data and information. Several types of applications—OLAP, DSS, and data mining applications—are supported. We define each of these next.
OLAP (online analytical processing) is a term used to describe the analysis of com-plex data from the data warehouse. In the hands of skilled knowledge workers, OLAP tools use distributed computing capabilities for analyses that require more storage and processing power than can be economically and efficiently located on an individual desktop.
DSS (decision-support systems), also known as EIS—executive information systems; not to be confused with enterprise integration systems—support an organization’s leading decision makers with higher-level data for complex and important decisions. Data mining (which we discussed in Chapter 28) is used for knowledge discovery, the process of searching data for unanticipated new knowledge.
Traditional databases support online transaction processing (OLTP), which includes insertions, updates, and deletions, while also supporting information query requirements. Traditional relational databases are optimized to process queries that may touch a small part of the database and transactions that deal with insertions or updates of a few tuples per relation to process. Thus, they cannot be optimized for OLAP, DSS, or data mining. By contrast, data warehouses are designed precisely to support efficient extraction, processing, and presentation for analytic and decision-making purposes. In comparison to traditional databases, data warehouses generally contain very large amounts of data from multiple sources that may include databases from different data models and sometimes files acquired from independent systems and platforms.