Data Warehouse Introduction
A data warehouse is a collection of data marts representing historical data from different operations in the company.
It collect the data from multiple heterogeneous data base files(flat, text and etc).
It store the 5 to 10 years of huge amount of data.
This data is stored in a structure optimized for querying and data analysis as a data warehouse.
―A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process‖.
Subject Oriented: Data that gives information about a particular subject instead of about a company’s ongoing operations.
Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
Time-variant: All data in the data warehouse is identified with a particular time period. Non-volatile: Data is stable in a data warehouse. More data is added but data is never removed. It can be
Used for decision Support Used to manage and control business
Used by managers and end-users to understand the business and make judgments
Other important terminology
Enterprise Data warehouse: It collects all information about subjects (customers, products, sales, assets, personnel) that span the entire organization
Decision Support System (DSS): Information technology to help the knowledge worker (executive, manager, and analyst) makes faster & better decisions
Operational and informational Data
Focusing on transactional function such asbank card withdrawals and deposits Detailed
Updateable Reflects current data
Focusing on providing answers to problems posed by decision makers Summarized
Data Warehouse Characteristics
It is a database designed for analytical tasks Its content is periodically updated
It contains current and historical data to provide a historical perspective of information.
Data warehouse Architecture and its seven components
The data warehouse architecture is based on the data base management system server.
The central information repository is surrounded by number of key components
Data warehouse is an environment, not a product which is based on relational database management system that functions as the central repository for informational data.
The data entered into the data warehouse transformed into an integrated structure and format. The transformation process involves conversion, summarization, filtering and condensation.
The data warehouse must be capable of holding and managing large volumes of data as well as different structure of data structures over the time.
1. Data sourcing, cleanup, transformation, and migration tools
2. Metadata repository
3. Warehouse/database technology
4. Data marts
5. Data query, reporting, analysis, and mining tools
6. Data warehouse administration and management
7. Information delivery system
1. Data warehouse database
This is the central part of the data warehousing environment.
This is implemented based on RDBMS technology.
2 Sourcing, Acquisition, Clean up, and Transformation Tools
They perform conversions, summarization, key changes, structural changes
The data transformation is required to use by decision support tools.
The transformation produces programs, control statements.
It moves the data into data warehouse from multiple operational systems. The functionalities of these tools are listed below:
To remove unwanted data from operational db
Converting to common data names and attributes
Calculating summaries and derived data
Establishing defaults for missing data
Accommodating source data definition changes
3. Meta data
It is data about data. It is used for maintaining, managing and using the data warehouse. It is classified into two:
Technical Meta data: It contains information about data warehouse data used by warehouse designer, administrator to carry out development and management tasks. It includes,
Info about data stores
Transformation descriptions. That si mapping methods from operational db to warehouse db
Warehouse Object and data structure definitions for target data The rules used to perform clean up, and data enhancement Data mapping operations
Access authorization, backup history, archive history, info delivery history, data acquisition history, data access etc.,
Business Meta data: It contains info that gives info stored in data warehouse to users. It includes,
Subject areas, and info object type including queries, reports, images, video, audio clips etc. Internet home pages
Info related to info delivery system
Data warehouse operational info such as ownerships, audit trails etc.,
Meta data helps the users to understand content and find the data. Meta data are stored in a separate data stores which is known as informational directory or Meta data repository which helps to integrate, maintain and view the contents of the data warehouse.
The following lists the characteristics of info directory/ Meta data:
It is the gateway to the datawarehouse environment
It supports easy distribution and replication of content for high performance and availability It should be searchable by business oriented key words
It should act as a launch platform for end user to access data and analysistools It should support the sharing of info
It should support scheduling options for request
It should support and provide interface to other applications
It should support end user monitoring of the status of the data warehouse environment
4 Access tools
Its purpose is to provide info to business users for decision making. There are five categories:
Data query and reporting tools Application development tools Executive info system tools (EIS) OLAP tools
Data mining tools
Query and reporting tools: used to generate query and report. There are two types of reporting tools. They are:
Production reporting tool used to generate regular operational reports Desktop report writer are inexpensive desktop tools designed for end users.
Managed Query tools: used to generate SQL query. It uses Meta layer software in between users and databases which offers a point-and-click creation of SQL statement.
Application development tools: This is a graphical data access environment which integrates OLAP tools with data warehouse and can be used to access all db systems.
OLAP Tools: Are used to analyze the data in multi dimensional and complex views. Data mining tools: are used to discover knowledge from the data warehouse data
5 Data marts
It is inexpensive tool and alternative to the data ware house. it based on the subject area
. Data mart is used in the following situation:
Extremely urgent user requirement
The absence of a budget for a full scale data warehouse strategy The decentralization of business needs
6. Data warehouse admin and management
The management of data warehouse includes,
Security and priority management Monitoring updates from multiple sources Data quality checks
Managing and updating meta data
Auditing and reporting data warehouse usage and status
Replicating, sub setting and distributing data Backup and recovery
Data warehouse storage management which includes capacity planning, hierarchical storage management and purging of aged data etc.,
7 Information delivery system
It is used to enable the process of subscribing for data warehouse info.
Delivery to one or more destinations according to specified scheduling algorithm