Data
Warehouse Introduction
A data warehouse is a collection of data marts representing historical
data from different operations in the company.
It collect the data from multiple heterogeneous data base files(flat,
text and etc).
It store the 5 to 10 years of huge amount of data.
This data is stored in a structure optimized for querying and data
analysis as a data warehouse.
―A warehouse is a subject-oriented, integrated, time-variant and
non-volatile collection of data in support of management’s decision making
process‖.
Subject
Oriented: Data that gives information about
a particular subject instead of about a company’s ongoing operations.
Integrated:
Data that
is gathered into the data warehouse from a variety of sources and merged into a coherent
whole.
Time-variant: All data
in the data warehouse is identified with a particular time period.
Non-volatile: Data is stable in a data warehouse. More data is added but
data is never removed. It can be
Used for
decision Support Used to manage and
control business
Used by
managers and end-users to understand the business and make judgments
Other important terminology
Enterprise
Data warehouse: It collects all information about
subjects (customers, products, sales, assets, personnel) that span the
entire organization
Decision
Support System (DSS): Information technology to help
the knowledge worker (executive, manager, and analyst) makes faster
& better decisions
Operational and informational
Data
Operational
Data:
Focusing
on transactional function such asbank card withdrawals and deposits Detailed
Updateable Reflects current data
Informational
Data:
Focusing
on providing answers to problems posed by decision makers Summarized
Non
updateable
Data Warehouse Characteristics
It is a
database designed for analytical tasks Its content is periodically updated
It
contains current and historical data to provide a historical perspective of
information.
Data warehouse Architecture and
its seven components
Overall Architecture
The data warehouse architecture is based on the
data base management system server.
The central information repository is surrounded by
number of key components
Data warehouse is an environment, not a product
which is based on relational database management system that functions as the
central repository for informational data.
The data entered into the data warehouse
transformed into an integrated structure and format. The transformation process
involves conversion, summarization, filtering and condensation.
The data warehouse must be capable of holding and
managing large volumes of data as well as different structure of data
structures over the time.
Key components
1. Data sourcing, cleanup, transformation, and
migration tools
2. Metadata repository
3. Warehouse/database technology
4. Data marts
5. Data query, reporting, analysis, and mining
tools
6. Data warehouse administration and management
7. Information delivery system
1. Data warehouse database
This is
the central part of the data warehousing environment.
This is implemented
based on RDBMS technology.
2 Sourcing, Acquisition, Clean
up, and Transformation Tools
They perform conversions, summarization, key changes, structural changes
The data transformation is required to use by decision support tools.
The transformation produces programs, control statements.
It moves the data into data warehouse from multiple operational systems.
The functionalities of these tools are listed below:
To remove unwanted data from
operational db
Converting to common data names and attributes
Calculating summaries and derived
data
Establishing defaults for missing data
Accommodating source data
definition changes
3. Meta data
It is
data about data. It is used for maintaining, managing and using the data
warehouse. It is classified into two:
Technical
Meta data: It contains information about data warehouse data
used by warehouse designer, administrator to carry out
development and management tasks. It includes,
Info
about data stores
Transformation
descriptions. That si mapping methods from operational db to warehouse db
Warehouse
Object and data structure definitions for target data The rules used to perform clean up, and data
enhancement Data mapping operations
Access
authorization, backup history, archive history, info delivery history, data
acquisition history, data access etc.,
Business
Meta data: It contains info that gives info stored in data
warehouse to users. It includes,
Subject
areas, and info object type including queries, reports, images, video, audio
clips etc. Internet home pages
Info
related to info delivery system
Data
warehouse operational info such as ownerships, audit trails etc.,
Meta data
helps the users to understand content and find the data. Meta data are stored
in a separate data stores which is known as informational directory or Meta data repository which helps to
integrate, maintain and view the contents of the data warehouse.
The
following lists the characteristics of info directory/ Meta data:
It is the
gateway to the datawarehouse environment
It
supports easy distribution and replication of content for high performance and
availability It should be searchable by
business oriented key words
It should
act as a launch platform for end user to access data and analysistools It should support the sharing of info
It should
support scheduling options for request
It should
support and provide interface to other applications
It should
support end user monitoring of the status of the data warehouse environment
4 Access tools
Its
purpose is to provide info to business users for decision making. There are
five categories:
Data
query and reporting tools Application
development tools Executive info system
tools (EIS) OLAP tools
Data
mining tools
Query and
reporting tools: used to generate query and report. There are two
types of reporting tools. They are:
Production
reporting tool used to generate regular operational reports Desktop report writer are inexpensive
desktop tools designed for end users.
Managed
Query tools: used to generate SQL query. It uses Meta layer
software in between users and databases which offers a
point-and-click creation of SQL statement.
Application
development tools: This is a graphical data access environment which
integrates OLAP tools with data warehouse and can be used to access all db
systems.
OLAP
Tools: Are used to analyze the data in multi dimensional
and complex views. Data mining tools: are used to discover knowledge from the
data warehouse data
5 Data marts
It is
inexpensive tool and alternative to the data ware house. it based on the
subject area
. Data
mart is used in the following situation:
Extremely
urgent user requirement
The
absence of a budget for a full scale data warehouse strategy The decentralization of business needs
6. Data warehouse admin and
management
The
management of data warehouse includes,
Security
and priority management Monitoring
updates from multiple sources Data
quality checks
Managing
and updating meta data
Auditing
and reporting data warehouse usage and status
Purging
data
Replicating,
sub setting and distributing data
Backup and recovery
Data
warehouse storage management which includes capacity planning, hierarchical
storage management and purging of aged data etc.,
7 Information delivery system
It is used to enable the process of subscribing for
data warehouse info.
Delivery to one or more destinations according to
specified scheduling algorithm
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.