Building
a Data warehouse
There are
two reasons why organizations consider data warehousing a critical need. In
other words, there are two factors that drive you to build and use data
warehouse. They are:
Business
factors:
Business
users want to make decision quickly and correctly using all availabledata.
Technological
factors:
To
address the incompatibility of operational data stores
IT
infrastructure is changing rapidly. Its capacity is increasing and cost is
decreasing so that building a data warehouse is easy
Business considerations:
Two
approaches:
Top– Down
Approach Bottom– Up Approach
Top –
Down Approach
It
collected enterprise wide business requirements and decided to build an
enterprise data warehouse with subset data marts.
Bottom Up
Approach
The data marts are integrated or combined together to form a data
warehouse.
The bottom up approach helps us incrementally build the warehouse by
developing and integrating data marts as and when the requirements are clear.
The advantage of using the Bottom Up approach is that they do not
require high initial costs and have a faster implementation time;
Bottom up approach is more realistic but the complexity of the
integration may become a
serious
obstacle.
Ø
Design considerations:
In
general a data warehouse data from multiple heterogeneous sources into a query
database this is also one of the reasons why a data warehouse is difficult to
built
Data
content
The content and structure of the data warehouse are reflected in its
data model.
The data model is the template that describes how information will be
organized within the integrated warehouse framework.
The data warehouse data must be a detailed data. It must be formatted,
cleaned up and transformed to fit the warehouse data model.
Meta data
It defines the location and contents of data in the warehouse.
Meta data is searchable by users to find definitions or subject areas.
Data
distribution
Data volumes continue to grow in nature. Therefore, it becomes necessary
to know how the data should be divided across multiple servers.
The data can be distributed based on the subject area, location
(geographical region), or time (current, month, year).
Tools
A number
of tools are available that are specifically designed to help in the
implementation of the data warehouse.
These tools for defining a cleanup, data movement,
end user, query, reporting and data analysis
Performance considerations
The
actual performance levels are dependent and vary widely from one environment to
another. it is relatively difficult to predict the performance of a typical
data warehouse
The
following nine-step method is followed in the design of a data warehouse:
Choosing
the subject matter
Deciding
what a fact table represents
Identifying
and conforming the dimensions
Choosing
the facts
Storing
pre calculations in the fact table
Rounding
out the dimension table
Choosing
the duration of the db
The need
to track slowly changing dimensions
Deciding
the query priorities and query models
Technical considerations:
Hardware
platforms
An important consideration when choosing a data
warehouse server capacity for handling the high volumes of data.
It has large data and through put.
The modern server can also support large volumes
and large number of flexible GUI
Data
warehouse and DBMS specialization
Very large size of databases and need to process
complex adhoc queries in a short time
The most important requirements for the data
warehouse DBMS are performance, throughput and scalability.
Communication
infrastructure
The data warehouse user requires a relatively large
band width to interact with the data warehouse and retrieve a significant
amount of data for analysis.
This may
mean that communication networks have to be expanded and new hardware and
software may have purchased.
Implementation considerations
Collect
and analyze business requirements
Create a
data model
Define
data sources
Choose a
data base technology
Choose
database access and reporting tools
Choose
database connectivity s/w
Choose
analysis and presentation s/w
Access
tools:
Data
warehouse implementation relies on selecting suitable data access tools. The
following lists the various type of data that can be accessed:
Simple
tabular form data Complex textual
search data Ranking data
Multivariabledata Time series data
Graphing,
charting and pivoting data
Data
extraction, clean up, transformation and migration:
Timeliness
of data delivery to the warehouse
The tool
must have the ability to identify the particular data and that can be read by
conversion tool
The tool
must support flat files, indexed files since corporate data is still in this
type The tool must have the capability
to merge data from multiple data stores
The tool
shouldhave specification interface to indicate the data to be extracted
The tool
should have the ability to read data from data dictionary The code generated by the tool should be
completely maintainable
The data
warehouse database system must be able toperform loading data directly from
these tools
Data
replication
Data replication or data moves to place the data to
a particular workgroup in a localized database.
Most companies use data replication servers to copy
their most needed data to a separate database.
Metadata
It is a
road map to the information stores in the warehouse is metadata it defines all
elements and their attributes
Data
placement strategies
As a data warehouse grows, there are at least two options for data
placement. One is to put some of the data in the data warehouse into another
storage media.
The second option is to distribute the data in the data warehouse across
multiple servers.
User
levels
The users
of data warehouse data can be classified on the basis of their skill level in
accessing the warehouse. There are three classes of users:
Casual users: are most
comfortable in retrieving info from warehouse in pre defined formats and running pre existing
queries and reports.
Power Users: can use
pre defined as well as user defined queries to create simple and ad hoc reports. These users can engage in
drill down operations. These users may have the experience of using reporting
and query tools.
Expert users: These
users tend to create their own complex queries and perform standard analysis on the info they retrieve.
These users have the knowledge about the use of query and report tools
Benefits of data warehousing
The
benefits can be classified into two:
Tangible
benefits (quantified / measureable):It includes, o Improvement in product inventory
Decrement
in production cost
Improvement
in selection of target markets
Enhancement
in asset and liability management benefits Intangible (not easy
to quantified): It includes,
Improvement in
productivity by keeping
all data in
single location and
eliminating
rekeying of data o Reduced redundant processing
Enhanced
customer relation
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.