Building a Data warehouse
There are two reasons why organizations consider data warehousing a critical need. In other words, there are two factors that drive you to build and use data warehouse. They are:
Business users want to make decision quickly and correctly using all availabledata.
To address the incompatibility of operational data stores
IT infrastructure is changing rapidly. Its capacity is increasing and cost is decreasing so that building a data warehouse is easy
Top– Down Approach Bottom– Up Approach
Top – Down Approach
It collected enterprise wide business requirements and decided to build an enterprise data warehouse with subset data marts.
Bottom Up Approach
The data marts are integrated or combined together to form a data warehouse.
The bottom up approach helps us incrementally build the warehouse by developing and integrating data marts as and when the requirements are clear.
The advantage of using the Bottom Up approach is that they do not require high initial costs and have a faster implementation time;
Bottom up approach is more realistic but the complexity of the integration may become a
In general a data warehouse data from multiple heterogeneous sources into a query database this is also one of the reasons why a data warehouse is difficult to built
The content and structure of the data warehouse are reflected in its data model.
The data model is the template that describes how information will be organized within the integrated warehouse framework.
The data warehouse data must be a detailed data. It must be formatted, cleaned up and transformed to fit the warehouse data model.
It defines the location and contents of data in the warehouse.
Meta data is searchable by users to find definitions or subject areas.
Data volumes continue to grow in nature. Therefore, it becomes necessary to know how the data should be divided across multiple servers.
The data can be distributed based on the subject area, location (geographical region), or time (current, month, year).
A number of tools are available that are specifically designed to help in the implementation of the data warehouse.
These tools for defining a cleanup, data movement, end user, query, reporting and data analysis
The actual performance levels are dependent and vary widely from one environment to another. it is relatively difficult to predict the performance of a typical data warehouse
The following nine-step method is followed in the design of a data warehouse:
Choosing the subject matter
Deciding what a fact table represents
Identifying and conforming the dimensions
Choosing the facts
Storing pre calculations in the fact table
Rounding out the dimension table
Choosing the duration of the db
The need to track slowly changing dimensions
Deciding the query priorities and query models
An important consideration when choosing a data warehouse server capacity for handling the high volumes of data.
It has large data and through put.
The modern server can also support large volumes and large number of flexible GUI
Data warehouse and DBMS specialization
Very large size of databases and need to process complex adhoc queries in a short time
The most important requirements for the data warehouse DBMS are performance, throughput and scalability.
The data warehouse user requires a relatively large band width to interact with the data warehouse and retrieve a significant amount of data for analysis.
This may mean that communication networks have to be expanded and new hardware and software may have purchased.
Collect and analyze business requirements
Create a data model
Define data sources
Choose a data base technology
Choose database access and reporting tools
Choose database connectivity s/w
Choose analysis and presentation s/w
Data warehouse implementation relies on selecting suitable data access tools. The following lists the various type of data that can be accessed:
Simple tabular form data Complex textual search data Ranking data
Multivariabledata Time series data
Graphing, charting and pivoting data
Data extraction, clean up, transformation and migration:
Timeliness of data delivery to the warehouse
The tool must have the ability to identify the particular data and that can be read by conversion tool
The tool must support flat files, indexed files since corporate data is still in this type The tool must have the capability to merge data from multiple data stores
The tool shouldhave specification interface to indicate the data to be extracted
The tool should have the ability to read data from data dictionary The code generated by the tool should be completely maintainable
The data warehouse database system must be able toperform loading data directly from these tools
Data replication or data moves to place the data to a particular workgroup in a localized database.
Most companies use data replication servers to copy their most needed data to a separate database.
It is a road map to the information stores in the warehouse is metadata it defines all elements and their attributes
Data placement strategies
As a data warehouse grows, there are at least two options for data placement. One is to put some of the data in the data warehouse into another storage media.
The second option is to distribute the data in the data warehouse across multiple servers.
The users of data warehouse data can be classified on the basis of their skill level in accessing the warehouse. There are three classes of users:
Casual users: are most comfortable in retrieving info from warehouse in pre defined formats and running pre existing queries and reports.
Power Users: can use pre defined as well as user defined queries to create simple and ad hoc reports. These users can engage in drill down operations. These users may have the experience of using reporting and query tools.
Expert users: These users tend to create their own complex queries and perform standard analysis on the info they retrieve. These users have the knowledge about the use of query and report tools
Benefits of data warehousing
The benefits can be classified into two:
Tangible benefits (quantified / measureable):It includes, o Improvement in product inventory
Decrement in production cost
Improvement in selection of target markets
Enhancement in asset and liability management benefits Intangible (not easy to quantified): It includes,
Improvement in productivity by keeping all data in single location and
eliminating rekeying of data o Reduced redundant processing
Enhanced customer relation