Home | | Data Ware Housing and Data Mining | Building a Data warehouse

Chapter: Data Warehousing and Data Mining

Building a Data warehouse

There are two reasons why organizations consider data warehousing a critical need. In other words, there are two factors that drive you to build and use data warehouse.

Building a Data warehouse


There are two reasons why organizations consider data warehousing a critical need. In other words, there are two factors that drive you to build and use data warehouse. They are:


Business factors:

Business users want to make decision quickly and correctly using all availabledata.


Technological factors:

To address the incompatibility of operational data stores


IT infrastructure is changing rapidly. Its capacity is increasing and cost is decreasing so that building a data warehouse is easy


Business considerations:

Two approaches:


Top– Down Approach   Bottom– Up Approach


Top – Down Approach


It collected enterprise wide business requirements and decided to build an enterprise data warehouse with subset data marts.


Bottom Up Approach

The data marts are integrated or combined together to form a data warehouse.


The bottom up approach helps us incrementally build the warehouse by developing and integrating data marts as and when the requirements are clear.


The advantage of using the Bottom Up approach is that they do not require high initial costs and have a faster implementation time;

Bottom up approach is more realistic but the complexity of the integration may become a


serious obstacle.



Design considerations:


In general a data warehouse data from multiple heterogeneous sources into a query database this is also one of the reasons why a data warehouse is difficult to built


Data content

The content and structure of the data warehouse are reflected in its data model.


The data model is the template that describes how information will be organized within the integrated warehouse framework.


The data warehouse data must be a detailed data. It must be formatted, cleaned up and transformed to fit the warehouse data model.


Meta data

It defines the location and contents of data in the warehouse.


Meta data is searchable by users to find definitions or subject areas.


Data distribution


Data volumes continue to grow in nature. Therefore, it becomes necessary to know how the data should be divided across multiple servers.


The data can be distributed based on the subject area, location (geographical region), or time (current, month, year).




A number of tools are available that are specifically designed to help in the implementation of the data warehouse.


These tools for defining a cleanup, data movement, end user, query, reporting and data analysis


Performance considerations


The actual performance levels are dependent and vary widely from one environment to another. it is relatively difficult to predict the performance of a typical data warehouse



The following nine-step method is followed in the design of a data warehouse:

Choosing the subject matter

Deciding what a fact table represents

Identifying and conforming the dimensions

Choosing the facts

Storing pre calculations in the fact table

Rounding out the dimension table

Choosing the duration of the db

The need to track slowly changing dimensions

Deciding the query priorities and query models


Technical considerations:

Hardware platforms


An important consideration when choosing a data warehouse server capacity for handling the high volumes of data.

It has large data and through put.


The modern server can also support large volumes and large number of flexible GUI


Data warehouse and DBMS specialization

Very large size of databases and need to process complex adhoc queries in a short time


The most important requirements for the data warehouse DBMS are performance, throughput and scalability.


Communication infrastructure


The data warehouse user requires a relatively large band width to interact with the data warehouse and retrieve a significant amount of data for analysis.


This may mean that communication networks have to be expanded and new hardware and software may have purchased.


Implementation considerations


Collect and analyze business requirements

Create a data model

Define data sources

Choose a data base technology

Choose database access and reporting tools

Choose database connectivity s/w

Choose analysis and presentation s/w

Access tools:


Data warehouse implementation relies on selecting suitable data access tools. The following lists the various type of data that can be accessed:


Simple tabular form data   Complex textual search data   Ranking data


Multivariabledata   Time series data


Graphing, charting and pivoting data

Data extraction, clean up, transformation and migration:

Timeliness of data delivery to the warehouse


The tool must have the ability to identify the particular data and that can be read by conversion tool


The tool must support flat files, indexed files since corporate data is still in this type   The tool must have the capability to merge data from multiple data stores


The tool shouldhave specification interface to indicate the data to be extracted


The tool should have the ability to read data from data dictionary   The code generated by the tool should be completely maintainable


The data warehouse database system must be able toperform loading data directly from these tools


Data replication


Data replication or data moves to place the data to a particular workgroup in a localized database.


Most companies use data replication servers to copy their most needed data to a separate database.




It is a road map to the information stores in the warehouse is metadata it defines all elements and their attributes


Data placement strategies


As a data warehouse grows, there are at least two options for data placement. One is to put some of the data in the data warehouse into another storage media.


The second option is to distribute the data in the data warehouse across multiple servers.


User levels


The users of data warehouse data can be classified on the basis of their skill level in accessing the warehouse. There are three classes of users:


Casual users: are most comfortable in retrieving info from warehouse in pre defined formats and running pre existing queries and reports.


Power Users: can use pre defined as well as user defined queries to create simple and ad hoc reports. These users can engage in drill down operations. These users may have the experience of using reporting and query tools.


Expert users: These users tend to create their own complex queries and perform standard analysis on the info they retrieve. These users have the knowledge about the use of query and report tools


Benefits of data warehousing

The benefits can be classified into two:


Tangible benefits (quantified / measureable):It includes, o Improvement in product inventory

Decrement in production cost

Improvement in selection of target markets


Enhancement in asset and liability management benefits Intangible (not easy to quantified): It includes,


Improvement  in  productivity  by  keeping  all  data  in  single  location  and


eliminating rekeying of data o Reduced redundant processing


Enhanced customer relation


Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail
Data Warehousing and Data Mining : Building a Data warehouse |

Privacy Policy, Terms and Conditions, DMCA Policy and Compliant

Copyright © 2018-2024 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.