Home | | Data Ware Housing and Data Mining | Building a Data warehouse

Chapter: Data Warehousing and Data Mining

Building a Data warehouse

There are two reasons why organizations consider data warehousing a critical need. In other words, there are two factors that drive you to build and use data warehouse.

Building a Data warehouse

 

There are two reasons why organizations consider data warehousing a critical need. In other words, there are two factors that drive you to build and use data warehouse. They are:

 

Business factors:

Business users want to make decision quickly and correctly using all availabledata.

 

Technological factors:

To address the incompatibility of operational data stores

 

IT infrastructure is changing rapidly. Its capacity is increasing and cost is decreasing so that building a data warehouse is easy

 

Business considerations:

Two approaches:

 

Top– Down Approach   Bottom– Up Approach

 

Top – Down Approach

 

It collected enterprise wide business requirements and decided to build an enterprise data warehouse with subset data marts.

 

Bottom Up Approach

The data marts are integrated or combined together to form a data warehouse.

 

The bottom up approach helps us incrementally build the warehouse by developing and integrating data marts as and when the requirements are clear.

 

The advantage of using the Bottom Up approach is that they do not require high initial costs and have a faster implementation time;

Bottom up approach is more realistic but the complexity of the integration may become a

 

serious obstacle.

Ø

 

Design considerations:

 

In general a data warehouse data from multiple heterogeneous sources into a query database this is also one of the reasons why a data warehouse is difficult to built

 

Data content

The content and structure of the data warehouse are reflected in its data model.

 

The data model is the template that describes how information will be organized within the integrated warehouse framework.

 

The data warehouse data must be a detailed data. It must be formatted, cleaned up and transformed to fit the warehouse data model.

 

Meta data

It defines the location and contents of data in the warehouse.

 

Meta data is searchable by users to find definitions or subject areas.

 

Data distribution

 

Data volumes continue to grow in nature. Therefore, it becomes necessary to know how the data should be divided across multiple servers.

 

The data can be distributed based on the subject area, location (geographical region), or time (current, month, year).

 

Tools

 

A number of tools are available that are specifically designed to help in the implementation of the data warehouse.

 

These tools for defining a cleanup, data movement, end user, query, reporting and data analysis

 

Performance considerations

 

The actual performance levels are dependent and vary widely from one environment to another. it is relatively difficult to predict the performance of a typical data warehouse

 

 

The following nine-step method is followed in the design of a data warehouse:

Choosing the subject matter

Deciding what a fact table represents

Identifying and conforming the dimensions

Choosing the facts

Storing pre calculations in the fact table

Rounding out the dimension table

Choosing the duration of the db

The need to track slowly changing dimensions

Deciding the query priorities and query models

 

Technical considerations:

Hardware platforms

 

An important consideration when choosing a data warehouse server capacity for handling the high volumes of data.

It has large data and through put.

 

The modern server can also support large volumes and large number of flexible GUI

 

Data warehouse and DBMS specialization

Very large size of databases and need to process complex adhoc queries in a short time

 

The most important requirements for the data warehouse DBMS are performance, throughput and scalability.

 

Communication infrastructure

 

The data warehouse user requires a relatively large band width to interact with the data warehouse and retrieve a significant amount of data for analysis.

 

This may mean that communication networks have to be expanded and new hardware and software may have purchased.

 

Implementation considerations

 

Collect and analyze business requirements

Create a data model

Define data sources

Choose a data base technology

Choose database access and reporting tools

Choose database connectivity s/w

Choose analysis and presentation s/w

Access tools:

 

Data warehouse implementation relies on selecting suitable data access tools. The following lists the various type of data that can be accessed:

 

Simple tabular form data   Complex textual search data   Ranking data

 

Multivariabledata   Time series data

 

Graphing, charting and pivoting data

Data extraction, clean up, transformation and migration:

Timeliness of data delivery to the warehouse

 

The tool must have the ability to identify the particular data and that can be read by conversion tool

 

The tool must support flat files, indexed files since corporate data is still in this type   The tool must have the capability to merge data from multiple data stores

 

The tool shouldhave specification interface to indicate the data to be extracted

 

The tool should have the ability to read data from data dictionary   The code generated by the tool should be completely maintainable

 

The data warehouse database system must be able toperform loading data directly from these tools

 

Data replication

 

Data replication or data moves to place the data to a particular workgroup in a localized database.

 

Most companies use data replication servers to copy their most needed data to a separate database.

 

Metadata

 

It is a road map to the information stores in the warehouse is metadata it defines all elements and their attributes

 

Data placement strategies

 

As a data warehouse grows, there are at least two options for data placement. One is to put some of the data in the data warehouse into another storage media.

 

The second option is to distribute the data in the data warehouse across multiple servers.

 

User levels

 

The users of data warehouse data can be classified on the basis of their skill level in accessing the warehouse. There are three classes of users:

 

Casual users: are most comfortable in retrieving info from warehouse in pre defined formats and running pre existing queries and reports.

 

Power Users: can use pre defined as well as user defined queries to create simple and ad hoc reports. These users can engage in drill down operations. These users may have the experience of using reporting and query tools.

 

Expert users: These users tend to create their own complex queries and perform standard analysis on the info they retrieve. These users have the knowledge about the use of query and report tools

 

Benefits of data warehousing

The benefits can be classified into two:

 

Tangible benefits (quantified / measureable):It includes, o Improvement in product inventory

Decrement in production cost

Improvement in selection of target markets

 

Enhancement in asset and liability management benefits Intangible (not easy to quantified): It includes,

 

Improvement  in  productivity  by  keeping  all  data  in  single  location  and

 

eliminating rekeying of data o Reduced redundant processing

 

Enhanced customer relation

 

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail
Data Warehousing and Data Mining : Building a Data warehouse |


Privacy Policy, Terms and Conditions, DMCA Policy and Compliant

Copyright © 2018-2024 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.