Home | | Data Ware Housing and Data Mining | Data Warehouse Introduction

Chapter: Data Warehousing and Data Mining

Data Warehouse Introduction

1. Data sourcing, cleanup, transformation, and migration tools 2. Metadata repository 3. Warehouse/database technology 4. Data marts 5. Data query, reporting, analysis, and mining tools 6. Data warehouse administration and management 7. Information delivery system

Data Warehouse Introduction

 

A data warehouse is a collection of data marts representing historical data from different operations in the company.

It collect the data from multiple heterogeneous data base files(flat, text and etc).

 

It store the 5 to 10 years of huge amount of data.

 

This data is stored in a structure optimized for querying and data analysis as a data warehouse.

 

―A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process‖.

 

Subject Oriented: Data that gives information about a particular subject instead of about a company’s ongoing operations.

 

Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.

 

Time-variant: All data in the data warehouse is identified with a particular time period. Non-volatile: Data is stable in a data warehouse. More data is added but data is never removed. It can be

 

Used for decision Support   Used to manage and control business

 

Used by managers and end-users to understand the business and make judgments

 

Other important terminology

 

Enterprise Data warehouse: It collects all information about subjects (customers, products, sales, assets, personnel) that span the entire organization

 

Decision Support System (DSS): Information technology to help the knowledge worker (executive, manager, and analyst) makes faster & better decisions

 

Operational and informational Data

Operational Data:

 

Focusing on transactional function such asbank card withdrawals and deposits   Detailed

 

Updateable   Reflects current data

 

Informational Data:

 

Focusing on providing answers to problems posed by decision makers   Summarized

 

Non updateable

 

 

 

Data Warehouse Characteristics

 

It is a database designed for analytical tasks Its content is periodically updated

 

It contains current and historical data to provide a historical perspective of information.

 

Data warehouse Architecture and its seven components

 

Overall Architecture

The data warehouse architecture is based on the data base management system server.

 

The central information repository is surrounded by number of key components

 

Data warehouse is an environment, not a product which is based on relational database management system that functions as the central repository for informational data.

 

The data entered into the data warehouse transformed into an integrated structure and format. The transformation process involves conversion, summarization, filtering and condensation.

 

The data warehouse must be capable of holding and managing large volumes of data as well as different structure of data structures over the time.

 

Key components

1.    Data sourcing, cleanup, transformation, and migration tools

2.    Metadata repository

3.    Warehouse/database technology

4.    Data marts

5.    Data query, reporting, analysis, and mining tools

6.    Data warehouse administration and management

7.    Information delivery system


 

 

1. Data warehouse database

This is the central part of the data warehousing environment.

This is implemented based on RDBMS technology.

 

2 Sourcing, Acquisition, Clean up, and Transformation Tools

They perform conversions, summarization, key changes, structural changes

 

The data transformation is required to use by decision support tools.

 

The transformation produces programs, control statements.

 

It moves the data into data warehouse from multiple operational systems. The functionalities of these tools are listed below:

 

 To remove unwanted data from operational db

 

Converting to common data names and attributes

 

 Calculating summaries and derived data

 

Establishing defaults for missing data

 

 Accommodating source data definition changes

 

3. Meta data

 

It is data about data. It is used for maintaining, managing and using the data warehouse. It is classified into two:

 

Technical Meta data: It contains information about data warehouse data used by warehouse designer, administrator to carry out development and management tasks. It includes,

 

Info about data stores

Transformation descriptions. That si mapping methods from operational db to warehouse db

 

Warehouse Object and data structure definitions for target data   The rules used to perform clean up, and data enhancement   Data mapping operations

 

Access authorization, backup history, archive history, info delivery history, data acquisition history, data access etc.,

 

Business Meta data: It contains info that gives info stored in data warehouse to users. It includes,

 

Subject areas, and info object type including queries, reports, images, video, audio clips etc.   Internet home pages

 

Info related to info delivery system

Data warehouse operational info such as ownerships, audit trails etc.,

 

Meta data helps the users to understand content and find the data. Meta data are stored in a separate data stores which is known as informational directory or Meta data repository which helps to integrate, maintain and view the contents of the data warehouse.

 

The following lists the characteristics of info directory/ Meta data:

It is the gateway to the datawarehouse environment

 

It supports easy distribution and replication of content for high performance and availability   It should be searchable by business oriented key words

 

It should act as a launch platform for end user to access data and analysistools   It should support the sharing of info

 

It should support scheduling options for request

It should support and provide interface to other applications

It should support end user monitoring of the status of the data warehouse environment

 

4 Access tools

 

Its purpose is to provide info to business users for decision making. There are five categories:

Data query and reporting tools   Application development tools   Executive info system tools (EIS)   OLAP tools

 

Data mining tools

 

Query and reporting tools: used to generate query and report. There are two types of reporting tools. They are:

 

Production reporting tool used to generate regular operational reports   Desktop report writer are inexpensive desktop tools designed for end users.

 

Managed Query tools: used to generate SQL query. It uses Meta layer software in between users and databases which offers a point-and-click creation of SQL statement.

 

Application development tools: This is a graphical data access environment which integrates OLAP tools with data warehouse and can be used to access all db systems.

 

OLAP Tools: Are used to analyze the data in multi dimensional and complex views. Data mining tools: are used to discover knowledge from the data warehouse data

 

5 Data marts

It is inexpensive tool and alternative to the data ware house. it based on the subject area

 

. Data mart is used in the following situation:

Extremely urgent user requirement

 

The absence of a budget for a full scale data warehouse strategy   The decentralization of business needs

 

6. Data warehouse admin and management

The management of data warehouse includes,

 

Security and priority management   Monitoring updates from multiple sources   Data quality checks

 

Managing and updating meta data

Auditing and reporting data warehouse usage and status

Purging data

 

Replicating, sub setting and distributing data   Backup and recovery

 

Data warehouse storage management which includes capacity planning, hierarchical storage management and purging of aged data etc.,

 

7 Information delivery system

It is used to enable the process of subscribing for data warehouse info.

 

Delivery to one or more destinations according to specified scheduling algorithm


Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail
Data Warehousing and Data Mining : Data Warehouse Introduction |


Privacy Policy, Terms and Conditions, DMCA Policy and Compliant

Copyright © 2018-2024 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.