Chapter: Database Management Systems - Trends In Database Technology

| Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail |

Data Warehouse

Large organizations have complex internal organizations, and have data stored at different locations, on different operational (transaction processing) systems, under different schemas

Data Warehouse:

 

Ø Large organizations have complex internal organizations, and have data stored at different locations, on different operational (transaction processing) systems, under different schemas

Ø Data sources often store only current data, not historical data

 

Ø Corporate decision making requires a unified view of all organizational data, including historical data

 

Ø A data warehouse is a repository (archive) of information gathered from multiple sources, stored under a unified schema, at a single site

 

· Greatly simplifies querying, permits study of historical trends

 

· Shifts decision support query load away from transaction processing systems

 

When and how to gather data

 

· Source driven architecture: data sources transmit new information to warehouse, either continuously or periodically (e.g. at night)

 

· Destination driven architecture: warehouse periodically requests new information from data sources

 

· Keeping warehouse exactly synchronized with data sources (e.g. using two-phase commit) is too expensive

 

· Usually OK to have slightly out-of-date data at warehouse

· Data/updates are periodically downloaded form online transaction processing (OLTP)



systems. What schema to use


· Schema integration


Data cleansing

· E.g. correct mistakes in addresses

· E.g. misspellings, zip code errors

Merge address lists from different sources and purge duplicates

· Keep only one address record per household (―householding)


How to propagate updates

· Warehouse schema may be a (materialized) view of schema from data sources

· Efficient techniques for update of materialized views


What data to summarize

· Raw data may be too large to store on-line

· Aggregate values (totals/subtotals) often suffice

· Queries on raw data can often be transformed by query optimizer to use aggregate values.


Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail


Copyright © 2018-2020 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.