Data Warehouse:
Ø Large organizations have complex
internal organizations, and have data stored at different locations, on
different operational (transaction processing) systems, under different schemas
Ø Data sources often store only current
data, not historical data
Ø Corporate decision making requires a
unified view of all organizational data, including historical data
Ø A data warehouse is a repository
(archive) of information gathered from multiple sources, stored under a unified
schema, at a single site
· Greatly simplifies querying, permits study of
historical trends
· Shifts decision support query load away from
transaction processing systems
When and how to gather data
· Source driven architecture: data sources transmit
new information to warehouse, either continuously or periodically (e.g. at
night)
· Destination driven architecture: warehouse
periodically requests new information from data sources
· Keeping warehouse exactly synchronized with data
sources (e.g. using two-phase commit) is too expensive
· Usually OK to have slightly out-of-date data at
warehouse
· Data/updates are
periodically downloaded form online transaction processing (OLTP)
systems.
What schema to use
· Schema integration
Data cleansing
· E.g. correct mistakes in addresses
· E.g. misspellings, zip code errors
Merge
address lists from different sources and purge duplicates
· Keep only one address record per household
(―householding‖)
How to propagate updates
· Warehouse schema may be a (materialized) view of
schema from data sources
· Efficient techniques for update of materialized
views
What data to summarize
· Raw data may be too large to store on-line
· Aggregate values (totals/subtotals) often suffice
· Queries on raw data can often be transformed by
query optimizer to use aggregate values.
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.