Data Warehouse is an extension of the database
• Data warehouse is the main repository of c
Data in the data warehouse are processed (i.e., ETL) and are therefore more
integrated and consistent.
While the information in the database tends to be real-time, the information in
the data warehouse is updated at regular intervals.
While a database focuses on automating the process of collecting customer
information, a data warehouse looks more at assisting managers in performing more
advanced analysis and thus making better decisions.
What is Data warehousing?
A series of analytical
tools works with data stored in databases to find patterns and insights for
helping managers and employees make better decisions to improve organizational
performance.
Companies often build enterprise-wide data warehouses, where a central data warehouse
serves the entire organization, or they create smaller, decentralized
warehouses called data marts.
A data mart is a subset of a data warehouse in which a summarized
or highly focused portion of the customer data is placed in a separate
database for a specific population of users.
For example, a company might develop marketing and sales data marts to deal with
customer information.
A data mart typically focuses on a single subject area or line of
business, so it usually can be constructed more rapidly and at lower cost than
an enterprise-wide data warehouse.
However, complexity, costs, and management problems will rise if an organization
creates too many data marts.
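The relationship between a warehouse and a data mart can be sketched in a few lines. This is only an illustration under stated assumptions: the table `warehouse_orders`, its columns, and the sample rows are hypothetical, and an in-memory SQLite database stands in for a real enterprise-wide warehouse.

```python
import sqlite3


def build_sales_mart(conn):
    """Create a summarized sales data mart from a hypothetical warehouse table."""
    conn.execute("DROP TABLE IF EXISTS sales_mart")
    # The mart holds only a focused, summarized portion of the warehouse data.
    conn.execute(
        """
        CREATE TABLE sales_mart AS
        SELECT region, SUM(amount) AS total_sales, COUNT(*) AS order_count
        FROM warehouse_orders
        WHERE department = 'sales'
        GROUP BY region
        """
    )
    return conn.execute(
        "SELECT region, total_sales, order_count FROM sales_mart ORDER BY region"
    ).fetchall()


# Tiny in-memory stand-in for the central warehouse (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_orders (department TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_orders VALUES (?, ?, ?)",
    [
        ("sales", "north", 120.0),
        ("sales", "north", 80.0),
        ("sales", "south", 50.0),
        ("support", "north", 10.0),
    ],
)
mart_rows = build_sales_mart(conn)
```

Because the mart covers a single subject area, it stays small and cheap to rebuild; each additional mart, however, is another derived table to keep in sync with the warehouse.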
Data Acquisition, Cleanup and Transformation Tools
– Removing unwanted data from operational databases
– Converting to common data names and definitions
– Calculating summaries and derived data
– Establishing defaults for missing data
– Accommodating source data definition changes
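The cleanup and transformation tasks listed above can be sketched as a small routine. This is a minimal sketch, not a real tool: the field names, the name mapping `COMMON_NAMES`, the unwanted field, and the default value are all hypothetical.

```python
from statistics import mean

# Hypothetical mapping from source-specific field names to common names.
# Listing several source spellings is one simple way to absorb changes
# in source data definitions.
COMMON_NAMES = {
    "cust_nm": "customer_name",
    "CustName": "customer_name",
    "amt": "amount",
    "Amount": "amount",
}
UNWANTED_FIELDS = {"internal_flag"}  # operational-only data to remove
DEFAULTS = {"region": "unknown"}     # defaults for missing data


def clean_record(record):
    """Drop unwanted fields, rename to common names, fill in defaults."""
    cleaned = {}
    for key, value in record.items():
        if key in UNWANTED_FIELDS:
            continue
        cleaned[COMMON_NAMES.get(key, key)] = value
    for key, default in DEFAULTS.items():
        if cleaned.get(key) is None:
            cleaned[key] = default
    return cleaned


def summarize(records):
    """Calculate summary (derived) data over the cleaned records."""
    amounts = [r["amount"] for r in records]
    return {"total_amount": sum(amounts), "average_amount": mean(amounts)}


source_rows = [
    {"cust_nm": "Ada", "amt": 100.0, "region": "north", "internal_flag": 1},
    {"CustName": "Bo", "Amount": 50.0, "region": None},
]
cleaned_rows = [clean_record(r) for r in source_rows]
summary = summarize(cleaned_rows)
```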
The data sourcing, cleanup, extract, transformation and migration tools have to
deal with some significant issues, as follows:
– Database heterogeneity
– Data heterogeneity
Metadata is data about data that describes the data warehouse.
It is used for building, maintaining, managing, and using the data warehouse.
Metadata can be classified into the following:
– Technical metadata
– Business metadata
– Data warehouse operational information such as data history (snapshots,
versions), ownership, extract audit trail, usage data
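One way to picture the three metadata categories is as a record kept alongside each warehouse table. The class `TableMetadata`, its fields, and the sample values are hypothetical and only illustrate the classification; real metadata repositories are considerably richer.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Hypothetical container for the three kinds of warehouse metadata."""
    technical: dict = field(default_factory=dict)    # structure, sources
    business: dict = field(default_factory=dict)     # meaning for end users
    operational: dict = field(default_factory=dict)  # history, ownership, audit

    def record_extract(self, run_id, row_count):
        """Append one entry to the extract audit trail."""
        self.operational.setdefault("extract_audit_trail", []).append(
            {"run_id": run_id, "row_count": row_count}
        )


orders_meta = TableMetadata(
    technical={"source_table": "orders", "columns": ["customer_name", "amount"]},
    business={"description": "Customer orders"},
    operational={"ownership": "sales department"},
)
orders_meta.record_extract(run_id=1, row_count=2)
```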
Data mining is the non-trivial extraction of novel, implicit, and actionable
knowledge from large datasets:
– Extremely large datasets
– Discovery of the non-obvious
– Useful knowledge that can improve
– Cannot be done manually
Data mining is a technology to enable data exploration, data analysis, and data
visualization of very large databases at a high level of abstraction, without a
specific hypothesis in mind.
It is a data search capability that uses statistical algorithms to discover
patterns and correlations in data.
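As a rough illustration of searching for correlations without a specific hypothesis, the sketch below computes the Pearson correlation for every pair of columns and reports the strongest one. Pearson correlation is just one example of a statistical measure, and the column names and values are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_pair(columns):
    """Scan every pair of columns and return the most strongly correlated one."""
    names = sorted(columns)
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if best is None or abs(r) > abs(best[2]):
                best = (a, b, r)
    return best


# Made-up customer data; no pair is singled out in advance.
columns = {
    "visits": [1, 2, 3, 4, 5],
    "spend": [10, 20, 30, 40, 50],
    "age": [30, 25, 40, 35, 20],
}
best_pair = strongest_pair(columns)
```

The exhaustive pairwise scan is what makes this hypothesis-free, and also why it cannot be done manually on very large datasets: the number of pairs grows quadratically with the number of columns.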
Data Mining is a step of the Knowledge Discovery in Databases (KDD) process:
– Data Warehousing
– Data Selection
– Data Preprocessing
– Data Transformation
– Data Mining
Data Mining is sometimes referred to as KDD, and DM and KDD tend to be used as
synonyms.
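The KDD steps listed above can be pictured as a chain of functions, each feeding the next. The sketch below is only a toy: the rows are made up, and the "mining" step is a trivial count standing in for a real algorithm.

```python
from collections import Counter

# Data warehousing: made-up rows standing in for warehoused data.
warehouse = [
    {"customer": "a", "product": "tea", "qty": 2},
    {"customer": "b", "product": "tea", "qty": None},
    {"customer": "c", "product": "coffee", "qty": 1},
    {"customer": "d", "product": "tea", "qty": 3},
]


def select(rows):
    """Data selection: keep only the fields relevant to the question."""
    return [{"product": r["product"], "qty": r["qty"]} for r in rows]


def preprocess(rows):
    """Data preprocessing: drop rows with missing quantities."""
    return [r for r in rows if r["qty"] is not None]


def transform(rows):
    """Data transformation: reduce each row to a (product, qty) pair."""
    return [(r["product"], r["qty"]) for r in rows]


def mine(pairs):
    """Data mining (trivial stand-in): find the product with the most units."""
    totals = Counter()
    for product, qty in pairs:
        totals[product] += qty
    return totals.most_common(1)[0]


top_product = mine(transform(preprocess(select(warehouse))))
```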
• Data Mining is Not …
– Ad Hoc Queries / Reporting
– Online Analytical Processing (OLAP)
– Customer life cycle: the stages in the relationship between a customer and a
business
Stages in the customer lifecycle:
– Prospects: people who are not yet customers but are in the target market
– Responders: prospects who show an interest in a product or service
– Active Customers: people who are currently using the product or service
– Former Customers: may be "bad" customers who incurred high costs
• It's important to know life cycle events
What marketers want:
– Increasing customer revenue and customer profitability
– Keeping the customers for a longer period of time
Applying data mining
– Determine the behavior surrounding a particular lifecycle event
– Find other people in similar life stages and determine which customers are
following similar behavior patterns
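Finding customers who follow similar behavior patterns can be sketched as a nearest-neighbor lookup. This is one simple approach, not the only one: the customer names and the two behavior features (monthly visits, monthly spend) are hypothetical, and the features are left unscaled for brevity.

```python
from math import dist

# Hypothetical behavior features per customer: (monthly visits, monthly spend).
customers = {
    "ann": (12, 40.0),
    "bob": (11, 42.0),
    "cat": (2, 5.0),
    "dan": (3, 6.0),
}


def most_similar(name, data):
    """Return the other customer whose behavior is closest (Euclidean distance)."""
    target = data[name]
    others = (n for n in data if n != name)
    return min(others, key=lambda n: dist(target, data[n]))


similar_to_ann = most_similar("ann", customers)
similar_to_cat = most_similar("cat", customers)
```

In practice the features would usually be normalized first, so that a large-valued measure such as spend does not dominate the distance.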