Datamation: Don't warehouse dirty data

Post by Michael E Wille » Fri, 20 Oct 1995 04:00:00



The October 15 issue of Datamation has a lengthy article with the
above title about data warehousing.  It starts off "The old
programmers' slogan "Garbage in, garbage out" is true with
a vengeance in a data warehouse."

Mike W.

Michael Willett, EE

Storage Computer Corp.
http://www.storage.com/
11 Riverside Street
Nashua, NH 03062
Tel. 603-880-3005

:::I/O-accelerated, very fast disk arrays for all SCSI systems to 1,000 GB:::

 
 
 

Datamation: Don't warehouse dirty data

Post by Benjamin Ta » Thu, 26 Oct 1995 04:00:00



>The October 15 issue of Datamation has a lengthy article with the
>above title about data warehousing.  It starts off "The old
>programmers' slogan "Garbage in, garbage out" is true with
>a vengeance in a data warehouse."

I went to see a Pyramid sales presentation today at which Ralph Kimball
spoke.  He raised the interesting point that building a data warehouse can
help expose problems in the related OLTP data.  When a manager sees a zip
code dimension and notices that a good many of his customers are coded as
'zip code not available', he quickly gets the picture that perhaps a problem
exists with the source data.
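
The zip code observation can be sketched in a few lines of Python. This is only an illustration, not anything shown at the presentation: the customer rows, the field names, and the zip_profile helper are all made up, and the placeholder string is simply the one quoted above.

```python
from collections import Counter

# Hypothetical customer rows; ids and zip values are invented for illustration.
customers = [
    {"id": 1, "zip": "03062"},
    {"id": 2, "zip": "zip code not available"},
    {"id": 3, "zip": "48104"},
    {"id": 4, "zip": "zip code not available"},
]

# Placeholder value the source system is assumed to write when no zip is known.
PLACEHOLDER = "zip code not available"


def zip_profile(rows):
    """Return (count per zip value, fraction of rows carrying the placeholder)."""
    counts = Counter(row["zip"] for row in rows)
    missing_share = counts[PLACEHOLDER] / len(rows) if rows else 0.0
    return counts, missing_share


counts, missing_share = zip_profile(customers)
print(f"{missing_share:.0%} of customers have no usable zip code")
```

A count like this is what the dimension makes visible: the warehouse does not fix anything, it just puts the size of the gap in front of the person who owns the source data.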

The warehouse cannot create correct data; it can only reflect the
correctness (or lack thereof) of the underlying source data.
-------------------------------------------
              Benjamin Taub

          DATASPACE INCORPORATED
            ph:  (313) 761-5962
            fax: (313) 761-5967
       DSS & Data Warehouse design,
development, project management, & training

 
 
 

Datamation: Don't warehouse dirty data

Post by Alastair Newso » Sat, 28 Oct 1995 04:00:00


> I am very happy you brought this up. I am dazed by the notion that you
> can point and click legacy data to warehousing environments with no
> accountability or cleanup.

I fully endorse this opinion.  Having spent 4 years on a sales/marketing
system and seen the problems of data take-on from legacy systems, I found
the effort needed to clean up the data horrifying.  It was scary the way
the 'dirt' kept coming out of the 'clean' data.  Some key decisions were,
and continue to be, made using this data.

When it's supporting major decision making, it can be
GARBAGE-IN-REDUNDANCIES-OUT as the organisation "right-sizes".

regards

Alastair


 
 
 

Datamation: Don't warehouse dirty data

Post by Doug10SN » Tue, 07 Nov 1995 04:00:00


Four quick points related to this discussion:

1. Management and end user expectations need to be set that the data
warehouse data will "reflect" the cleanliness of the ODS data.  This is a
key to a successful implementation.

2. Just as the first step in cleaning up one's * or drug problem is
to recognize and admit you have a problem -- getting on the road to
cleaner operational data often requires more widespread corporate access
to it.

3. Data warehouse tools such as Prism Warehouse Manager can be used to
clean data to some degree.  Some techniques: data integration (multiple
sourcing), if-then-else business rules, externally linked routines,
field-level parsing.  (Yes, DW purists may cringe a little.)

4. Metadata captured in the data warehouse can give business users an
indication of data cleanliness and completeness.  This metadata can be
displayed along with canned or ad-hoc OLAP queries.  
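
As a rough sketch of points 3 and 4, the following Python shows field-level parsing with an if-then-else rule, plus a completeness figure that could be stored as metadata. It is not Prism Warehouse Manager's interface or output; clean_zip, completeness, the sample values, and the metadata layout are all hypothetical, and the rule assumes US-style zip codes.

```python
import re


def clean_zip(raw):
    """Field-level parsing: pull a 5-digit zip out of a free-form field.

    Returns the 5-digit zip, or None when no plausible zip is present.
    """
    if raw is None:
        return None
    # Accept "12345" or "12345-6789" anywhere in the field; keep the first 5 digits.
    match = re.search(r"\b(\d{5})(?:-\d{4})?\b", raw)
    if match:
        return match.group(1)
    else:
        return None


def completeness(values):
    """Fraction of values that are present (not None) after cleaning."""
    present = sum(1 for value in values if value is not None)
    return present / len(values) if values else 0.0


# Made-up raw field values, as they might arrive from an operational system.
raw_zips = ["03062", " 48104-1234 ", "N/A", None, "zip 03062"]
cleaned = [clean_zip(value) for value in raw_zips]

# Hypothetical metadata record that a query tool could show next to results.
metadata = {"field": "zip", "completeness": completeness(cleaned)}
print(cleaned)
print(metadata)
```

Keeping the completeness figure alongside the cleaned column is one way to let a business user see how much of a field survived cleaning before trusting a report built on it.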

Cheers,
Doug Laney, Consulting Manager
Prism Solutions

 
 
 

1. Feb 1 Datamation Data Warehouse issue

The Feb. 1 issue of Datamation has quite a lot of material on data
warehousing.  (There's a big data warehousing convention in Orlando in a
couple of weeks; we can't exhibit since exhibitor space is sold out and
there is still a long waiting line.)

Mike W.

Michael Willett, EE
Storage Computer Corp.
11 Riverside Street
Nashua, NH 03062
Tel. 603-880-3005

:::I/O-accelerated, very fast disk arrays for all SCSI systems to 1,000 GB:::
:::::::::::::::::::::::::::We make SCSI systems scream.::::::::::::::::::::::

2. is it possible ?

3. Paradox 3.5 date conversion

4. triggers don't get exported, dba jobs don't run

5. Is It a Bug?(Correction of Previous Question)

6. Don't duplicate v Don't resend trade-off theory

7. Why data report don't refresh the data

8. MD-Annapolis Junction-240349--Data Warehousing-ORACLE-SQL-PL/SQL-Data Modeling-Data Warehouse Analysts

9. Datamation's Proposed Extention to SQL Language (#1)