Star Schema query question....

Star Schema query question....

Post by Bob Clayto » Sat, 14 Oct 2000 04:00:00



    I have built a data warehouse that contains four dimension tables:
address, owner, tax_account, taxlot.  The "fact" table (we call "xref") is
supposed to serve the purpose of linking information from the four tables,
so it simply contains the four primary key columns from the four dimension
tables listed above.
    The fact table works great when linking ALL four dimension tables.
However, when joining any two dimension tables via the fact table, we
sometimes get a duplicate result set.
    For example, if I link the tax_account and owner tables to the xref
table, I want to get one row back, but get two rows.  This is what the xref
table looks like:

owner_id    tax_account_id    address_id    taxlot_id
5                45                        2                    688
5                45                        3                    688

    You can see that the owner_id and tax_account_id are repeated, because
there is more than one address associated with them.
    If I build the query to include only the columns of interest and include
the "distinct" keyword, then I get the desired result set.  However, we are
trying to make the data warehouse as simple as possible to query, and would
like users to be able to join any two or three dimension tables using the
xref table without having to use any special keywords like "distinct".
    Any suggestions?  The only options I have thought of so far are:
1.  Build more fact tables, basically one for each possible combination of
the four dimension tables.  The user would then have to decide which fact
table to use.

2.  Build views from the existing fact table.  Again, users would need to
know which view to use.

3.  Make the users build queries that include the "distinct" keyword.

Any other suggestions???
thanks

 
 
 

Star Schema query question....

Post by BP Margoli » Sat, 14 Oct 2000 04:00:00


Bob,

Unless I'm mistaken, it sounds to me like you are re-inventing the wheel. By
that I mean that you are building a data warehouse tool from scratch rather
than using something like SS7.0 OLAP Services or SS2K Analysis Services. If
you use either of these, and build a cube, all of these "problems" are taken
care of automatically.

Am I missing something here?

----------------------------------------------------------------
BP Margolin
Please reply only to the newsgroups.
When posting, inclusion of SQL (CREATE TABLE ..., INSERT ..., etc.) which
can be cut and pasted into Query Analyzer is appreciated.


>     I have built a data warehouse that contains four dimension tables:
> address, owner, tax_account, taxlot.  The "fact" table (we call "xref") is
> supposed to serve the purpose of linking information from the four tables,
> so it simply contains the four primary key columns from the four dimension
> tables listed above.
>     The fact table works great when linking ALL four dimension tables.
> However, when joining any two dimension tables via the fact table, we
> sometimes get a duplicate result set.
>     For example, if I link the tax_account and owner tables to the xref
> table, I want to get one row back, but get two rows.  This is what the
xref
> table looks like:

> owner_id    tax_account_id    address_id    taxlot_id
> 5                45                        2                    688
> 5                45                        3                    688

>     You can see that the owner_id and tax_account_id are repeated, because
> there is more than one address associated with them.
>     If I build the query to include only the columns of interest and
include
> the "distinct" keyword, then I get the desired result set.  However, we
are
> trying to make the data warehouse as simple as possible to query, and
would
> like users to be able to join any two or three dimension tables using the
> xref table without having to use any special keywords like "distinct".
>     Any suggestions?  The only options I have thought of so far are:
> 1.  Build more fact tables, basically one for each possible combination of
> the four dimension tables.  The user would then have to decide which fact
> table to use.

> 2.  Build views from the existing fact table.  Again, users would need to
> know which view to use.

> 3.  Make the users build queries that include the "distinct" keyword.

> Any other suggestions???
> thanks



 
 
 

Star Schema query question....

Post by Ben Craig » Tue, 17 Oct 2000 04:00:00


Unless you're users enjoy determining what type of SQL to use in a query, I
would opt for a solution that Users do not have to worry about "special"
circumstances (i.e. using DISTINCT for some reports but not all).  Views are
cheap, and can be tailored to your users expectations so confusion will be
minimal.

Of course it seams to me that the report you are trying to get with the
OWNER_ID and TAX_ACCOUNT_ID seems to be a report that can be gathered from
your OLTP system and does not require a data warehouse.  However I realize
that several arguments can be made for doing so depending on your
environment.

Good luck.

bl


>     I have built a data warehouse that contains four dimension tables:
> address, owner, tax_account, taxlot.  The "fact" table (we call "xref") is
> supposed to serve the purpose of linking information from the four tables,
> so it simply contains the four primary key columns from the four dimension
> tables listed above.
>     The fact table works great when linking ALL four dimension tables.
> However, when joining any two dimension tables via the fact table, we
> sometimes get a duplicate result set.
>     For example, if I link the tax_account and owner tables to the xref
> table, I want to get one row back, but get two rows.  This is what the
xref
> table looks like:

> owner_id    tax_account_id    address_id    taxlot_id
> 5                45                        2                    688
> 5                45                        3                    688

>     You can see that the owner_id and tax_account_id are repeated, because
> there is more than one address associated with them.
>     If I build the query to include only the columns of interest and
include
> the "distinct" keyword, then I get the desired result set.  However, we
are
> trying to make the data warehouse as simple as possible to query, and
would
> like users to be able to join any two or three dimension tables using the
> xref table without having to use any special keywords like "distinct".
>     Any suggestions?  The only options I have thought of so far are:
> 1.  Build more fact tables, basically one for each possible combination of
> the four dimension tables.  The user would then have to decide which fact
> table to use.

> 2.  Build views from the existing fact table.  Again, users would need to
> know which view to use.

> 3.  Make the users build queries that include the "distinct" keyword.

> Any other suggestions???
> thanks


 
 
 

Star Schema query question....

Post by Bryant Salu » Mon, 23 Oct 2000 04:00:00


Using OLAP Services will result in a MOLAP cube, but it seems they are
looking to build a relational star schema. With MOLAP they would need to buy
a front-end tool that supports MOLAP or write a lot of MDX which is probably
way more advanced than what they are looking to do.

So to answer the original question....
When you query a Star Schema database you should always use all your
dimensions even if you are not restricting the results based on every
dimension. Each data element is described by all of the dimensions so it is
proper to join with all of the dimension tables for your query. Of course I
always prefer to use a front-end tool like Brio or Cognos. But then again I
also prefer MOLAP anyway.

Bryant Salus
Data Advantage Group
http://www.dataag.com


> Bob,

> Unless I'm mistaken, it sounds to me like you are re-inventing the wheel.
By
> that I mean that you are building a data warehouse tool from scratch
rather
> than using something like SS7.0 OLAP Services or SS2K Analysis Services.
If
> you use either of these, and build a cube, all of these "problems" are
taken
> care of automatically.

> Am I missing something here?

> ----------------------------------------------------------------
> BP Margolin
> Please reply only to the newsgroups.
> When posting, inclusion of SQL (CREATE TABLE ..., INSERT ..., etc.) which
> can be cut and pasted into Query Analyzer is appreciated.



> >     I have built a data warehouse that contains four dimension tables:
> > address, owner, tax_account, taxlot.  The "fact" table (we call "xref")
is
> > supposed to serve the purpose of linking information from the four
tables,
> > so it simply contains the four primary key columns from the four
dimension
> > tables listed above.
> >     The fact table works great when linking ALL four dimension tables.
> > However, when joining any two dimension tables via the fact table, we
> > sometimes get a duplicate result set.
> >     For example, if I link the tax_account and owner tables to the xref
> > table, I want to get one row back, but get two rows.  This is what the
> xref
> > table looks like:

> > owner_id    tax_account_id    address_id    taxlot_id
> > 5                45                        2                    688
> > 5                45                        3                    688

> >     You can see that the owner_id and tax_account_id are repeated,
because
> > there is more than one address associated with them.
> >     If I build the query to include only the columns of interest and
> include
> > the "distinct" keyword, then I get the desired result set.  However, we
> are
> > trying to make the data warehouse as simple as possible to query, and
> would
> > like users to be able to join any two or three dimension tables using
the
> > xref table without having to use any special keywords like "distinct".
> >     Any suggestions?  The only options I have thought of so far are:
> > 1.  Build more fact tables, basically one for each possible combination
of
> > the four dimension tables.  The user would then have to decide which
fact
> > table to use.

> > 2.  Build views from the existing fact table.  Again, users would need
to
> > know which view to use.

> > 3.  Make the users build queries that include the "distinct" keyword.

> > Any other suggestions???
> > thanks


 
 
 

Star Schema query question....

Post by BP Margoli » Mon, 23 Oct 2000 04:00:00


Bryant,

Please correct me if I wrong, but it is my understanding that OLAP Services
also supports ROLAP ... I would not necessarily disagree that to go the OLAP
Services route they might need a front-end tool that is OLAP Services aware
... however, it seems that that just goes back to my original point about
re-inventing the wheel.

I seriously doubt that the cost of development and maintenance of a wholly
in-house solution will favorably compare with using a pre-built solution.

----------------------------------------------------------------
BP Margolin
Please reply only to the newsgroups.
When posting, inclusion of SQL (CREATE TABLE ..., INSERT ..., etc.) which
can be cut and pasted into Query Analyzer is appreciated.


> Using OLAP Services will result in a MOLAP cube, but it seems they are
> looking to build a relational star schema. With MOLAP they would need to
buy
> a front-end tool that supports MOLAP or write a lot of MDX which is
probably
> way more advanced than what they are looking to do.

> So to answer the original question....
> When you query a Star Schema database you should always use all your
> dimensions even if you are not restricting the results based on every
> dimension. Each data element is described by all of the dimensions so it
is
> proper to join with all of the dimension tables for your query. Of course
I
> always prefer to use a front-end tool like Brio or Cognos. But then again
I
> also prefer MOLAP anyway.

> Bryant Salus
> Data Advantage Group
> http://www.dataag.com



> > Bob,

> > Unless I'm mistaken, it sounds to me like you are re-inventing the
wheel.
> By
> > that I mean that you are building a data warehouse tool from scratch
> rather
> > than using something like SS7.0 OLAP Services or SS2K Analysis Services.
> If
> > you use either of these, and build a cube, all of these "problems" are
> taken
> > care of automatically.

> > Am I missing something here?

> > ----------------------------------------------------------------
> > BP Margolin
> > Please reply only to the newsgroups.
> > When posting, inclusion of SQL (CREATE TABLE ..., INSERT ..., etc.)
which
> > can be cut and pasted into Query Analyzer is appreciated.



> > >     I have built a data warehouse that contains four dimension tables:
> > > address, owner, tax_account, taxlot.  The "fact" table (we call
"xref")
> is
> > > supposed to serve the purpose of linking information from the four
> tables,
> > > so it simply contains the four primary key columns from the four
> dimension
> > > tables listed above.
> > >     The fact table works great when linking ALL four dimension tables.
> > > However, when joining any two dimension tables via the fact table, we
> > > sometimes get a duplicate result set.
> > >     For example, if I link the tax_account and owner tables to the
xref
> > > table, I want to get one row back, but get two rows.  This is what the
> > xref
> > > table looks like:

> > > owner_id    tax_account_id    address_id    taxlot_id
> > > 5                45                        2                    688
> > > 5                45                        3                    688

> > >     You can see that the owner_id and tax_account_id are repeated,
> because
> > > there is more than one address associated with them.
> > >     If I build the query to include only the columns of interest and
> > include
> > > the "distinct" keyword, then I get the desired result set.  However,
we
> > are
> > > trying to make the data warehouse as simple as possible to query, and
> > would
> > > like users to be able to join any two or three dimension tables using
> the
> > > xref table without having to use any special keywords like "distinct".
> > >     Any suggestions?  The only options I have thought of so far are:
> > > 1.  Build more fact tables, basically one for each possible
combination
> of
> > > the four dimension tables.  The user would then have to decide which
> fact
> > > table to use.

> > > 2.  Build views from the existing fact table.  Again, users would need
> to
> > > know which view to use.

> > > 3.  Make the users build queries that include the "distinct" keyword.

> > > Any other suggestions???
> > > thanks


 
 
 

Star Schema query question....

Post by Mike » Wed, 25 Oct 2000 04:00:00


Bob -

Read Kimball's book. You need to use the distinct clause in your SQL.

Mike


>     I have built a data warehouse that contains four dimension tables:
> address, owner, tax_account, taxlot.  The "fact" table (we call "xref") is
> supposed to serve the purpose of linking information from the four tables,
> so it simply contains the four primary key columns from the four dimension
> tables listed above.
>     The fact table works great when linking ALL four dimension tables.
> However, when joining any two dimension tables via the fact table, we
> sometimes get a duplicate result set.
>     For example, if I link the tax_account and owner tables to the xref
> table, I want to get one row back, but get two rows.  This is what the xref
> table looks like:

> owner_id    tax_account_id    address_id    taxlot_id
> 5                45                        2                    688
> 5                45                        3                    688

>     You can see that the owner_id and tax_account_id are repeated, because
> there is more than one address associated with them.
>     If I build the query to include only the columns of interest and include
> the "distinct" keyword, then I get the desired result set.  However, we are
> trying to make the data warehouse as simple as possible to query, and would
> like users to be able to join any two or three dimension tables using the
> xref table without having to use any special keywords like "distinct".
>     Any suggestions?  The only options I have thought of so far are:
> 1.  Build more fact tables, basically one for each possible combination of
> the four dimension tables.  The user would then have to decide which fact
> table to use.

> 2.  Build views from the existing fact table.  Again, users would need to
> know which view to use.

> 3.  Make the users build queries that include the "distinct" keyword.

> Any other suggestions???
> thanks


 
 
 

Star Schema query question....

Post by Bryant Salu » Fri, 03 Nov 2000 15:26:01


You can do ROLAP with OLAP Server, but they are looking to build a
relational star schema not ROLAP, MOLAP, or HOLAP. I agree, I would always
try to find a tool that works over building it. And after re-reading the
original post I strongly recommend against doing anything that involves
users writing SQL against your database. Perhaps a MOLAP cube w/ Excel2K
would do the job required here, and it would be fairly easy to implement.


> Bryant,

> Please correct me if I wrong, but it is my understanding that OLAP
Services
> also supports ROLAP ... I would not necessarily disagree that to go the
OLAP
> Services route they might need a front-end tool that is OLAP Services
aware
> ... however, it seems that that just goes back to my original point about
> re-inventing the wheel.

> I seriously doubt that the cost of development and maintenance of a wholly
> in-house solution will favorably compare with using a pre-built solution.

> ----------------------------------------------------------------
> BP Margolin
> Please reply only to the newsgroups.
> When posting, inclusion of SQL (CREATE TABLE ..., INSERT ..., etc.) which
> can be cut and pasted into Query Analyzer is appreciated.



> > Using OLAP Services will result in a MOLAP cube, but it seems they are
> > looking to build a relational star schema. With MOLAP they would need to
> buy
> > a front-end tool that supports MOLAP or write a lot of MDX which is
> probably
> > way more advanced than what they are looking to do.

> > So to answer the original question....
> > When you query a Star Schema database you should always use all your
> > dimensions even if you are not restricting the results based on every
> > dimension. Each data element is described by all of the dimensions so it
> is
> > proper to join with all of the dimension tables for your query. Of
course
> I
> > always prefer to use a front-end tool like Brio or Cognos. But then
again
> I
> > also prefer MOLAP anyway.

> > Bryant Salus
> > Data Advantage Group
> > http://www.dataag.com



> > > Bob,

> > > Unless I'm mistaken, it sounds to me like you are re-inventing the
> wheel.
> > By
> > > that I mean that you are building a data warehouse tool from scratch
> > rather
> > > than using something like SS7.0 OLAP Services or SS2K Analysis
Services.
> > If
> > > you use either of these, and build a cube, all of these "problems" are
> > taken
> > > care of automatically.

> > > Am I missing something here?

> > > ----------------------------------------------------------------
> > > BP Margolin
> > > Please reply only to the newsgroups.
> > > When posting, inclusion of SQL (CREATE TABLE ..., INSERT ..., etc.)
> which
> > > can be cut and pasted into Query Analyzer is appreciated.



> > > >     I have built a data warehouse that contains four dimension
tables:
> > > > address, owner, tax_account, taxlot.  The "fact" table (we call
> "xref")
> > is
> > > > supposed to serve the purpose of linking information from the four
> > tables,
> > > > so it simply contains the four primary key columns from the four
> > dimension
> > > > tables listed above.
> > > >     The fact table works great when linking ALL four dimension
tables.
> > > > However, when joining any two dimension tables via the fact table,
we
> > > > sometimes get a duplicate result set.
> > > >     For example, if I link the tax_account and owner tables to the
> xref
> > > > table, I want to get one row back, but get two rows.  This is what
the
> > > xref
> > > > table looks like:

> > > > owner_id    tax_account_id    address_id    taxlot_id
> > > > 5                45                        2                    688
> > > > 5                45                        3                    688

> > > >     You can see that the owner_id and tax_account_id are repeated,
> > because
> > > > there is more than one address associated with them.
> > > >     If I build the query to include only the columns of interest and
> > > include
> > > > the "distinct" keyword, then I get the desired result set.  However,
> we
> > > are
> > > > trying to make the data warehouse as simple as possible to query,
and
> > > would
> > > > like users to be able to join any two or three dimension tables
using
> > the
> > > > xref table without having to use any special keywords like
"distinct".
> > > >     Any suggestions?  The only options I have thought of so far are:
> > > > 1.  Build more fact tables, basically one for each possible
> combination
> > of
> > > > the four dimension tables.  The user would then have to decide which
> > fact
> > > > table to use.

> > > > 2.  Build views from the existing fact table.  Again, users would
need
> > to
> > > > know which view to use.

> > > > 3.  Make the users build queries that include the "distinct"
keyword.

> > > > Any other suggestions???
> > > > thanks


 
 
 

1. Star Schema Query Performance?

Hello, all.

I had a meeting with my user community yesterday that gave me some
concerns on query building and performance.  

I'm building star schema data marts for an R&D environment that will be
the repository of scientific data.  

Users will generate a list of about 1000 identifiers that aren't the
lowest grain of the dimension table.  I'll have an index on the field,
but still...

They will be using this list as a constraint on a query joining to a
fact table of about 20 million records.  From 1000 input values to
3-5000 dimension entries (total size of dimension table between 1.5
million rows), joining on a 20 M row fact table.  (All sizes subject to
doubling with an upcoming merger then 25% growth per year.)

How well will it perform, given that they'll do this in an ad-hoc query
mode?  What should I watch out for?  Any design advice (Oracle 8i tips
included) will be greatly appreciated.

Thanks in advance for any input.
Paul

2. Brand new to Progress

3. Some Star Schema Queries

4. A report call from a form

5. Star Schema Query?

6. Slow Updates

7. Star schema/Query

8. Test for object var or with block var?

9. SQL Query to Select data from Star Schema

10. Star Schema & Snow Flake Schema

11. Star Schema SP question

12. Questions on Star Schema Aggregate Tables

13. Question about STAR-schema-design