single select w/join versus multiple selects

single select w/join versus multiple selects

Post by Robert Youn » Wed, 28 Aug 2002 07:40:41



The Setting:  common example, Order table and Order_Line table

Scenario 1) :  select statement with join

Scenario 2) :  select from Order table, then select from Order_Line table

The Argument:  2 is an order of magnitude faster than 1.

saw this on a language thread, not DB2.  db in use was not mentioned.
seems to me that 2 is guaranteed to fail (not always, of course) because
the Order_Line table is free to be modified while the Order table is
being read.

So:  does either the SQL standard or DB2 guarantee that both Scenarios
will return the same data??  i think not, but can't find documentation
to prove it.

toodles,
robert

 
 
 

single select w/join versus multiple selects

Post by temporary.. » Wed, 28 Aug 2002 09:29:37



> The Setting:  common example, Order table and Order_Line table

> Scenario 1) :  select statement with join

> Scenario 2) :  select from Order table, then select from Order_Line table

> The Argument:  2 is an order of magnitude faster than 1.

> saw this on a language thread, not DB2.  db in use was not mentioned.
> seems to me that 2 is guaranteed to fail (not always, of course) because
> the Order_Line table is free to be modified while the Order table is
> being read.

> So:  does either the SQL standard or DB2 guarantee that both Scenarios
> will return the same data??  i think not, but can't find documentation
> to prove it.

It's sql that allows semantically rewriting of queries (not db2) ... its not
clear what the sql your referred to above looked like, nor the access plans
chosen either. However, from what you said, if scenario (2) is
de-correlatable (which is apparently was case),  both scenarios should most
likely
come-up with the same access plan. Its not clear what dbms this was; if db2,
would like to see the  access plans. But, if (1) came up with a bad plan
... the system in use may have had bad stats and picked the wrong join order
(a.k.a., picked the wrong table as outer, and or the join method), and
scenario (2) may have picked the correct access plan by chance.

- Show quoted text -

Quote:

> toodles,
> robert


 
 
 

single select w/join versus multiple selects

Post by Alexander Kuznets » Wed, 28 Aug 2002 12:58:53


Hi Robert,
Quote:> The Setting:  common example, Order table and Order_Line table

> Scenario 1) :  select statement with join

> Scenario 2) :  select from Order table, then select from Order_Line table

> The Argument:  2 is an order of magnitude faster than 1.

not necessarily. Merge joins, for example, may be faster

Quote:> So:  does either the SQL standard or DB2 guarantee that both Scenarios
> will return the same data??  i think not, but can't find documentation
> to prove it.

Given such requirements, you could consider explicitly locking the tables

Good luck!
Alexander

 
 
 

single select w/join versus multiple selects

Post by robe » Wed, 28 Aug 2002 21:10:36




> > The Setting:  common example, Order table and Order_Line table

> > Scenario 1) :  select statement with join

> > Scenario 2) :  select from Order table, then select from Order_Line table

> > The Argument:  2 is an order of magnitude faster than 1.

> > saw this on a language thread, not DB2.  db in use was not mentioned.
> > seems to me that 2 is guaranteed to fail (not always, of course) because
> > the Order_Line table is free to be modified while the Order table is
> > being read.

> > So:  does either the SQL standard or DB2 guarantee that both Scenarios
> > will return the same data??  i think not, but can't find documentation
> > to prove it.

create table order
(order_number int not null primary key,
 data char(256))

create table order_line
(order_number int not null references order(order_number),
 line_number int not null,
 data char(256))

create unique index order_line_index on order_line (order_number,
                                                    line_number)

1) select * from order o
   join order_line ol on o.order_number = ol.order_number
   where .......

2) select * from order where data = 'smith'

   select * from order_line o where exists (select * from  order o
                                             where o.data = 'smith' and
                                                   o.order = ol.order)
  or (more likely from the COBOLers)

   select * from order_line where order_number = :StoredOrderNumber

so:  the question is not whether this, or similar, syntax will be
faster (or slower); rather whether 1) and 2) will return the same
data.  if order.data were the order total (or any data derived from
order_line), then i think not.  i don't know any syntax which will
"pre-lock" order_line prior to its being referenced in its select
statement.  it's during this time that inconsistency will appear.

Quote:> It's sql that allows semantically rewriting of queries (not db2) ...

<snip>

the coder chooses to write it one way or the other:  will a database
always return the same data??

toodles,
robert

 
 
 

single select w/join versus multiple selects

Post by Alexander Kuznets » Thu, 29 Aug 2002 04:07:42


Quote:> create table order
> (order_number int not null primary key,
>  data char(256))

> create table order_line
> (order_number int not null references order(order_number),
>  line_number int not null,
>  data char(256))

> create unique index order_line_index on order_line (order_number,
>                                                     line_number)

> 1) select * from order o
>    join order_line ol on o.order_number = ol.order_number
>    where .......

> 2) select * from order where data = 'smith'

>    select * from order_line o where exists (select * from  order o
>                                              where o.data = 'smith' and
>                                                    o.order = ol.order)
>   or (more likely from the COBOLers)

>    select * from order_line where order_number = :StoredOrderNumber

> so:  the question is not whether this, or similar, syntax will be
> faster (or slower); rather whether 1) and 2) will return the same
> data.  if order.data were the order total (or any data derived from
> order_line), then i think not.  i don't know any syntax which will
> "pre-lock" order_line prior to its being referenced in its select
> statement.  it's during this time that inconsistency will appear.

try this (with AUTOCOMMIT off):
1. CREATE TABLE ORDER_LOCK(ORDER_NUMBER INTEGER NOT NULL PRIMARY KEY,
TAG SMALLINT DEFAULT 0);
2. have it populated (INSERT INTO ORDER_LOCK(ORDER_NUMBER)SELECT
ORDER_NUMBER FROM ORDER ) and keep it populated with insert/delete
triggers on order
3. before selecting from order issue UPDATE ORDER_LOCK SET TAG = 1 -
TAG
WHERE ORDER_NUMBER IN(select ORDER_NUMBER  from  order o where o.data
= 'smith' )
4. Add to insert/update/delete triggers on order_line: UPDATE
ORDER_LOCK SET TAG = 1 - TAG WHERE ORDER_NUMBER = NEW_TAB.ORDER_NUMBER
(old_tab FOR DELETE TRIGGER)
that will lock relevant child records
5. don't forget to commit ASAP
there are many other custom solutions, keep trying
good luck
Alexander
 
 
 

single select w/join versus multiple selects

Post by Alexander Kuznets » Thu, 29 Aug 2002 04:19:01


oops,
addition to my previous post (hit Post too early)
run the*select with RR
Quote:> toodles,

what does it mean "toodles"?
 
 
 

single select w/join versus multiple selects

Post by Obnoxio The Clow » Thu, 29 Aug 2002 04:36:15



> oops,
> addition to my previous post (hit Post too early)
> run the*select with RR

>>toodles,

> what does it mean "toodles"?

Goodbye. Short for "toodle-oo", which means, er, "goodbye".
 
 
 

single select w/join versus multiple selects

Post by temporary.. » Thu, 29 Aug 2002 23:23:41





> > > The Setting:  common example, Order table and Order_Line table

> > > Scenario 1) :  select statement with join

> > > Scenario 2) :  select from Order table, then select from Order_Line table

> > > The Argument:  2 is an order of magnitude faster than 1.

> > > saw this on a language thread, not DB2.  db in use was not mentioned.
> > > seems to me that 2 is guaranteed to fail (not always, of course) because
> > > the Order_Line table is free to be modified while the Order table is
> > > being read.

> > > So:  does either the SQL standard or DB2 guarantee that both Scenarios
> > > will return the same data??  i think not, but can't find documentation
> > > to prove it.

> create table order
> (order_number int not null primary key,
>  data char(256))

> create table order_line
> (order_number int not null references order(order_number),
>  line_number int not null,
>  data char(256))

> create unique index order_line_index on order_line (order_number,
>                                                     line_number)

> 1) select * from order o
>    join order_line ol on o.order_number = ol.order_number
>    where .......

> 2) select * from order where data = 'smith'

>    select * from order_line o where exists (select * from  order o
>                                              where o.data = 'smith' and
>                                                    o.order = ol.order)
>   or (more likely from the COBOLers)

>    select * from order_line where order_number = :StoredOrderNumber

> so:  the question is not whether this, or similar, syntax will be
> faster (or slower); rather whether 1) and 2) will return the same
> data.  if order.data were the order total (or any data derived from
> order_line), then i think not.  i don't know any syntax which will
> "pre-lock" order_line prior to its being referenced in its select
> statement.  it's during this time that inconsistency will appear.

> > It's sql that allows semantically rewriting of queries (not db2) ...
> <snip>

> the coder chooses to write it one way or the other:  will a database
> always return the same data??

Hmmm, not sure where you are trying to go with the above -- you need to be careful,
your examples are not semantically equivalent; there is absolutely no guarantee that
they return the same result. Lets forget for a moment that the data columns returned
are not all the same (and the ddl and dml syntax errors), and focus on just the
resultant rows for a moment. Your first three select stmts above ... the one listed
under bullet (1), as well as the first two stmts under (2), will in all likelihood
return a different number of rows -- by definition of the 1:N relationship between
the two tables. Your test database system may return the same data just by chance --
but, that does not mean they are semantically the same.

Here is an example of two semantically equivalent queries ...

    1) SELECT  ol.*  FROM  order_line ol JOIN order o ON ol.order_number =
o.order_number
    2) SELECT  ol.*  FROM  order_line ol

These are in fact semantically identical (by definition of RI further above) and
will always return the same data! And, DB2 will re-write the first into the second!

There are many many more examples. Case in point, here is just one more  example of
three equivalent queries (wrt to their result set)  ...
    1) SELECT  o.order_number,  o.data FROM order o LEFT OUTER JOIN order_line ol ON
o.order_number = ol.order_number
        WHERE   o.order_number IS NOT NULL AND ol.order_number IS NOT NULL
    2) SELECT  o.order_number, o.data FROM order o JOIN order_line ol ON
o.order_number = ol.order_number
    3) SELECT  o.order_number, o.data FROM order o WHERE EXISTS ( SELECT 1 FROM
order_line ol WHERE
        ol.order_number = o.order_number)

The examples you gave are not.  As a side note, always be careful about NULLs, even
though you may see what might look like semantically equivalent queries (and, even
return the same data on your test system),  ... they may in fact not be semantically
equivalent. As I stated originally, this is SQL itself implying that two differently
written statements will be _guaranteed_ to return the same results. Its them up to
the database system to identify this, and cost each out. Then of course, pick the
cheapest plan. Obviously, some databases are better than others here.

So ...

To answer your question. If the user decides to write it one way versus another --
as long as they are semantically the same -- then yes, the system will return the
same results.

- Show quoted text -

Quote:

> toodles,
> robert

 
 
 

1. Multiple selects versus join

Is it more efficient to do multiple selects on multiple tables to get
data or do a join of those tables and extract the data from the
resulting temp table?

For example:

select name from a where id=1;
select pub_date from b where id=1;
pub_id = select publisher_id from c where id=1;
select pub_name from d where id = pub_id;

or (I don't know the syntax for join so this is just pseudo-sql);

select name, pub_date, pub_name from join((join(A, B, C) on id), D) on
pub_id where id = 1;

All my tables are related to each other so I am wondering which is more
efficient, do multile selects or joining the tables. (Joining the table
would return just one row).

Thanks for the advice and sorry if the pseudo-sql is hard to understand.

Jc

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command

2. disconnected recordset: how to update?

3. Multiple SPs or single SP with multiple selects?

4. ADO & MDAC_Typ.exe

5. Join versus nested select?

6. What is sybsys10.sql?

7. Performance question: Varchar select versus Join

8. Date Issues

9. CURSORS : SELECT * VERSUS SELECT only necessary columns

10. Views versus selects from multiple tables

11. Deadlock involving single select with a join

12. Select multiple top 5 record from single record set