"OR" performance improvement

"OR" performance improvement

Post by vittha » Fri, 12 Apr 2002 07:38:01



I have a table with close to 5 million records. Have a
query with lot of "OR" operators. The format of query is
like

where (type = a and value = b)
or   (type = c and value = d)
or   (type = e and value = f)
or ...

There is index on "value" and "type" fields. Is there any
way I can tune this query to perform well.

thanks a lot
vitthal

 
 
 

"OR" performance improvement

Post by Greg Linwoo » Fri, 12 Apr 2002 08:26:29


The best example I've seen on how to describe this is the phone book
example.

Imagine your table is the phone book

create table phonebook (
 lastname varchar (255),
 firstname varchar (255),
 constraint pk_phonebook primary key (lastname, firstname)
)

select * from phonebook where lastname = 'smith'
this query will use the primary key index.

select * from phonebook where lastname = 'smith' and firstname = 'john'
this query is a subset of the first, and will also use the primary key index

select * from phonebook where lastname = 'smith' OR firstname = 'john'

think about this last query - how can it possibly use the available
index(PK)? The or introduces the need for an entire table scan because the
index (PK) is only sorted by the lastname column.

what you really need in this example is another index on firstname so that
the query optimizer can evaluate the sarg on firstname to a useful index..

so - conceptually, you'll likely need more indexes, but I've given a
simplified answer and you'll need to know much more about how SQL Server
impliments physical indexes before you'll be effective in dealing with
situations like this.

I suggest you grab Kalen Delaney's "Inside SQL Server 2000" and look up
Chapter 15. Page 822 is particularly useful to you..

HTH

Cheers,
Greg Linwood


Quote:> I have a table with close to 5 million records. Have a
> query with lot of "OR" operators. The format of query is
> like

> where (type = a and value = b)
> or   (type = c and value = d)
> or   (type = e and value = f)
> or ...

> There is index on "value" and "type" fields. Is there any
> way I can tune this query to perform well.

> thanks a lot
> vitthal


 
 
 

"OR" performance improvement

Post by Carl Feder » Fri, 12 Apr 2002 10:01:28


If both columns to be compared are always the same
columns, then try:

select *
from BigTable
join (
select 'a' as type, b as value union all
select 'c' as type, d as value union all
select 'e' as type, f as value
) as OrConds
on BigTable.type = OrConds.type
and BigTable.value = OrConds.value

 
 
 

"OR" performance improvement

Post by Jens K Jense » Sun, 14 Apr 2002 19:47:02


A third option is:

select col1, col2, col3 from table1 where type = a and value = b
union all
select col1, col2, col3 from table1 where type = c and value = d
union all
select col1, col2, col3 from table1 where type = e and value = f
union all

etc.

This should result in multiple index seeks (given the cardinality of
type+value is good).

In below examples querying a (fake) 100.000 ZipCodes table costs were:

select * from zipcodes
where
(dataareaid = 'bmd' and zipcode = '    000001')
or
(dataareaid = 'bmd' and zipcode = '    000010')
or
(dataareaid = 'bmd' and zipcode = '    000021')
or
(dataareaid = 'bmd' and zipcode = '    000031')
or
(dataareaid = 'bmd' and zipcode = '    000041')

Estimated cost: 6
True logical reads: 10 if cluster index, 15 if non clustered
True duration: 1 ms

select *
from zipcodes
join (
select 'bmd' as dataareaid, '    000001' as zipcode union all
select 'bmd' as dataareaid, '    000010' as zipcode union all
select 'bmd' as dataareaid, '    000021' as zipcode union all
select 'bmd' as dataareaid, '    000031' as zipcode union all
select 'bmd' as dataareaid, '    000041' as zipcode
) as OrConds
on zipcodes.dataareaid = OrConds.dataareaid
and zipcodes.zipcode = OrConds.zipcode

Estimated cost: 6
True logical reads: 10 if cluster index, 15 if non clustered
True duration: 1 ms

select * from zipcodes where (dataareaid = 'bmd' and zipcode = '    000001')
union all
select * from zipcodes where (dataareaid = 'bmd' and zipcode = '    000010')
union all
select * from zipcodes where (dataareaid = 'bmd' and zipcode = '    000021')
union all
select * from zipcodes where (dataareaid = 'bmd' and zipcode = '    000031')
union all
select * from zipcodes where (dataareaid = 'bmd' and zipcode = '    000041')

Estimated cost: 3
True logical reads: 10 if cluster index, 15 if non clustered
True duration: 1 ms

So all were equally effective, but optimizer 'liked' the latter better !

All results on SS2000 SP2.

regards
jensk


Quote:> If both columns to be compared are always the same
> columns, then try:

> select *
> from BigTable
> join (
> select 'a' as type, b as value union all
> select 'c' as type, d as value union all
> select 'e' as type, f as value
> ) as OrConds
> on BigTable.type = OrConds.type
> and BigTable.value = OrConds.value