Messy SQL performance question (NOT in SELECT clause)

Messy SQL performance question (NOT in SELECT clause)

Post by Roger Loe » Thu, 16 Apr 1998 04:00:00



I have an application that does queries on a 400 million (yes) table.  There
are two queries run sequentially.  The program is looking for "candidates"
within clusters of rows.  The table is clustered on the primary key, which
is also the key used in the first of the two queries.  A typical first query
may return as many as 100,000 rows.  The second query is only used when the
first query does not find a suitable candidate.  It uses a separate index to
select rows that were NOT selected in the first query.

The form of the first query is "SELECT /*+ INDEX primary-index */
column-names from table-name where PRIMARY-KEY = :value"

The second query is of the form "SELECT /*+ INDEX secondary-index */
column-names from table-name where SECONDARY-KEY = : value2 AND NOT
PRIMARY-KEY = :value"

The purpose of the NOT is to avoid retrieving all of the rows that were
already read in the first query.

The problem: most Oracle tuning books indicate that any use of a NOT
condition will result in a full table scan.  That's rather a problem when
there are 400 million rows.  I'm just trying to avoid the overhead of
selecting all those rows and testing them internally for duplication with
the first query.  Seems to me that, at least theoretically, a join of the
two indices identifies those records to NOT select, but I do appreciate the
concept of a negative join.

Are the tuning manuals largely correct, i.e., does this form result in a
full table scan?  Can anyone suggest an alternate way to reduce the volume
from the second query?

Thanks,

Rog

--

 
 
 

Messy SQL performance question (NOT in SELECT clause)

Post by Mick » Fri, 17 Apr 1998 04:00:00


If you are using a not in with a sub query, then try the HASH_AJ hint. This
will do some nice stuff and convert your not in to an inline view.
Alternatively you could do it as an inline view yourself and hand tune from
there.

Regards,
Michael Ryan

 
 
 

Messy SQL performance question (NOT in SELECT clause)

Post by Rod Cordere » Fri, 17 Apr 1998 04:00:00


Hi Roger,

you don't say what the size of data set would be from the second query
without the NOT clause. Although I don't think a NOT is appropriate
here.

If the data set of 2ndkey=:value2 is no more than say 100-200,000 then
you're query becomes.

select column_names
from table_name
where this_column = :value_1
UNION ALL
select column_names
from table_name
where other_column = :value_2
and this_column    != :value_1

the second query is serviced by your secondary index.

If your secondary index also contains the primary_key attribute then the
search is satisfied without table access other than the retrieval of the
columns of the eventual data set.

I would doubt that primary-index would contend in the second half of the
union, but if it did then add a function around this_column to cause it
to ignore primary-index.

If the data set for 2ndkey=:value2 is huge then you may have to think
again, but
a concatenated index would still limit the set extraction  to indexes.

cheers

Rod
Lane Associates

http://www.Lane-Associates.com


> I have an application that does queries on a 400 million (yes) table.  There
> are two queries run sequentially.  The program is looking for "candidates"
> within clusters of rows.  The table is clustered on the primary key, which
> is also the key used in the first of the two queries.  A typical first query
> may return as many as 100,000 rows.  The second query is only used when the
> first query does not find a suitable candidate.  It uses a separate index to
> select rows that were NOT selected in the first query.

> The form of the first query is "SELECT /*+ INDEX primary-index */
> column-names from table-name where PRIMARY-KEY = :value"

> The second query is of the form "SELECT /*+ INDEX secondary-index */
> column-names from table-name where SECONDARY-KEY = : value2 AND NOT
> PRIMARY-KEY = :value"

> The purpose of the NOT is to avoid retrieving all of the rows that were
> already read in the first query.

> The problem: most Oracle tuning books indicate that any use of a NOT
> condition will result in a full table scan.  That's rather a problem when
> there are 400 million rows.  I'm just trying to avoid the overhead of
> selecting all those rows and testing them internally for duplication with
> the first query.  Seems to me that, at least theoretically, a join of the
> two indices identifies those records to NOT select, but I do appreciate the
> concept of a negative join.

> Are the tuning manuals largely correct, i.e., does this form result in a
> full table scan?  Can anyone suggest an alternate way to reduce the volume
> from the second query?

> Thanks,

> Rog

> --