I have an application that runs queries against a 400-million-row (yes) table. There
are two queries run sequentially. The program is looking for "candidates"
within clusters of rows. The table is clustered on the primary key, which
is also the key used in the first of the two queries. A typical first query
may return as many as 100,000 rows. The second query is only used when the
first query does not find a suitable candidate. It uses a separate index to
select rows that were NOT selected in the first query.
The form of the first query is "SELECT /*+ INDEX primary-index */
column-names from table-name where PRIMARY-KEY = :value"
The second query is of the form "SELECT /*+ INDEX secondary-index */
column-names from table-name where SECONDARY-KEY = :value2 AND NOT
PRIMARY-KEY = :value"
The purpose of the NOT is to avoid retrieving all of the rows that were
already read in the first query.
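For concreteness, here is roughly what the two statements look like. All of the names below (the CANDIDATES table, CLUSTER_ID primary key, ALT_KEY secondary key, and both index names) are placeholders, since I can't show the real schema:

```sql
-- Hypothetical names throughout; real schema not shown.
-- First query: read the whole cluster via the primary-key index.
SELECT /*+ INDEX(c candidates_pk) */
       c.*
  FROM candidates c
 WHERE c.cluster_id = :value;

-- Second query: read by the secondary index, excluding the cluster
-- that the first query already returned.
SELECT /*+ INDEX(c candidates_alt_ix) */
       c.*
  FROM candidates c
 WHERE c.alt_key = :value2
   AND c.cluster_id <> :value;
```

(Note that Oracle's documented hint form is INDEX(alias index_name); the shorthand in my description above is just paraphrase.)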
The problem: most Oracle tuning books indicate that any use of a NOT
condition will result in a full table scan. That's rather a problem when
there are 400 million rows. I'm just trying to avoid the overhead of
selecting all those rows and testing them internally for duplication with
the first query. Seems to me that, at least theoretically, a join of the
two indexes would identify the records to NOT select, though I realize that
amounts to a negative join (an anti-join).
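To illustrate the "join of the two indexes" idea, here is one way I imagine it could be written, again with hypothetical names, using MINUS on rowids so that table rows are only fetched for keys surviving the subtraction. This is a sketch of the concept, not something I have tested at this scale:

```sql
-- Hypothetical names (candidates, alt_key, cluster_id).
-- Build the set of rowids matching the secondary key, subtract the
-- rowids of the cluster already read, then fetch only the survivors.
SELECT c.*
  FROM candidates c
 WHERE c.rowid IN (
         SELECT rowid FROM candidates WHERE alt_key    = :value2
         MINUS
         SELECT rowid FROM candidates WHERE cluster_id = :value
       );
```

Both inner selects should be satisfiable from the respective indexes alone, if the optimizer cooperates.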
Are the tuning manuals largely correct, i.e., does this form really result
in a full table scan? Can anyone suggest an alternate way to reduce the volume
from the second query?