OLTP IO perf prob, file descriptor symptoms

Post by Billy Verreynn » Thu, 12 Feb 1998 04:00:00

>We're showing lots of IO wait on our
>Oracle database, but the disks are not stressed.

We've had (and still have on occasion) the exact same problem - a large
number of complex queries, with the nodes sitting at 1% busy time and 99%
spent waiting for i/o.

Quote:>Anyway, I'm hoping someone out there can answer these
>questions:

>1. Why does Oracle have so many file descriptors open?

No idea.

Quote:>2. Is there any correlation between the number of open
>   file descriptors and Oracle performance?

I doubt it, as the only overhead a large number of open file descriptors
should incur is memory. And a file descriptor is only a couple of bytes
anyway, so IMO it should not cause enough memory overhead to
significantly impact performance.

Quote:>3. It looks to me like our IO is bound by contention for
>   some sort of memory resource, not by disk or CPU capacity
>   (we have 10 CPU's).  How can I determine what my IO is waiting for?

We discovered that the high % waiting for i/o is actually just that -
waiting for i/o! :-) An i/o call simply takes more time than most other
system calls (think of all the hardware "calls" involved in moving the
heads to a specific position on the disk, reading the data, and returning
it to the operating system). After the i/o call is issued, the process
waits for the data to come back. A few milliseconds of wait time per i/o
for a couple of Oracle processes on Unix will show up as a high % waiting
for i/o, very little cpu time, and no significant impact on the % disk
busy time.
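
To put rough numbers on that (my figures - assume a typical ~10 ms
service time for a random single-block read on disks of this era): a
process issuing such reads synchronously, one after another, completes at
most ~100 reads per second and is blocked for ~99% of the wall-clock
time. Striped over ten disks, that is only ~10 reads per second per disk,
so each disk reports around 10% busy while the process shows up as almost
pure i/o wait.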

In our case we had a process that was trying to process 100+ million rows
using an index range scan (the CBO can be very vindictive at times). So
this process was spending all its time reading an index block, then a
couple of data blocks, then another index block, and so on. Reading more
than 100MB (never mind 1GB) of data from disk this way is time consuming
- an i/o call is limited by the hardware: bus speeds, disk caches, etc.
Throwing more memory and more CPUs at it will not make it process that
amount of data any faster.
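
Some back-of-the-envelope arithmetic (my numbers, not measured) shows why
such a job never finishes: an index range scan touching 100+ million
scattered rows costs roughly one index block visit plus one data block
visit per row - around 200 million block visits. Even if the buffer cache
absorbs 90% of those, ~20 million physical reads at ~10 ms each is
~200,000 seconds, or about 55 hours of pure i/o wait for one process.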

The solution in our case was to use the parallel query (pq) feature. As
the query has to process such a large volume of data, let the pq slaves
do the i/o in parallel. Instead of having a single process read, say, 100
million rows, you now have 20 processes each reading 5 million rows. This
significantly decreases the total time to get that data from disk - the
difference between a run that had clocked 40+ hours without completing
and a run that now finishes in under an hour.
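
As a minimal sketch of what that looks like (big_table and the degree of
20 are made up, and exact hint syntax varies by Oracle version):

    select /*+ full(t) parallel(t, 20) */ count(*)
    from   big_table t;

    -- or give the table a default degree, so full scans of it
    -- are parallelised without a hint:
    alter table big_table parallel (degree 20);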

How to determine what the i/o is waiting for? We first looked at
v$session_event and v$session_wait to see what Oracle was trying to do.
Secondly, we truss'ed the actual Oracle Unix process running the query to
see what system calls it was making.
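
For example (:sid is the session under investigation; column lists
trimmed to what we actually cared about):

    -- what is the session waiting on right now?
    select sid, event, p1, p2, p3, state
    from   v$session_wait
    where  sid = :sid;

    -- cumulative waits for the session; lots of time under
    -- 'db file sequential read' points at single-block (index) i/o,
    -- 'db file scattered read' at multiblock (full scan) i/o
    select event, total_waits, time_waited
    from   v$session_event
    where  sid = :sid
    order  by time_waited desc;

On Solaris, truss -p <pid> against the shadow process then shows the
individual read system calls as they happen.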

The lessons we learned after losing a lot of hair and sleep: increase
disk throughput by using multiple processes to read the data. At the same
time, make sure that the CBO (Cost Based Optimiser) has accurate stats
with which to decide how many rows each of the pq processes must read
from disk (skewed or incorrect stats can leave the first pq process
reading 80% of the data and the last with nothing to do). The join method
is also important. A nested loop join can cause huge i/o overheads that
just worsen the problem (job never completed), whereas a hash join, for
example, adds very little extra i/o (job completed in 40 minutes). So use
"explain plan" to see what Oracle intends to do with the query.

As a rule of thumb - if a process needs to access more than 5% - 10% of
the data in a VLT (very large table), discard the indexes and use a full
tablescan with parallel processes.

OLTP "rules" really sucks when it comes to OLAP type processing or VLD's.
Indexes drool, full table scans rules! :-)

regards,
Billy


OLTP IO perf prob, file descriptor symptoms

Post by Casper H.S. Dik - Network Security Engineer » Thu, 12 Feb 1998 04:00:00


[[ PLEASE DON'T SEND ME EMAIL COPIES OF POSTINGS ]]


>We've had (and still have on occasion) the exact same problem - large number
>of complex queries and the nodes run idle with 1% busy time and 99% spent
>waiting for i/o.

Actually, wait I/O is not "waiting for I/O" in the sense of busy waiting.

All idle CPU time is counted as wait I/O if there are outstanding I/O
requests to be fulfilled; as such, wait I/O is a pretty useless measure.
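
A concrete consequence, using the original poster's 10-CPU box: a single
process blocked in a disk read is enough for the kernel to charge the
idle time of all ten CPUs to wait I/O, so the machine can report 99% wio
while nine-plus CPUs are genuinely idle and the disks are barely working.
Better to look at per-disk utilisation and service times (iostat) than
at %wio.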

Casper
--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.