Does lots of iowait really mean I'm I/O bound?

Post by Mathew Kirs » Sat, 23 Oct 2004 04:27:08



The system here is a V880 with 4 CPUs, 4GB RAM. It's connected to 8
DLT7000 drives and 4 LTO1 drives. Each LTO has its own channel on an
HVD SCSI interface, and the DLTs are configured two per HVD SCSI
interface. We have four dual-channel HVD SCSI cards in the box. The
drives and interfaces are split evenly across the system's two PCI
buses. This system is also connected to the SAN via dual 2Gb Emulex
Fibre Channel interfaces.

This monitoring utility, "foglight," is showing a high amount of CPU
wait time. top confirms that this is iowait time. My boss insists that
there's a performance problem.

My observation is that iowait is highest when the tape traffic is
lowest. That is, when the system is feeding 4 tape drives or fewer,
the iowait is high. CPU utilization goes way up, and iowait goes way
down when the system is feeding all twelve drives.

Please just verify my theory on the subject:

If a hotrod system like a V880 is feeding data to tape drives faster
than they can take it, the CPU time that would be dedicated to feeding
that data is shown as iowait by the system, correct?

 
 
 

Post by Darren Dunha » Sat, 23 Oct 2004 07:57:02



> This monitoring utility, "foglight," is showing a high amount of CPU
> wait time. top confirms that this is iowait time. My boss insists that
> there's a performance problem.

What is his basis for that insistence?  I hope it's not the CPU iowait
time.

> My observation is that iowait is highest when the tape traffic is
> lowest. That is, when the system is feeding 4 tape drives or fewer,
> the iowait is high. CPU utilization goes way up, and iowait goes way
> down when the system is feeding all twelve drives.

Right.  iowait is a subset of idle time, so it cannot be high when
user/sys cpu is high.

> Please just verify my theory on the subject:
> If a hotrod system like a V880 is feeding data to tape drives faster
> than they can take it, the CPU time that would be dedicated to feeding
> that data is shown as iowait by the system, correct?

Let's say instead that "any time the CPU is not busy AND there is
outstanding IO to a drive, then iowait will be calculated".  There is no
CPU time "dedicated" to feeding the data.  That's why it's idle.
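To make that rule concrete, here is a minimal Python sketch of how a single clock tick could be classified under that description. It models the rule as stated above, not the actual Solaris kernel code; the `CpuState`, `classify_tick`, and `summarize` names are hypothetical.

```python
from collections import Counter
from enum import Enum
from typing import Dict, List, Optional, Tuple


class CpuState(Enum):
    USER = "usr"
    SYS = "sys"
    IOWAIT = "wt"
    IDLE = "idle"


def classify_tick(running: Optional[str], outstanding_io: int) -> CpuState:
    """Classify one clock tick (hypothetical model of the rule above).

    running: "usr" or "sys" if a thread is on the CPU, None if nothing is.
    outstanding_io: number of I/Os issued but not yet completed.
    """
    if running == "usr":
        return CpuState.USER
    if running == "sys":
        return CpuState.SYS
    # Only an otherwise idle CPU can be charged with iowait.
    if outstanding_io > 0:
        return CpuState.IOWAIT
    return CpuState.IDLE


def summarize(ticks: List[Tuple[Optional[str], int]]) -> Dict[str, float]:
    """Percentage of ticks spent in each state."""
    counts = Counter(classify_tick(running, io) for running, io in ticks)
    return {state.value: 100.0 * counts[state] / len(ticks) for state in CpuState}


# Two busy ticks with I/O outstanding, one idle tick with I/O outstanding,
# one idle tick with nothing outstanding.
print(summarize([("usr", 2), ("usr", 2), (None, 2), (None, 0)]))
# → {'usr': 50.0, 'sys': 0.0, 'wt': 25.0, 'idle': 25.0}
```

The two busy ticks have I/O outstanding but are still charged to usr: iowait can only ever take a share of time that would otherwise be reported as idle.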

As a test, take an otherwise idle system, then run 'mt -f <drive>
offline'.  Observe iowait during the time it takes to eject the drive.
Hypothesize from that whether the system performance has actually
changed significantly.
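If you want numbers from that test rather than eyeballing a monitor, the usual approach is to take two snapshots of the cumulative per-state tick counters and work from the differences, which is roughly what `iostat -c` reports as us/sy/wt/id. The sketch below assumes counters named `user`, `kernel`, `wait`, and `idle` (modeled on the Solaris `cpu_stat` kstat; treat the names as an assumption), and the snapshot values are made up for illustration.

```python
from typing import Dict

STATES = ("user", "kernel", "wait", "idle")


def cpu_percentages(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Turn two snapshots of cumulative tick counters into percentages."""
    deltas = {s: after[s] - before[s] for s in STATES}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("no ticks elapsed between snapshots")
    return {s: 100.0 * d / total for s, d in deltas.items()}


# Hypothetical counters taken just before and just after 'mt -f <drive> offline'.
before = {"user": 1000, "kernel": 2000, "wait": 300, "idle": 4000}
after = {"user": 1005, "kernel": 2005, "wait": 380, "idle": 4010}

pct = cpu_percentages(before, after)
# wait + idle is simply the time the CPU was not busy, so iowait can never
# exceed 100 - user - kernel.
print(pct, pct["wait"] + pct["idle"])
```

An 80% wt reading in a run like this would only mean the CPU had nothing else to do while the drive ejected; the user + kernel share is the actual work, and it stays small.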

--

Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >

 
 
 

Post by Jonathan Adam » Sat, 23 Oct 2004 08:35:29





> > This monitoring utility, "foglight," is showing a high amount of CPU
> > wait time. top confirms that this is iowait time. My boss insists that
> > there's a performance problem.

> What is his basis for that insistence?  I hope it's not the CPU iowait
> time.

> > My observation is that iowait is highest when the tape traffic is
> > lowest. That is, when the system is feeding 4 tape drives or fewer,
> > the iowait is high. CPU utilization goes way up, and iowait goes way
> > down when the system is feeding all twelve drives.

> Right.  iowait is a subset of idle time, so it cannot be high when
> user/sys cpu is high.

> > Please just verify my theory on the subject:

> > If a hotrod system like a V880 is feeding data to tape drives faster
> > than they can take it, the CPU time that would be dedicated to feeding
> > that data is shown as iowait by the system, correct?

> Let's say instead that "any time the CPU is not busy AND there is
> outstanding IO to a drive, then iowait will be calculated".  There is no
> CPU time "dedicated" to feeding the data.  That's why it's idle.

> As a test, take an otherwise idle system, then run 'mt -f <drive>
> offline'.  Observe iowait during the time it takes to eject the drive.
> Hypothesize from that whether the system performance has actually
> changed significantly.

Note that in Solaris 10, the CPU I/O wait time will always be zero:

4518644 I/O wait statistic is still misleading and should be dropped

since it tends to just confuse the issue.

Cheers,
- jonathan

 
 
 

Post by Mathew Kirs » Sat, 23 Oct 2004 21:45:20



> What is his basis for that insistence?  I hope it's not the CPU iowait
> time.

Yes, it is. We get phenomenal, better-than-advertised throughput on
the tape drives. He insists that, because iowait is high at some times
of the day but not at others, something must be wrong.

I've been looking into this. When iowait is high, the system is
writing to no more than four or five tape drives. When iowait is low,
the system is writing to several tape drives. It peaks out at zero
iowait and 100% CPU utilization when all twelve drives are busy.
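A deliberately crude model shows why that pattern is what you'd expect. Assume (hypothetically; this is not a measurement of the V880) that each streaming drive keeps the CPUs busy for a fixed fraction of the time, and that any active drive always has a write outstanding. Then everything that isn't busy time is charged to iowait, and iowait falls as drives are added. The value 1/12 below is picked only so that twelve drives saturate the box, mirroring the observed peak; `toy_cpu_split` is a hypothetical name.

```python
from fractions import Fraction
from typing import Dict


def toy_cpu_split(active_drives: int, cpu_per_drive: Fraction) -> Dict[str, Fraction]:
    """Crude model: busy time scales with drive count until the CPUs saturate."""
    busy = min(Fraction(1), active_drives * cpu_per_drive)
    # Assumption: an active drive always has a write outstanding, so all
    # non-busy time counts as iowait. With no drives active it is plain idle.
    iowait = (1 - busy) if active_drives > 0 else Fraction(0)
    idle = 1 - busy - iowait
    return {"busy": busy, "iowait": iowait, "idle": idle}


for n in (0, 4, 8, 12):
    split = toy_cpu_split(n, Fraction(1, 12))
    print(n, {k: f"{float(v):.0%}" for k, v in split.items()})
```

In this model, high iowait at four drives and zero iowait at twelve are the same system doing less work versus more; nothing about the drives or the SCSI paths changes between the two cases.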

> Let's say instead that "any time the CPU is not busy AND there is
> outstanding IO to a drive, then iowait will be calculated".  There is no
> CPU time "dedicated" to feeding the data.  That's why it's idle.

Right, that's why I said "would be dedicated." If the CPU were doing
nothing other than IO, then the amount of CPU time shown as iowait
would be doing IO, IFF the drive could take it.

> As a test, take an otherwise idle system, then run 'mt -f <drive>
> offline'.  Observe iowait during the time it takes to eject the drive.
> Hypothesize from that whether the system performance has actually
> changed significantly.

I know exactly what you're talking about. The key is coming up with
the "magic bullet" that will convince my boss... He's not buying what
we've discussed so far.

Maybe I should just recommend we upgrade it to Solaris 10 ASAP... "Hey
boss, Solaris 10 will fix the iowait problem... Sun guarantees ZERO
iowait in Solaris 10."

 
 
 

Post by Darren Dunha » Sun, 24 Oct 2004 00:41:34



>> Let's say instead that "any time the CPU is not busy AND there is
>> outstanding IO to a drive, then iowait will be calculated".  There is no
>> CPU time "dedicated" to feeding the data.  That's why it's idle.
> Right, that's why I said "would be dedicated." If the CPU were doing
> nothing other than IO, then the amount of CPU time shown as iowait
> would be doing IO, IFF the drive could take it.

*blink*.  Uhh, I don't follow that description.  Could you rephrase?

Just because you speed up the drive doesn't mean that the *time* spent
waiting would be translated into time spent doing something else.
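A toy calculation makes the point. Suppose (hypothetically) a single process writes blocks synchronously: it spends some CPU time preparing each block, then sleeps until the drive accepts it. Halving the drive time shrinks the elapsed time and the iowait, but the CPU time is unchanged; the waiting disappears rather than turning into work. The numbers and the `synchronous_writer` helper are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Run:
    wall_ms: float
    busy_ms: float
    iowait_ms: float


def synchronous_writer(blocks: int, cpu_ms_per_block: float, drive_ms_per_block: float) -> Run:
    """One process, one CPU, no overlap between preparing a block and writing it."""
    busy = blocks * cpu_ms_per_block
    # While the process sleeps on the drive, the CPU has nothing else to run,
    # so that time is reported as iowait.
    wait = blocks * drive_ms_per_block
    return Run(wall_ms=busy + wait, busy_ms=busy, iowait_ms=wait)


slow = synchronous_writer(1000, cpu_ms_per_block=1.0, drive_ms_per_block=4.0)
fast = synchronous_writer(1000, cpu_ms_per_block=1.0, drive_ms_per_block=2.0)
print(slow)
print(fast)
```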

> I know exactly what you're talking about. The key is coming up with
> the "magic bullet" that will convince my boss... He's not buying what
> we've discussed so far.

Sorry.  


 
 
 

Post by Darren Dunha » Sun, 24 Oct 2004 00:42:15



> Note that in Solaris 10, the CPU I/O wait time will always be zero:
> 4518644 I/O wait statistic is still misleading and should be dropped
> since it tends to just confuse the issue.

Ugh.  I know it's confusing, but I'd rather have the data available than
not available.  


 
 
 

Post by Jonathan Adam » Sun, 24 Oct 2004 01:21:37





> > Note that in Solaris 10, the CPU I/O wait time will always be zero:

> > 4518644 I/O wait statistic is still misleading and should be dropped

> > since it tends to just confuse the issue.

> Ugh.  I know it's confusing, but I'd rather have the data available than
> not available.

The DTrace io provider can give you detailed information about what
I/Os are being waited for, which is much more useful than the "CPU idle
time where at least one thread is waiting for an I/O to complete, and
started its wait on this CPU".
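Following the definition quoted above (idle time on a CPU where a thread waiting for I/O started its wait), a small Python model shows why the number is easy to misread on a multiprocessor. This models the definition as stated in the post, not the actual kernel implementation; `per_cpu_states` is a hypothetical name.

```python
from typing import List, Optional


def per_cpu_states(running: List[Optional[str]], waits_started_on: List[int]) -> List[str]:
    """Classify each CPU for one tick.

    running[i]: name of the thread on CPU i, or None if CPU i is idle.
    waits_started_on: for each thread blocked on I/O, the CPU it blocked on.
    """
    states = []
    for cpu, thread in enumerate(running):
        if thread is not None:
            states.append("busy")
        elif cpu in waits_started_on:
            # Idle, and some I/O waiter went to sleep on this CPU.
            states.append("wt")
        else:
            states.append("idle")
    return states


# One thread blocked on tape I/O after running on CPU 2; all four CPUs idle.
print(per_cpu_states([None, None, None, None], [2]))
# Same waiter, but CPU 2 is now busy with another thread: no iowait anywhere,
# even though three CPUs are idle and the I/O is still outstanding.
print(per_cpu_states([None, None, "tar", None], [2]))
```

In this model the same outstanding tape write shows up as 25% system-wide iowait on a four-CPU box, or as none at all, depending only on scheduling.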

Cheers,
- jonathan

 
 
 

Post by Darren Dunha » Sun, 24 Oct 2004 03:47:34




>> Ugh.  I know it's confusing, but I'd rather have the data available than
>> not available.
> The DTrace io provider can give you detailed information about what
> I/Os are being waited for, which is much more useful than the "CPU idle
> time where at least one thread is waiting for an I/O to complete, and
> started its wait on this CPU".

Ahh, true.  I don't think of dtrace much yet.  Wonderful.


 
 
 

Post by Chris Thomps » Sun, 24 Oct 2004 06:26:54




>I know exactly what you're talking about. The key is coming up with
>the "magic bullet" that will convince my boss... He's not buying what
>we've discussed so far.

mv boss.new boss

Chris Thompson
Email: cet1 [at] cam.ac.uk

 
 
 

Post by Beard » Sun, 24 Oct 2004 14:41:37





>>I know exactly what you're talking about. The key is coming up with
>>the "magic bullet" that will convince my boss... He's not buying what
>>we've discussed so far.

> mv boss.new boss

find /company -name "decent.boss" -exec mv {} ./boss \;

Usually zero results; thus ./boss not overwritten :-(

 
 
 

Post by APA » Sun, 24 Oct 2004 19:27:18




>>Note that in Solaris 10, the CPU I/O wait time will always be zero:

>>4518644 I/O wait statistic is still misleading and should be dropped

>>since it tends to just confuse the issue.

> Ugh.  I know it's confusing, but I'd rather have the data available than
> not available.  

Folks, I wrote an infodoc on this at one point. Have a look at infodoc
75659 -
http://sunsolve.sun.com/search/document.do?assetkey=1-9-75659-1&searc...

alan.
--
Alan Hargreaves - http://blogs.sun.com/tpenta
Senior Technical Support Specialist/VOSJEC Engineer
Product Technical Support (APAC)
Sun Microsystems