ext3 performance bottleneck as the number of spindles gets la rge

ext3 performance bottleneck as the number of spindles gets la rge

Post by Griffiths, Richard » Wed, 26 Jun 2002 08:00:10

I ran your write-and-fsync program.  Here are the results:
#cntrlrs x #drives
2x2 avg = 25.04 MB/s        aggregate = 100 MB/s
2x4 avg =  9.17 MB/s        aggregate = 110 MB/s
4x2 avg = 14.55 MB/s        aggregate = 116.4 MB/s
4x6 avg =  4.94 MB/s        aggregate = 118.6 MB/s

Your program only addresses large I/O (1MB) against a fairly large file
(4GB). We did that as well with Bonnie++ (2GB file 1MB I/O requests).  The
results without the fsync option ran about 94 - 100 MB/s.  Our concern was
creating a more real world mix of I/O.  How well does the system scale
against a variety of I/O request sizes on various size files.  Where we saw
the worst overall scaling was with 8K requests.


> ...
> >And please tell us some more details regarding the performance
> >I assume that you mean that the IO rate per disk slows as more
> >disks are added to an adapter?  Or does the total throughput through
> >the adapter fall as more disks are added?

> No, the IO block write throughput for the system goes down as drives are
> added under this work load.  We measure the system throughput not the
> per drive throughput, but one could infer the per drive throughput by
> dividing.

> Running bonnie++ on with 300MB files doing 8Kb sequential writes we get
> the following system wide throughput as a function of the number of
> drives attached and by number of addapters.

> One addapter
> 1 drive per addapter    127,702KB/Sec
> 2 drives per addapter  93,283 KB/Sec
> 6 drives per addapter   85,626 KB/Sec

127 megabytes/sec to a single disk?  Either that's a very
fast disk, or you're using very small bytes :)

Quote:> 2 addapters
> 1 drive per addapter    92,095 KB/Sec
> 2 drives per addapter  110,956 KB/Sec
> 6 drives per addapter   106,883 KB/Sec

> 4 addapters
> 1 drive per addapter    121,125 KB/Sec
> 2 drives per addapter   117,575 KB/Sec
> 6 drives per addapter   116,570 KB/Sec

Possibly what is happening here is that a significant amount
of dirty data is being left in memory and is escaping the
measurement period.   When you run the test against more disks,
the *total* amount of dirty memory is increased, so the kernel
is forced to perform more writeback within the measurement period.

So with two filesystems, you're actually performing more I/O.

You need to either ensure that all I/O is occurring *within the
measurement interval*, or make the test write so much data (wrt
main memory size) that any leftover unwritten stuff is insignificant.

bonnie++ is too complex for this work.  Suggest you use
which will just write and fsync a file.  Time how long that
takes.  Or you could experiment with bonnie++'s fsync option.

My suggestion is to work with this workload:

for i in /mnt/1 /mnt/2 /mnt/3 /mnt/4 ...
        write-and-fsync $i/foo 4000 &

which will write a 4 gig file to each disk.  This will defeat
any caching effects and is just a way simpler workload, which
will allow you to test one thing in isolation.

So anyway.  All this possibly explains the "negative scalability"
in the single-adapter case.  For four adapters with one disk on
each, 120 megs/sec seems reasonable, assuming the sustained
write bandwidth of a single disk is 30 megs/sec.

For four adapters, six disks on each you should be doing better.
Something does appear to be wrong there.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/