Filesystem benchmarks: ext2 vs ext3 vs jfs vs minix

Post by Matthew Kirkwoo » Thu, 28 Mar 2002 23:00:12



Hi,

A while ago, I did some longish runs of OSDB (osdb.sf.net)
against PostgreSQL 7.2.  All runs were on kernel 2.5.6 + the
dc395x driver and the futexes patch.  I'd have included
reiserfs too, but in 2.5.6 it seemed to oops on mount.  2.5.7
doesn't boot for me, but I'll run these again when a more
interesting kernel appears.

Hardware is: 2 x P3-450, 384 MB RAM, 3 x 9 GB Quantum disks on
internal aic7xxx (new driver).  Except for a "vmstat 1", the
system was otherwise unused during the tests.  There was no
other mounted filesystem on the disk with the test partition.
The numbers seem pretty consistent -- if they're more than 5%
different, that's probably a valid comparison (no, I'm not a
statistician and can't justify that).

The scripts I used are available on request, but they do
roughly:

        stop postgres
        umount
        mkfs
        mount
        create postgres data directories
        start postgres (incl. creating postgres database)
        "osdb-pg --datadir /scratch/data-40mb/ --short"

"Tuning" key:
"dd"  -- default PG, default FS opts
"dn"  -- default PG, "noatime"
"bn"  -- big PG buffers, "noatime"

"ext3-wb" is ext3 mounted with data=writeback; plain "ext3" uses
the default data=ordered mode.

                PostgreSQL
        tuning? single  ir      mx-ir   oltp    mixed-oltp
                (sec)   (tps)   (sec)   (tps)   (sec)
ext2    dd      1304.72 66.64   214.25  188.50  230.55
        dn      1288.31 65.93   209.57  234.08  213.75
        bn      1283.50 77.90   1867.71 192.43  226.77

ext3    dd      1303.84 66.87   212.49  66.06   361.04
        dn      1288.03 64.62   209.27  111.41  278.54
        bn      1285.32 65.98   1996.41 90.05   307.79

ext3-wb dn      1291.68 66.06   209.94  138.25  242.28
        bn      1287.31 98.42   2149.38 125.13  236.02

jfs     dd      1308.97 66.82   212.59  117.28  273.08
        dn      1288.60 65.08   211.56  116.18  218.22
        bn      1279.89 81.00   2059.26 114.20  225.56

minix   dd      1305.26 67.38   207.74  193.90  228.81
        dn      1331.27 67.14   210.07  223.70  214.33
        bn      1299.24 89.58   1988.31 231.17  231.17
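
To apply the rough 5% rule of thumb from the start of this post, a small Perl sketch can compare each filesystem's OLTP throughput against the ext2 "dd" baseline. The numbers are copied from the "dd" rows of the table above; `pct_change` is an illustrative helper, and for the seconds columns a negative change would be the improvement instead:

```perl
use strict;
use warnings;

# OLTP throughput (tps, higher is better) from the "dd" rows above.
my %oltp_dd = (
    ext2  => 188.50,
    ext3  => 66.06,
    jfs   => 117.28,
    minix => 193.90,
);

# Percentage change of $value relative to $base.
sub pct_change {
    my ($value, $base) = @_;
    return 100 * ($value - $base) / $base;
}

my $base = $oltp_dd{ext2};
for my $fs (sort keys %oltp_dd) {
    my $pct = pct_change($oltp_dd{$fs}, $base);
    # 5% is the post's rough rule of thumb, not a statistical test.
    my $verdict = abs($pct) > 5 ? 'probably real' : 'within noise';
    printf "%-6s %7.2f tps %+6.1f%%  %s\n", $fs, $oltp_dd{$fs}, $pct, $verdict;
}
```

On these rows ext3 (ordered) comes out about 65% below ext2 and jfs about 38% below, while minix is within 3%.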

My conclusions:

1. I'll have to spend more time learning to tune postgres,
   but clearly something went wrong with the big-buffer ("bn")
   runs -- the "agg_simple_report" test accounted for almost
   all of the differences.

2. "noatime" is a very useful switch in these circumstances.

3. The journalled filesystems do have measurable overhead
   for this workload.
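
Since the "dn" and "bn" runs differ from "dd" on the filesystem side only by noatime, the mount step can be parameterised by tuning key. A minimal Perl sketch, assuming a hypothetical `mount_cmd` helper that is not part of the scripts posted later in the thread; the "big PG buffers" part of "bn" is a postgresql.conf setting and is not modelled:

```perl
use strict;
use warnings;

# Hypothetical mapping from tuning key to extra mount options.
# "bn" changes PostgreSQL's buffer settings, which are not mount
# options, so its filesystem options are the same as "dn".
my %mount_opts_for = (
    dd => [],            # default filesystem options
    dn => ['noatime'],   # don't update access times on reads
    bn => ['noatime'],
);

# Build (but do not run) the mount command for one benchmark run.
# @extra carries per-filesystem options such as 'data=writeback'.
sub mount_cmd {
    my ($key, $fstype, $dev, $mnt, @extra) = @_;
    my $base = $mount_opts_for{$key} or die "unknown tuning key '$key'\n";
    my @opts = (@$base, @extra);
    my @cmd  = ('mount', '-t', $fstype);
    push @cmd, '-o', join(',', @opts) if @opts;
    return (@cmd, $dev, $mnt);
}

# The "ext3-wb dn" run, using the partition and mount point that
# appear in the benchmark script later in the thread.
print join(' ', mount_cmd('dn', 'ext3', '/dev/sdb6', '/var/lib/pgsql',
                          'data=writeback')), "\n";
```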

Questions:

1. Is there anything else I should try in the way of fs
   options, etc?

2. What does jfs do in the way of data journalling?  Is it
   "ordered" or "writeback", in ext3-speak?  (I assume
   fully journalled data would give much worse performance.)

Cheers,
Matthew.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Post by Andi Klee » Thu, 28 Mar 2002 23:20:05



>                 PostgreSQL
>         tuning? single  ir      mx-ir   oltp    mixed-oltp
>                 (sec)   (tps)   (sec)   (tps)   (sec)
> ext2    dd      1304.72 66.64   214.25  188.50  230.55
>         dn      1288.31 65.93   209.57  234.08  213.75
>         bn      1283.50 77.90   1867.71 192.43  226.77

> ext3    dd      1303.84 66.87   212.49  66.06   361.04
>         dn      1288.03 64.62   209.27  111.41  278.54
>         bn      1285.32 65.98   1996.41 90.05   307.79

This is ext3 with ordered data?

> minix   dd      1305.26 67.38   207.74  193.90  228.81
>         dn      1331.27 67.14   210.07  223.70  214.33
>         bn      1299.24 89.58   1988.31 231.17  231.17


Any chance to test XFS too?

> 3. The journalled filesystems do have measurable overhead
>    for this workload.

Normally (no data journaling, noatime) a journaling fs shouldn't have any
overhead for a database load: the database files should be preallocated,
and the database should do direct IO in and out of the preallocated
buffers, so the FS never does any metadata writes except for occasional
inode mtime updates, depending on which sync mode the DB uses. (Hmm, I
guess a nomtime, verylazymtime or alwaysasyncmtime mount option could be
helpful for that.)

That's the theory, but it doesn't seem to be the case in your test. I
guess your test is not very realistic, then.
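
The access pattern described here (allocate the file once, then overwrite pages in place so the filesystem has no new blocks to allocate) can be sketched in Perl. This is an illustration under assumptions, not how PostgreSQL 7.2 manages its files: `preallocate` and `write_page` are hypothetical helpers, the 8192-byte page size is assumed, and buffered writes plus fsync stand in for the direct IO mentioned above:

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT SEEK_SET);
use IO::Handle;

my $PAGE = 8192;   # assumed page size

# Allocate the whole file up front by writing zeros, then fsync once so
# the new blocks and the file size reach the disk together.
sub preallocate {
    my ($path, $bytes) = @_;
    sysopen(my $fh, $path, O_RDWR | O_CREAT) or die "open $path: $!";
    my $zeros = "\0" x $PAGE;
    my $left  = $bytes;
    while ($left > 0) {
        my $n = $left < $PAGE ? $left : $PAGE;
        my $w = syswrite($fh, $zeros, $n);
        die "write $path: $!" unless defined $w;
        $left -= $w;
    }
    $fh->sync or die "fsync $path: $!";
    close $fh or die "close $path: $!";
}

# Overwrite one page of an already-allocated file.  The file does not
# grow, so no block allocation is needed; only the mtime update remains.
sub write_page {
    my ($path, $pageno, $data) = @_;
    die "page too large\n" if length($data) > $PAGE;
    sysopen(my $fh, $path, O_RDWR) or die "open $path: $!";
    sysseek($fh, $pageno * $PAGE, SEEK_SET) or die "seek $path: $!";
    my $w = syswrite($fh, $data);
    defined $w && $w == length($data) or die "write $path: $!";
    $fh->sync or die "fsync $path: $!";
    close $fh or die "close $path: $!";
}
```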

> 2. What does jfs do in the way of data journalling?  Is it
>    "ordered" or "writeback", in ext3-speak?  (I assume
>    fully journalled data would give much worse performance.)

Kind of ordered I believe.

-Andi

Post by Florin Andrei » Thu, 28 Mar 2002 23:20:10



> 3. The journalled filesystems do have measurable overhead
>    for this workload.

Can you repeat the tests with XFS too?

In my tests, it did the best for database-type workloads (and generally,
for large files with multiple access).

--
Florin Andrei

"Sorry judge, we would like to publish the file formats, but the data is
not stored in files. It is stored in a database that is an indivisible
part of the operating system." - a potential future Microsoft excuse

Post by Matthew Kirkwoo » Thu, 28 Mar 2002 23:50:09



> > ext3    dd      1303.84 66.87   212.49  66.06   361.04
> >         dn      1288.03 64.62   209.27  111.41  278.54
> >         bn      1285.32 65.98   1996.41 90.05   307.79

> This is ext3 with ordered data?

Yep.  Everything is default unless otherwise stated.

> > minix   dd      1305.26 67.38   207.74  193.90  228.81
> >         dn      1331.27 67.14   210.07  223.70  214.33
> >         bn      1299.24 89.58   1988.31 231.17  231.17



Yeah, I thought it was a little odd.  Postgres does so much
fsync()ing that I thought it may just have been that the lower
overhead won out over ext2's cleverer layout.  All the I/O was
basically fsync-driven, so this test was only about write
performance.

> Any chance to test XFS too?

Sure.  I'll try to build a more interesting kernel sometime
this week.  ext2 with delalloc might be fun, too.

Do you know of any simple patch or patches which might get
reiserfs working on 2.5.6?

> > 3. The journalled filesystems do have measurable overhead
> >    for this workload.

> Normally (non data journaling, noatime) journaling fs shouldn't have
> any overhead for database load, because database files should be
> preallocated and the database should do direct IO in/out the
> preallocated buffers with the FS never doing any metadata writes,
> except for occasional inode updates for mtime depending on what sync
> mode that DB uses (hmm, I guess a nomtime or verylazymtime or
> alwaysasyncmtime mount option could be helpful for that)

Postgres doesn't pre-allocate datafiles.  They reckon it's not
their job to implement a filesystem, and I'm inclined to agree.
They do prefer fdatasync on datafiles and (I think) O_DSYNC
for their journal files where available, but I haven't checked
that my build is doing that.
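
The two sync styles mentioned here can be sketched in Perl, as an illustration rather than PostgreSQL's actual code: a journal-style file opened with the POSIX `O_DSYNC` flag, where each write returns only once the data is on stable storage, and a data-file style of buffered writes followed by one explicit flush. Perl's core modules expose fsync(2) (via `IO::Handle::sync`) but not fdatasync(2), so fsync stands in for it here; the helper names are hypothetical:

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_APPEND O_DSYNC);
use IO::Handle;

# Journal style: with O_DSYNC each write() returns only after the data,
# and any metadata needed to read it back (such as the new file size),
# has reached stable storage.
sub append_log_record {
    my ($path, $record) = @_;
    sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_APPEND | O_DSYNC)
        or die "open $path: $!";
    my $w = syswrite($fh, $record);
    defined $w && $w == length($record) or die "write $path: $!";
    close $fh or die "close $path: $!";
}

# Data-file style: buffered writes, then one explicit flush.  fsync(2)
# stands in for fdatasync(2); fsync also writes back the mtime update.
sub write_and_flush {
    my ($path, $data) = @_;
    open(my $fh, '>>:raw', $path) or die "open $path: $!";
    print {$fh} $data or die "print $path: $!";
    $fh->flush or die "flush $path: $!";   # Perl's buffer -> kernel
    $fh->sync  or die "fsync $path: $!";   # kernel -> disk
    close $fh  or die "close $path: $!";
}
```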

> That's the theory, but doesn't seem to be the case in your test. I
> guess your test is not very realistic then.

Or your assumptions about DB vs filesystems are not valid in
this case.

> > 2. What does jfs do in the way of data journalling?  Is it
> >    "ordered" or "writeback", in ext3-speak?  (I assume
> >    fully journalled data would give much worse performance.)

> Kind of ordered I believe.

OK, ta.  So it probably does something right that ext3
doesn't?  (Or has rather weaker semantics, of course.)

Matthew.

Post by Michael Alan Dorma » Fri, 29 Mar 2002 00:40:11



> Postgres doesn't pre-allocate datafiles.  

I haven't received your original message, so I don't know what version
of PostgreSQL you're using, but I believe it is pertinent given that
versions >= 7.2 (and perhaps >= 7.1) *do* pre-allocate WAL logs, which
is where most of the action is.

It may be that in this situation you would benefit from any
reduction in FS overhead, even at the cost of features,
because WAL is going to dramatically change the way disk access
happens.

Mike.

Post by Andrew Morto » Fri, 29 Mar 2002 03:00:11



> ...
> Yeah, I thought it was a little odd.  Postgres does so much
> fsync()ing that I thought it may just have been that the lower
> overhead won out over ext2's cleverer layout.  All the I/O was
> basically fsync-driven, so this test was only about write
> performance.

For fsync-intensive loads ext3's best mode is generally
data=journal.  That way, an fsync is satisfied by a nice
single linear write to the journal.

With a high volume of data you'll quickly exhaust the
journal space so it would also be beneficial to create
a monster journal with, say, mke2fs -J size=400.
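
Following that suggestion, the filesystem setup can be expressed in the same command-array style as the benchmark script later in the thread. A sketch, assuming a hypothetical `ext3_journal_cmds` helper; mke2fs's `-J size=` takes the journal size in megabytes, and the device and mount point are placeholders borrowed from that script:

```perl
use strict;
use warnings;

# Hypothetical helper: mkfs and mount commands for ext3 with full data
# journalling and a large internal journal.  Nothing is executed here;
# pass each array to system() to run it.
sub ext3_journal_cmds {
    my (%arg) = @_;
    for my $k (qw(dev mnt journal_mb)) {
        die "missing $k\n" unless defined $arg{$k};
    }
    my @mkfs  = ('mke2fs', '-J', "size=$arg{journal_mb}", $arg{dev});
    my @mount = ('mount', '-t', 'ext3', '-o', 'data=journal',
                 $arg{dev}, $arg{mnt});
    return (\@mkfs, \@mount);
}

my ($mkfs, $mount) = ext3_journal_cmds(
    dev => '/dev/sdb6', mnt => '/var/lib/pgsql', journal_mb => 400);
print "@$mkfs\n@$mount\n";   # dry run: print rather than execute
```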

Post by Andreas Dilger » Fri, 29 Mar 2002 03:10:10



> Postgres doesn't pre-allocate datafiles.  They reckon it's not
> their job to implement a filesystem, and I'm inclined to agree.
> They do prefer fdatasync on datafiles and (I think) O_DSYNC
> for their journal files where available, but I haven't checked
> that my build is doing that.

If the I/O is normally sync driven, you should consider testing ext3
with "data=journal".  While this seems counterintuitive because it is
writing the data to disk twice, it can often be faster in real-world
"bursty" environments because the sync I/O goes to the journal in one
contiguous write, and it can then be written to the rest of the fs
asynchronously safely.  You can also set up an external journal device
so that the journal is on another disk and avoid seeking between the
journal and the rest of the filesystem.
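
An external journal is set up in two mke2fs steps: format the partition on the other disk as a journal device, then create the filesystem pointing at it. A Perl sketch that only builds the commands; the helper name and `/dev/sdc1` are hypothetical, and the same `-b` block size is passed to both because the journal device and the filesystem must use the same block size:

```perl
use strict;
use warnings;

# Hypothetical helper: commands for an ext3 filesystem whose journal
# lives on a separate partition (ideally on a different disk).
sub external_journal_cmds {
    my ($journal_dev, $fs_dev, $blocksize) = @_;
    return (
        # 1. make the partition on the other disk a dedicated journal device
        ['mke2fs', '-b', $blocksize, '-O', 'journal_dev', $journal_dev],
        # 2. create the filesystem and attach it to that journal device
        ['mke2fs', '-b', $blocksize, '-J', "device=$journal_dev", $fs_dev],
    );
}

for my $cmd (external_journal_cmds('/dev/sdc1', '/dev/sdb6', 4096)) {
    print "@$cmd\n";   # dry run: print instead of system(@$cmd)
}
```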

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert

Post by Matthew Kirkwoo » Fri, 29 Mar 2002 09:10:13



> > Yeah, I thought it was a little odd.  Postgres does so much
> > fsync()ing that I thought it may just have been that the lower
> > overhead won out over ext2's cleverer layout.  All the I/O was
> > basically fsync-driven, so this test was only about write
> > performance.

> For fsync-intensive loads ext3's best mode is generally
> data=journal.  That way, an fsync is satisfied by a nice
> single linear write to the journal.

Here we are.  This is with just a 200Mb journal (the partition
is only a little over 1Gb, and the datafiles grow fairly big,
so I didn't brave making it any bigger).

        tuning? single  ir      mx-ir   oltp    mixed-oltp
                (sec)   (tps)   (sec)   (tps)   (sec)
ext3    bn      1285.32 65.98   1996.41 90.05   307.79
ext3-wb bn      1287.31 98.42   2149.38 125.13  236.02
ext3-jd bn      1306.90 72.07   1813.54 125.15  305.27

The I/O load should be almost exclusively fsync-driven writes,
so I'm not sure how to account for the fact that the OLTP and
OLTP + misc (mostly read) activity give different numbers.

I'll try to find time to run these again tomorrow to convince
myself that all is sane, but these numbers are usually pretty
stable.

Matthew.

Post by Matthew Kirkwoo » Fri, 29 Mar 2002 09:20:08



> If the I/O is normally sync driven, you should consider testing ext3
> with "data=journal".  While this seems counterintuitive because it is
> writing the data to disk twice, it can often be faster in real-world
> "bursty" environments because the sync I/O goes to the journal in one
> contiguous write, and it can then be written to the rest of the fs
> asynchronously safely.

Good point (and partially borne out by my new numbers).

> You can also set up an external journal device so that the journal is
> on another disk and avoid seeking between the journal and the rest of
> the filesystem.

Good idea.  If I had only two disks - a slow one and a fast one,
how should they be configured?  (Or might this be another area
worthy of testing?  The tradeoffs can go both ways -- the slow
disk might seem better for the async writes, but it'll also be
worse at seeking, so perhaps might be more appropriate for the
journal disk?)

Matthew.

Post by Andrew Morto » Fri, 29 Mar 2002 09:40:10




> > > Yeah, I thought it was a little odd.  Postgres does so much
> > > fsync()ing that I thought it may just have been that the lower
> > > overhead won out over ext2's cleverer layout.  All the I/O was
> > > basically fsync-driven, so this test was only about write
> > > performance.

> > For fsync-intensive loads ext3's best mode is generally
> > data=journal.  That way, an fsync is satisfied by a nice
> > single linear write to the journal.

> Here we are.  This is with just a 200Mb journal (the partition
> is only a little over 1Gb, and the datafiles grow fairly big,
> so I didn't brave making it any bigger).

>         tuning? single  ir      mx-ir   oltp    mixed-oltp
>                 (sec)   (tps)   (sec)   (tps)   (sec)
> ext3    bn      1285.32 65.98   1996.41 90.05   307.79
> ext3-wb bn      1287.31 98.42   2149.38 125.13  236.02
> ext3-jd bn      1306.90 72.07   1813.54 125.15  305.27

Oh well.

It sounds like a useful and valid workload to optimise
for.  So I'll take you up on the offer of those scripts,
please.

Post by Matthew Kirkwoo » Fri, 29 Mar 2002 09:50:11



> >         tuning? single  ir      mx-ir   oltp    mixed-oltp
> >                 (sec)   (tps)   (sec)   (tps)   (sec)
> > ext3    bn      1285.32 65.98   1996.41 90.05   307.79
> > ext3-wb bn      1287.31 98.42   2149.38 125.13  236.02
> > ext3-jd bn      1306.90 72.07   1813.54 125.15  305.27

> Oh well.

Sometimes better, sometimes worse.  I'll kick another run
off tonight, to check that the numbers aren't too far off.

> It sounds like a useful and valid workload to optimise
> for.  So I'll take you up on the offer of those scripts,
> please.

My script is roughly the one appended, plus this to pull out the summary lines:

grep -E '(agg_simple|Bench|crossSe|Mixed|"Sin)' dbb-tuned.out | \
                sed 's/^crossSection/cS/'
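
For anyone without grep and sed to hand, the same filter in Perl: keep lines matching any of the five patterns and shorten a leading "crossSection" to "cS". `summary_lines` is a hypothetical name, and like the original pipeline it makes no assumptions about the rest of each line, since the OSDB output format is not shown in the thread:

```perl
use strict;
use warnings;

# Perl version of:
#   grep -E '(agg_simple|Bench|crossSe|Mixed|"Sin)' FILE | sed 's/^crossSection/cS/'
sub summary_lines {
    my @out;
    for my $line (@_) {
        next unless $line =~ /agg_simple|Bench|crossSe|Mixed|"Sin/;
        (my $short = $line) =~ s/^crossSection/cS/;
        push @out, $short;
    }
    return @out;
}

# Usage: perl summary.pl dbb-tuned.out
print summary_lines(<>) if @ARGV;
```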

I've been too lazy so far to automate the "make it into a
table" bit, particularly because I quite like watching the
results come in :)
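
Automating the table step could look like the following Perl sketch, which prints rows in the same layout as the tables in this thread. How the per-test numbers are collected from the log is left out, and `table_row` is a hypothetical helper; the example row is the ext3-jd "bn" result posted earlier in the thread:

```perl
use strict;
use warnings;

# Columns in the order used by the tables in this thread.
my @COLS = ('single', 'ir', 'mx-ir', 'oltp', 'mixed-oltp');

# Format one result row; a missing measurement prints as "-".
sub table_row {
    my ($fs, $tuning, $res) = @_;
    my $row = sprintf '%-8s%-8s', $fs, $tuning;
    for my $col (@COLS) {
        my $v = $res->{$col};
        $row .= sprintf '%-8s', defined $v ? sprintf('%.2f', $v) : '-';
    }
    $row =~ s/\s+$//;   # drop trailing padding
    return "$row\n";
}

printf "%-8s%-8s%s\n", '', 'tuning?', join('', map { sprintf '%-8s', $_ } @COLS);
print table_row('ext3-jd', 'bn', {
    single => 1306.90, ir => 72.07, 'mx-ir' => 1813.54,
    oltp => 125.15, 'mixed-oltp' => 305.27,
});
```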

Cheers,
Matthew.

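#!/usr/bin/perl -w
use strict;

my $PART = '/dev/sdb6';
my $FORCEOPTS = 'noatime';
my $DEFOPTS = undef;
my $DEBUG = 1;
my $DEBUGONLY = 0;

# Reconstructed: the lines choosing what to run were lost from the
# archive.  Taking them from the command line is an assumption, e.g.
#   ./bench.pl postgresql ext2 ext3 jfs > dbb-tuned.out
my ($dbarg, @fslist) = @ARGV;
my @dblist = defined $dbarg ? ($dbarg) : ('postgresql');
@fslist = qw(ext2 ext3 ext3-wb ext3-jd jfs minix) unless @fslist;

my %filesystems = (
        minix   => { mkfs => [ qw(/root/mkfs.minix -v) ], },
        ext2    => {},
        ext3    => {},
        'ext3-wb' => {  type => 'ext3', mountopts => 'data=writeback', },
        'ext3-jd' => {  mkfs => [ qw(mkfs.ext3 -J size=200) ],
                        type => 'ext3', mountopts => 'data=journal', },
        jfs     => { mkfs => [ qw(mkfs.jfs -q) ], },
        reiser  => { type => 'reiserfs', },
);

my %dbs = (
        mysql   => { mntpoint => '/var/lib/mysql', osdb => 'osdb-my', },
        postgresql => { mntpoint => '/var/lib/pgsql', osdb => 'osdb-pg',
                        init => \&pg_init, },
);

runit('umount', $PART);         # failure is fine if nothing is mounted

foreach my $db (@dblist) {
        my $dbopts = $dbs{$db} or die "unknown database $db\n";
        my $mntpoint = $dbopts->{mntpoint} or die "$db has no mntpoint\n";
        my $osdb = $dbopts->{osdb} or die "$db has no \"osdb\"\n";

        foreach my $fs (@fslist) {
                my $opts = $filesystems{$fs} or die "unknown filesystem $fs\n";
                print "Benchmark for ", $db, " on ", $fs, "\n\n";

                my $fstype = $opts->{type} || $fs;
                my $mkfs = $opts->{mkfs} || [ qw(mkfs -t), $fstype ];
                print "making fs\n";
                # Reconstructed: format the test partition.
                runit(@$mkfs, $PART) or die "can't mkfs $fstype\n";
                print "\n\n";

                print "mounting fs\n";
                my $opt = $opts->{mountopts} || $DEFOPTS;
                $opt = [$opt] if $opt && ! ref $opt;
                # Reconstructed: append the forced options (noatime) and
                # join everything into one comma-separated -o argument.
                $opt = [ @{ $opt || [] }, $FORCEOPTS ] if $FORCEOPTS;
                $opt = join(',', @$opt) if $opt;
                $opt = ['-o', $opt] if $opt;
                runit('mount', '-t', $fstype, @{ $opt || [] }, $PART, $mntpoint)
                                        or die "can't mount $fstype\n";
                print "\n\n";

                print "Starting ", $db, "\n";
                if ($dbopts->{init}) {
                        &{$dbopts->{init}}($dbopts, $opts);
                } else {
                        runit('/sbin/service', $db, 'start');
                }
                print "\n\n";

                print "Running test\n";
                # Reconstructed from the osdb command in the first post.
                runit($osdb, '--datadir', '/scratch/data-40mb/', '--short')
                                        or warn "$osdb failed on $fs\n";
                print "\n\n";

                print "Stopping ", $db, "\n";
                runit('/sbin/service', $db, 'stop');
                sleep(2);
                print "\n\n";

                print "Umounting\n";
                runit('umount', $PART) or die "can't umount $fstype\n";
                print "\n\n";

                print "\n\n";
                print "\n\n";
        }
}

exit;

sub pg_init {
        my $dbopts = shift;
        my $opts = shift;
        my $mp = $dbopts->{mntpoint};

        # Assumed: the lines creating the data directory on the fresh
        # filesystem were lost from the archive; "$mp/data" and the
        # postgres owner are guesses.
        my @dirs = ("$mp/data");
        foreach my $dir (@dirs) {
                mkdir($dir, 0700) or die "can't mkdir $dir: $!\n";
        }
        runit('chown', '-R', 'postgres:postgres', $mp);

#       runit('cp', '/etc/postgresql.conf', $dirs[0]);
        runit('/sbin/service', 'postgresql', 'start');
        sleep(2);
        runit('sudo', '-u', 'postgres', 'createuser', '-a', '-d', 'root');
}

# Assumed body (lost from the archive): echo the command when $DEBUG is
# set, skip it when $DEBUGONLY is set, and return true on success.
sub runit {
        print "+ @_\n" if $DEBUG;
        return 1 if $DEBUGONLY;
        return system(@_) == 0;
}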

Post by Mike Fedy » Fri, 29 Mar 2002 11:30:07




> > Postgres doesn't pre-allocate datafiles.  They reckon it's not
> > their job to implement a filesystem, and I'm inclined to agree.
> > They do prefer fdatasync on datafiles and (I think) O_DSYNC
> > for their journal files where available, but I haven't checked
> > that my build is doing that.

> If the I/O is normally sync driven, you should consider testing ext3
> with "data=journal".  While this seems counterintuitive because it is
> writing the data to disk twice, it can often be faster in real-world
> "bursty" environments because the sync I/O goes to the journal in one
> contiguous write, and it can then be written to the rest of the fs
> asynchronously safely.

Don't forget to have enough extra memory so that it can have time to do
those async writes later.

When is ext3 going to get high and low watermarks?

Currently it hits a (50%?) high usage level and then sync writes the entire
journal contents. :(  Has that changed?
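
The difference being asked about can be illustrated with a toy model in Perl. This is not ext3's actual checkpointing code: `checkpoint_stalls` is a hypothetical function, the journal is reduced to a count of used blocks, and every transaction is assumed to be the same size. When usage reaches the high mark, the writer waits while old blocks are written back, either all of them (a low mark of 0, the behaviour described above) or only down to a low mark:

```perl
use strict;
use warnings;

# Toy model of journal checkpointing.  Returns the number of blocks
# written back at each stall, i.e. each time a writer has to wait.
sub checkpoint_stalls {
    my (%p) = @_;    # high, low, txn_blocks, ntxn: sizes in journal blocks
    my $used = 0;
    my @stalls;
    for (1 .. $p{ntxn}) {
        $used += $p{txn_blocks};
        if ($used >= $p{high}) {
            push @stalls, $used - $p{low};   # write back down to the low mark
            $used = $p{low};
        }
    }
    return @stalls;
}

# 100-block journal, high mark at 50%, ten 10-block transactions.
my @flush_all = checkpoint_stalls(high => 50, low => 0,  txn_blocks => 10, ntxn => 10);
my @two_marks = checkpoint_stalls(high => 50, low => 30, txn_blocks => 10, ntxn => 10);
print "flush everything: @flush_all\n";   # two long stalls
print "low mark at 30%:  @two_marks\n";   # more frequent, shorter stalls
```

In the model, the low mark trades one extra stall for stalls less than half as long.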

Post by Matthew Kirkwoo » Fri, 29 Mar 2002 20:20:07



> I'll try to find time to run these again tomorrow to convince
> myself that all is sane, but these numbers are usually pretty
> stable.

Here's another run, with noatime on, and default postgres
parameters.

        tuning? single  ir      mx-ir   oltp    mixed-oltp
                (sec)   (tps)   (sec)   (tps)   (sec)
ext3     dn     1296.30 66.34   207.59  69.19   318.26
ext3-wb  dn     1286.38 66.27   212.48  135.48  229.74
ext3-jd  dn     1293.08 68.72   209.33  113.40  283.97

Looks like I'll have to invest some time in tuning postgres
a little better before the filesystem becomes more of a
bottleneck.

Matthew.
