ufs logging kills linking performance

ufs logging kills linking performance

Post by John Fiebe » Thu, 10 Feb 2000 04:00:00



On the whole, ufs logging introduced in Solaris 7 is a Good Thing for
both performance and reliability.  But on a number of systems here,
performance for linking C programs is completely killed when logging is
enabled.

As a crude benchmark, I used the time it takes to build Postfix (with
gcc 2.95.2).  Times are wall time; CPU time is roughly the same with and
without logging.  Compiling appears to chug along at the same rate
regardless, but linking will just hang for 15 or 20 seconds when logging
is enabled.  Two examples:

Compaq Pentium Pro 200MHz, 160MB RAM, 5-disk hardware RAID-5:
  without logging:  2m30s
  with logging:    19m20s

Sun Ultra 10 300MHz, 128MB RAM, IBM IDE disk:
  without logging:  2m53s
  with logging:     4m18s

Other, differently configured systems show the same symptoms with respect
to logging, but with varying degrees of slowdown, as seen above.  (The
Compaq above is actually a bit of an outlier; most take 2 to 4 times as
long with logging enabled.)  One system exhibits the problem only on SCSI
drives, but others exhibit it on IDE drives as well.  On any given
system, the behavior is entirely predictable: turn logging on and linking
slows to a crawl; turn it off and it runs fine.

Anybody else seen this?  I'm rather puzzled.

-john

 
 
 

ufs logging kills linking performance

Post by Juergen Keil » Thu, 10 Feb 2000 04:00:00



> On the whole, ufs logging introduced in Solaris 7 is a Good Thing for
> both performance and reliability.  But on a number of systems here,
> performance for linking C programs is completely killed when logging is
> enabled.
> ...  On any given
> system, the behavior is entirely predictable; turn logging on, linking
> slows to a crawl, turn it off it goes fine.

> Anybody else seen this?  I'm rather puzzled.

Yes, I've observed the same problem with linking, especially on a
Solaris 7/MU4 x86 system with ufs logging enabled.  Btw., while 'ld'
runs slowly, the system is 100% idle most of the time.  Up to now I was
unsure what the problem was, but ufs logging is a reasonable
explanation, I think.

Perhaps the kernel runs out of transaction log space and waits for
some 'cleaner' thread to flush the logged operations to the file
system?  If that is true, maybe a wakeup for the 'cleaner' thread is
missing when the transaction log gets full?

 
 
 

ufs logging kills linking performance

Post by Logan Shaw » Thu, 10 Feb 2000 04:00:00


[ ufs logging slows down "ld". ]


>Perhaps the kernel runs out of transaction log space and waits for
>some 'cleaner' thread to flush the logged operations to the file
>system?  If that is true, maybe there's an wakeup for the 'cleaner'
>thread missing when the transaction log gets full?

Well, even if there's not a missing wakeup call, logging requires more
I/O than not logging, at least generally.  So there's more overhead.

By the way, you can see how big your log is by doing "fstyp -v" on the
filesystem.  On my filesystems, it's generally been something like 1
MB, if I remember right.

If you're really concerned about performance, having a log that's not
on the same disk will probably serve you better.  For that, you can use
DiskSuite (or, presumably, Veritas).  It is a LOT easier to configure
ufs logging, though.  :-)

By the way, am I the only one who does lots of my small compiles on a
tmpfs filesystem?  :-)

  - Logan

 
 
 

ufs logging kills linking performance

Post by Kjetil Torgrim Homm » Fri, 11 Feb 2000 04:00:00


[Logan Shaw]

>   By the way, am I the only one who does lots of my small compiles on
>   a tmpfs filesystem?  :-)

Why only small compiles?  :-)

VPATH is useful for this -- keep the source files safe and the object
files in /tmp.

Lots of swap is the best use I can think of for the huge hard disks
(anything larger than 2 GB :-) in today's workstations.

Kjetil T.

 
 
 

ufs logging kills linking performance

Post by Juergen Keil » Fri, 11 Feb 2000 04:00:00


lo...@cs.utexas.edu (Logan Shaw) writes:
> [ ufs logging slows down "ld". ]

> In article <wybt5qukd6....@leo.tools.de>, Juergen Keil  <j...@tools.de> wrote:
> >Perhaps the kernel runs out of transaction log space and waits for
> >some 'cleaner' thread to flush the logged operations to the file
> >system?  If that is true, maybe there's an wakeup for the 'cleaner'
> >thread missing when the transaction log gets full?

> Well, even if there's not a missing wakeup call, logging requires more
> I/O than not logging, at least generally.  So there's more overhead.

Hmm, generally that's true, but this cannot be the explanation for why
ufs logging slows down "ld" so much!  There was an important piece of
information missing from my last post: not only is the CPU 100% idle
most of the time while "ld" runs, there is also no (!) I/O to the disks
at the same time (over periods of ~10-15 seconds).

The program attached at the end of this posting ("logw.c") seems to
reproduce the problem with "ld" and ufs logging.  I think it uses a
technique similar to the one "ld" uses to write an output file: create
the output file and mmap it into virtual address space, then write the
contents of the output file directly to vm (and into the mapped file).

On a Solaris 7 x86/MU4 system with ufs logging enabled and 256 MB of
memory, writing a 100 MB file via mmap takes 2 minutes:

% time logw 100m
0.07u 2.57s 1:59.83 2.2%

An iostat during that time shows three bursts of I/O (>1000 kps) and
three periods of 10-15 seconds with essentially no I/O (<100 kps):

% iostat 5
      tty          fd0           sd0          sd16          sd32          cpu
 tin tout kps tps serv  kps tps serv  kps tps serv  kps tps serv  us sy wt id
   1   88   0   0    0   28   2   53    0   0    0    0   0    0  11  4  1 84
   0  231   0   0    0    1   2    8    0   0    0    0   0    0   1  3  0 96
logw 100m was started
   1  191   0   0    0    0   0    0    0   0    0    0   0    0   1  2  0 97
   1  157   0   0    0    0   0    0    0   0    0    0   0    0   1  1  0 98
   0  122   0   0    0   90   2  909    0   0    0    0   0    0   1  1  0 98
   0  130   0   0    0  261   9  243    0   0    0    0   0    0   1  1  0 99
   0  146   0   0    0    0   0    0    0   0    0    0   0    0   1  1  0 99
   0  182   0   0    0   28   3   12    0   0    0    0   0    0   1  5  0 95
   0  186   0   0    0   97  10   97    0   0    0    0   0    0   1 11  1 88
   0  137   0   0    0  829   5  470    0   0    0    0   0    0   1  1  0 98
   0  120   0   0    0  328   2 3077    0   0    0    0   0    0   1  2  0 97
   0  153   0   0    0  3207  33 1963    0   0    0    0   0    0   1  2  0 98
   0  163   0   0    0  286  14  319    0   0    0    0   0    0   1  1  0 98
   0  169   0   0    0   72   7   54    0   0    0    0   0    0   1  5  0 95
   0  145   0   0    0   95  12  564    0   0    0    0   0    0   1 15  0 84
   0  150   0   0    0  2614  33 4718    0   0    0    0   0    0   0  2  0 98
   0  130   0   0    0  3048  25 1173    0   0    0    0   0    0   0  2  0 98
   0  125   0   0    0    3   0   11    0   0    0    0   0    0   1  1  0 98
   0  122   0   0    0   24   2   32    0   0    0    0   0    0   1  1  0 98
   0  133   0   0    0   36   4  130    0   0    0    0   0    0   1  2  0 97
   0  237   0   0    0  132  17  422    0   0    0    0   0    0   1 19  0 80
   0  131   0   0    0  1877  34 5844    0   0    0    0   0    0   0  2  0 98
   0  129   0   0    0  4026  29 1361    0   0    0    0   0    0   1  1  0 98
   0  127   0   0    0   93   8  143    0   0    0    0   0    0   1  1  0 98
   0  131   0   0    0   27   3   41    0   0    0    0   0    0   1  1  0 98
   0  134   0   0    0   23   3   14    0   0    0    0   0    0   1  4  0 95
logw 100m ended (24 lines * 5 seconds == 120 seconds total run time)
   1  243   0   0    0  193  10  525    0   0    0    0   0    0   2 12  0 86
   0  130   0   0    0  3472  34 2149    0   0    0    0   0    0   0  1  0 98
   0  126   0   0    0  163   9  163    0   0    0    0   0    0   1  1  0 98
   0  122   0   0    0    6   1   21    0   0    0    0   0    0   1  1  0 99
   0  106   0   0    0    3   0   21    0   0    0    0   0    0   1  1  0 99
   0  123   0   0    0    0   1    9    0   0    0    0   0    0   1  2  0 97
   0  112   0   0    0    0   0    0    0   0    0    0   0    0   0  1  0 98
   0  143   0   0    0  547  80   13    0   0    0    0   0    0   0  2 40 57
   0  120   0   0    0    0   0    0    0   0    0    0   0    0   1  1  0 98
   0  112   0   0    0    0   0    0    0   0    0    0   0    0   1  1  0 98
^C

Writing the same 100 MB with standard write(2) system calls is *much*
faster (4x faster)!
Note: this is the same ufs logging filesystem!

% time dd if=/dev/zero of=junk bs=1k count=102400
102400+0 records in
102400+0 records out
0.35u 6.48s 0:30.42 22.4%

An iostat 5 while dd was running shows constant I/O, no pauses:

% iostat 5
      tty          fd0           sd0          sd16          sd32          cpu
 tin tout kps tps serv  kps tps serv  kps tps serv  kps tps serv  us sy wt id
   1   88   0   0    0   29   2   63    0   0    0    0   0    0  11  4  1 84
   0  208   0   0    0    0   0    0    0   0    0    0   0    0   1  1  0 97
   1  238   0   0    0  2946  27   63    0   0    0    0   0    0   2 18  1 80
   0  193   0   0    0  3540  33   67    0   0    0    0   0    0   1 24  0 74
   0  154   0   0    0  3419  28   71    0   0    0    0   0    0   1 24  0 75
   0  207   0   0    0  3466  28   71    0   0    0    0   0    0   2 23  0 75
   0  149   0   0    0  3363  32   66    0   0    0    0   0    0   2 23  0 75
   0  180   0   0    0  3480  28   66    0   0    0    0   0    0   3 22  0 75
   0  231   0   0    0  807   9   59    0   0    0    0   0    0   1  6  2 91

   1  157   0   0    0  600  81    8    0   0    0    0   0    0   1  2 27 69
   0  172   0   0    0    3   1    7    0   0    0    0   0    0   1  2  0 97
   0  122   0   0    0    1   0   32    0   0    0    0   0    0   1  1  0 98
  14  155   0   0    0   21   5   30    0   0    0    0   0    0   1  1  4 95

BTW: The effect cannot be observed on a system using DiskSuite logging.  On
such a system, "logw" is often faster than "dd" at writing the 100 MB file.

==== logw.c ================8x=============================================
#include <stdio.h>
#include <stdlib.h>     /* strtol, exit */
#include <string.h>     /* memcpy */
#include <unistd.h>     /* ftruncate, close */
#include <fcntl.h>
#include <sys/mman.h>

void usage(void) {
        fprintf(stderr, "logw [size]\n");
        exit(1);
}

int main(int argc, char **argv)
{
        int fd;
        off_t sz, off;
        char *end;
        char *data;

        if (argc > 1) {
                sz = strtol(argv[1], &end, 10);
                if (argv[1] == end) usage();
                switch (end[0]) {       /* deliberate fall-through: */
                case 'g': sz *= 1024;   /* 'g' -> 'm' -> 'k' */
                case 'm': sz *= 1024;
                case 'k': sz *= 1024;
                }
        } else
                sz = 20*1024*1024;

        /* Create the output file and extend it to its final size. */
        fd = open("junk", O_RDWR|O_CREAT|O_TRUNC, 0666);
        if (fd < 0) {
                perror("open junk");
                exit(1);
        }
        if (ftruncate(fd, sz) < 0) {
                perror("ftruncate");
                exit(1);
        }
        data = mmap(NULL, sz, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
        if (data == MAP_FAILED) {
                perror("mmap");
                exit(1);
        }
        close(fd);

        /* if (madvise(data, sz, MADV_SEQUENTIAL) < 0) perror("madvise"); */

        /* Dirty every 1 KB block, i.e. write the file contents through
         * the MAP_SHARED mapping the way ld does. */
        for (off = 0; off < sz; off += 1024) {
                memcpy(data+off, &off, sizeof(off));
        }

        /* if (msync(data, sz, MS_SYNC) < 0) perror("msync"); */

        munmap(data, sz);

        exit(0);
}

=================================8x========================================
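For comparison with the dd test above, here is a write(2)-based counterpart
to logw.c.  This is only a sketch and was not part of the original posting;
the file name "junk" and the 1 KB block size mirror the dd command, and the
size argument is taken in megabytes for simplicity.

/*
 * writew.c -- sketch: write the same amount of data as logw.c, but with
 * plain write(2) calls (as dd does) instead of storing through a
 * MAP_SHARED mapping.  Usage: writew [megabytes], default 20.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
        off_t sz, off;
        char buf[1024];
        int fd;

        sz = (off_t)(argc > 1 ? strtol(argv[1], NULL, 10) : 20) * 1024 * 1024;
        memset(buf, 0, sizeof(buf));

        fd = open("junk", O_WRONLY|O_CREAT|O_TRUNC, 0666);
        if (fd < 0) {
                perror("open junk");
                exit(1);
        }

        /* One 1 KB write(2) per block, buffered through the page cache. */
        for (off = 0; off < sz; off += (off_t)sizeof(buf)) {
                if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
                        perror("write");
                        exit(1);
                }
        }

        close(fd);
        return 0;
}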
 
 
 

ufs logging kills linking performance

Post by John Fiebe » Fri, 11 Feb 2000 04:00:00



> [ ufs logging slows down "ld". ]


> >Perhaps the kernel runs out of transaction log space and waits for
> >some 'cleaner' thread to flush the logged operations to the file
> >system?  If that is true, maybe there's an wakeup for the 'cleaner'
> >thread missing when the transaction log gets full?

> Well, even if there's not a missing wakeup call, logging requires more
> I/O than not logging, at least generally.  So there's more overhead.

But by a factor of 2 to 10 when linking, while essentially all other
types of disk activity see a performance boost?  Maybe some time with ld
and truss is in order to look at the I/O activity ld is generating.

> By the way, you can see how big your log is by doing "fstyp -v" on the
> filesystem.  On my filesystems, it's generally been something like 1
> MB, if I remember right.

Yes, that is about right.

-john

 
 
 

ufs logging kills linking performance

Post by Peter Ben » Fri, 11 Feb 2000 04:00:00





>> [ ufs logging slows down "ld". ]

>> Well, even if there's not a missing wakeup call, logging requires more
>> I/O than not logging, at least generally.  So there's more overhead.

>But by a factor of 2 to 10 when linking while essentially all other
>types of disk activity see a performance boost?  Maybe some time with ld
>and truss is in order to look at the i/o activity ld is generating.

You probably won't find that enlightening. ld makes use of mmap, even
for writing files (hence the SIGBUS when you run out of disk space) so
truss will miss the interesting data.

Peter

 
 
 

ufs logging kills linking performance

Post by Alan Stang » Fri, 11 Feb 2000 04:00:00




> > [ ufs logging slows down "ld". ]


> > >Perhaps the kernel runs out of transaction log space and waits for
> > >some 'cleaner' thread to flush the logged operations to the file
> > >system?  If that is true, maybe there's an wakeup for the 'cleaner'
> > >thread missing when the transaction log gets full?

> > Well, even if there's not a missing wakeup call, logging requires more
> > I/O than not logging, at least generally.  So there's more overhead.

> Hmm, generally thats true, but this cannot be the explanation why ufs
> logging slows down "ld" so much!  There was an important piece of
> information missing in my last post:  Not only is the cpu 100% idle
> most of time while "ld" runs, at the same time there's also no (!) I/O
> to the disks (over periods of ~ 10-15 seconds).

> The program attached at the end of this posting ("logw.c") seems to
> reproduce the problem with "ld" and ufs logging.  I think it uses a
> similar technique that "ld" uses to write an output file: create the
> output file and mmap it into virtual address space, then write the
> contents of the output file directly to vm (and into the mapped file).

<snipped some code and results>

Interesting.  Changing MAP_SHARED to MAP_PRIVATE chops the run time from many
minutes to many seconds.  A quick truss shows that ld is mapping the new
executable file as MAP_SHARED.  And, to make matters even worse, there is a
memcntl call to MS_SYNC the writes.

This has the wonderful side effect of making ld fail when compiling in a tmpfs
file system if not enough swap space (or no swap space) is available (or
configured).  I seem to recall this coming up in this newsgroup in the past as
well.

Over time, I'm becoming more and more convinced that Sun is purposely trying to
make Solaris run slowly and otherwise behave oddly.  See Bug ID 4041435 for an
interposition library that overrides the msync call to avoid the MS_SYNC.  You
could add mmap to that to make all the mappings MAP_PRIVATE.  The only
MAP_SHARED mapping is the last one...the one for the executable.
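
A minimal sketch of that kind of msync interposer is below.  This is not the
library from Bug ID 4041435, just an illustration of the technique; note also
that if ld issues the flush through memcntl() directly rather than msync(),
that entry point would have to be interposed on as well.

/*
 * msync_async.c -- hypothetical LD_PRELOAD interposer (sketch only):
 * downgrade blocking MS_SYNC flushes to MS_ASYNC so the caller does
 * not wait for the pages to reach the disk.
 * Note: older Solaris headers declare msync(caddr_t, size_t, int);
 * adjust the first parameter type if needed.
 */
#include <dlfcn.h>
#include <stddef.h>
#include <sys/mman.h>

int msync(void *addr, size_t len, int flags)
{
        static int (*real_msync)(void *, size_t, int);

        if (real_msync == NULL)
                real_msync = (int (*)(void *, size_t, int))
                        dlsym(RTLD_NEXT, "msync");

        /* Turn a blocking flush into an asynchronous one. */
        if (flags & MS_SYNC)
                flags = (flags & ~MS_SYNC) | MS_ASYNC;

        return real_msync(addr, len, flags);
}

Built as a shared object (e.g. "gcc -shared -fpic -o msync_async.so
msync_async.c -ldl", or the equivalent cc -G -Kpic invocation) and loaded
via LD_PRELOAD, it would intercept msync calls made through libc.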

I must be missing something here.  Can anyone tell me why the executable needs to
be MAP_SHARED?  What other process is going to be using this?

--

 
 
 

ufs logging kills linking performance

Post by Juergen Keil » Fri, 11 Feb 2000 04:00:00



> You probably won't find that enlightening. ld makes use of mmap, even
> for writing files (hence the SIGBUS when you run out of disk space) so
> truss will miss the interesting data.

Indeed, tracing "ld" system calls (truss -t ...) gives complete
silence.  But you can trace machine faults (truss -m ...) and you'll
notice a batch of page faults now and then.
 
 
 

ufs logging kills linking performance

Post by Juergen Keil » Fri, 11 Feb 2000 04:00:00



> <snipped some code and results>

> Interesting.  Changing MAP_SHARED to MAP_PRIVATE chops the run time from many
> minutes to many seconds.

Unfortunately, by not sharing your modifications, they never make it
to the file!  Run "od junk" on the output file and see whether there
are any non-zero bytes in it.  MAP_PRIVATE in the logw sample is of
course faster, because the kernel can simply throw away the changes
(contained in anonymous pages backed by swap) that were made by the
process.
 
 
 

ufs logging kills linking performance

Post by Alan Stang » Fri, 11 Feb 2000 04:00:00




> > <snipped some code and results>

> > Interesting.  Changing MAP_SHARED to MAP_PRIVATE chops the run time from many
> > minutes to many seconds.

> Unfortunatelly, by not sharing your modifications, they never make it
> to the file!  Run "od junk" on the output file and see if there are
> any non-0 bytes in the out file.  MAP_PRIVATE in the logw sample is of
> cause faster, because the kernel can just throw away the changes
> (contained in anonymous pages backed by swap) that were made by the
> process.

Ugghh.  Yes, every once in a while I'm stupid....

--

 
 
 

ufs logging kills linking performance

Post by Alan Stang » Fri, 11 Feb 2000 04:00:00


Hello all,

Another interesting tidbit: when using the logw program (with MAP_SHARED,
and with the memcpy loop replaced by a single memset call), the run times
vary over a large range when the code is run repeatedly on an idle
workstation, writing a 10000K file each time:

$ ptime a.out 10000k
real        7.448
user        0.040
sys         0.339
$ ptime a.out 10000k
real       30.081
user        0.039
sys         0.380
$ ptime a.out 10000k
real       26.010
user        0.039
sys         0.379
$ ptime a.out 10000k
real       26.844
user        0.039
sys         0.369
$ ptime a.out 10000k
real       18.786
user        0.039
sys         0.364
$ ptime a.out 10000k
real       10.441
user        0.040
sys         0.335
$ ptime a.out 10000k
real        0.871            This value is rather surprising....
user        0.040
sys         0.321
$ ptime a.out 10000k
real       12.751
user        0.040
sys         0.348
$ ptime a.out 10000k
real       23.800
user        0.041
sys         0.374
$ ptime a.out 10000k
real       16.717
user        0.041
sys         0.333

--

 
 
 

ufs logging kills linking performance

Post by Casper H.S. Dik - Network Security Engineer » Fri, 11 Feb 2000 04:00:00


[[ PLEASE DON'T SEND ME EMAIL COPIES OF POSTINGS ]]


>You probably won't find that enlightening. ld makes use of mmap, even
>for writing files (hence the SIGBUS when you run out of disk space) so
>truss will miss the interesting data.

Truss -m pagefault might help.

Casper
--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

 
 
 

ufs logging kills linking performance

Post by Casper H.S. Dik - Network Security Engineer » Fri, 11 Feb 2000 04:00:00


[[ PLEASE DON'T SEND ME EMAIL COPIES OF POSTINGS ]]


>Interesting.  Changing MAP_SHARED to MAP_PRIVATE chops the run time from many
>minutes to many seconds.  A quick truss shows that ld is mapping the the new
>executable file as MAP_SHARED.  And, to make matters even worse, there is a
>memcntl call to MS_SYNC the writes.

Well, you probably want to do that at the end.

>This has the wonderful side effect of making ld fail when compiling in a tmpfs
>file system if not enough swap space (or no swap space) is available (or
>configured).    I seem to recall this coming up in the past in this news group as
>well.

I think that has been fixed both in the kernel (it's foolish to sync
anonymous memory to backing store) and in the linker.

Casper
--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

 
 
 

ufs logging kills linking performance

Post by Juergen Keil » Fri, 18 Feb 2000 04:00:00




> > [ ufs logging slows down "ld". ]


> > >Perhaps the kernel runs out of transaction log space and waits for
> > >some 'cleaner' thread to flush the logged operations to the file
> > >system?  If that is true, maybe there's an wakeup for the 'cleaner'
> > >thread missing when the transaction log gets full?

> > Well, even if there's not a missing wakeup call, logging requires more
> > I/O than not logging, at least generally.  So there's more overhead.
> The program attached at the end of this posting ("logw.c") seems to
> reproduce the problem with "ld" and ufs logging.  I think it uses a
> similar technique that "ld" uses to write an output file: create the
> output file and mmap it into virtual address space, then write the
> contents of the output file directly to vm (and into the mapped file).

> On a solaris 7 x86/MU4 system with ufs logging enabled and 256m
> memory, writing a 100m file via mmap takes 2min:

> % time logw 100m
> 0.07u 2.57s 1:59.83 2.2%

Some more datapoints for this problem:

1.

Today I repeated the "mmap write" test of a 100 MB file (see the logw
program in the previous posting) on the Solaris x86 system after it had
been rebooted shortly before.  After the reboot there was ~150 MB of free
memory.  With ufs logging disabled it took ~2 seconds to write out the
100 MB file, with disk activity continuing after the program had
completed.  With ufs logging enabled, it took >8 minutes!

Disabling ufs access time updates (Casper suggested this elsewhere in
this thread) via the "noatime" ufs mount option made no difference.
The key is logging/nologging.

2.

Running "truss -m all" against the logw process showed batches of
pagefaults spread at time intervals of multiples of 5 seconds.  This
may point at the fsflush process.

% sysdef | egrep 'AUTOUP|FSFLUSH'
      30        auto update time limit in seconds (NAUTOUP)
       5        fsflush run rate (FSFLUSHR)

3.

With 150 MB of free memory and ufs logging enabled, running "logw 100m"
in one window and "repeat 10000 sync" in another window brings the time
to write the file down from >8 minutes to ~25 seconds.
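
For reference, here is a tiny C equivalent of that csh loop ("repeat" is a
csh builtin); this is just a sketch of the workaround, not part of the
original posting:

/*
 * syncloop.c -- sketch: issue sync(2) over and over, mirroring
 * "repeat 10000 sync", so the ufs log gets flushed regularly while
 * logw/ld is writing through its MAP_SHARED mapping.
 */
#include <unistd.h>

int main(void)
{
        int i;

        for (i = 0; i < 10000; i++)
                sync();
        return 0;
}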

It seems as if ufs logging somewhere in the kernel waits for the
modified pages produced by logw/ld to be written back to the file, and
the speed at which that happens depends heavily on the fsflush/pageout
daemons.  pageout probably never has to run when there is a lot of free
memory available, and fsflush can take up to 30 seconds before a given
modified page is flushed to the filesystem.

 
 
 

ufs logging kills linking performance

I just reproduced this problem.  It works fine on 2.7, 2.8 and 2.8.1.
However, when testing just a kernel patched with a fix for bug 4125102,
the performance problem could readily be seen.  The 2.7 patch was
106541-10.  So that bug fix introduced the problem, and it was
fortuitously fixed later by bug fix 4272231.  So I'm closing this bug
as a duplicate of 4272231.  I don't know if a patch is available for
4272231; if not, I guess an escalation needs to be raised, as this is a
pretty serious performance problem.

Now for the explanation of the problem and the solution:
UFS logging transactions are grouped together to form a "moby
transaction", which is composed of at least one synchronous transaction
and as many asynchronous transactions as will fit into the log.  The
sample program and the linker command both memory map a file and touch
a lot of pages, each of which generates an asynchronous transaction.
If the log gets full of these transactions without a synchronous one
being generated, an empty synchronous one is created in order to flush
the moby transaction to the log.  This worked as described in 2.7 until
a nasty locking deadlock was discovered (bug 4125102), which was fixed
by essentially bouncing any transactions that would block because the
log was full while possibly holding the address space lock.  Those
bounced threads were returned to the kernel faulting code, where they
blocked for a clock tick (1/100 sec) and then retried.  This is
typically a very rare situation.  However, these bounced threads failed
to create the dummy synchronous transaction to complete the moby
transaction, so they would wake from their delay and retry the
sequence, only to be bounced again.  This occurred at a rate of 100
times a second until another process happened to create a synchronous
transaction (e.g. the sync system call) and flush the log.

As mentioned in the first paragraph, this has been fixed by bug fix
4272231; it was done as a side effect of that fix.  The fix was
essentially to kick off a separate thread to issue the synchronous
transaction.

Neil
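
A user-level illustration of that fix is sketched below.  This is NOT the
Solaris kernel code; the names, the log capacity and the whole structure
are invented purely to show the idea: when writers find the log full of
asynchronous transactions, a dedicated thread issues the flushing
synchronous transaction, instead of each blocked writer retrying on its
own and getting nowhere.

/*
 * mobyflush.c -- illustrative sketch only (not Solaris code): a bounded
 * "log" filled by asynchronous transactions and a dedicated flusher
 * thread that empties it with a (simulated) synchronous transaction.
 * Build with -lpthread.
 */
#include <pthread.h>
#include <stdio.h>

#define LOG_CAPACITY 64

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  log_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  log_space = PTHREAD_COND_INITIALIZER;
static int log_used;            /* async transactions currently in the log */

/* Dedicated flusher: waits until the log is full, then issues the
 * (simulated) synchronous transaction that empties it. */
static void *flusher(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&lock);
        for (;;) {
                while (log_used < LOG_CAPACITY)
                        pthread_cond_wait(&log_full, &lock);
                log_used = 0;                   /* "flush the moby transaction" */
                pthread_cond_broadcast(&log_space);
        }
        return NULL;                            /* not reached */
}

/* A writer adding one asynchronous transaction (e.g. one dirtied page). */
static void async_transaction(void)
{
        pthread_mutex_lock(&lock);
        while (log_used >= LOG_CAPACITY) {
                pthread_cond_signal(&log_full);         /* kick the flusher...    */
                pthread_cond_wait(&log_space, &lock);   /* ...instead of retrying */
        }
        log_used++;
        pthread_mutex_unlock(&lock);
}

int main(void)
{
        pthread_t tid;
        int i;

        pthread_create(&tid, NULL, flusher, NULL);
        for (i = 0; i < 100000; i++)
                async_transaction();
        printf("100000 asynchronous transactions logged\n");
        return 0;
}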

2. doom speedups

3. UFS logging VS Solstice DiskSuite's Trans metadevice "UFS logging"

4. SLIP and PPP problem

5. Solaris 8 ufs logging performance

6. get rid of MAJOR_NR

7. Solaris/UFS logging performance issue.

8. Is this a bug??

9. ufs log size - performance

10. UFS logging - how much space can be logged ?

11. UFS logging vs meta device logging

12. Sharing Logs with ODS + UFS Logging

13. how large is the log area on a UFS filesystem with logging.