Kernel buffer glut

Post by Phil Howard » Sat, 02 Nov 1996 04:00:00



I'm trying to write a LOT (1 gig) of data to disk AFAP (as fast as
possible).  The disk is not busy with any other I/O at all.

As this data is being written, Linux fills up RAM with buffers.  That
would be fine if the amount of buffering were reasonable.  Instead it
immediately fills up all free and stealable pages in RAM.  Then the
process doing the writing finally blocks with RAM nearly full.  Of course
pages are now being written out, but some of the pages being written
belong to other processes and really don't need to be written out, because
they are going to have to be read back in very soon.

I don't need 24 meg of buffer space for one disk, even if I were writing
it at random, though in this case I am writing sequentially.  1 or 2 meg
might be fine.

What I am looking for is some way to control the number of I/O buffer pages
used in RAM, either as total RAM usable for buffers, or per process.

One solution is to make the program write a little, then fsync() or close()
and re-open().  But that just makes things work on a stop-and-go basis,
which is also not a good solution.

What I'd most like is for the process doing the writing to block until
buffer space is again available within the buffer quota.  That way I could
sustain a continuous stream of data flow, and still have some buffered
data to allow reasonable seek-elevator optimizing in the random case.

I'm looking for kernel tuning settings that might control this (I did not
find any that obviously stood out in the source), or maybe ioctl() or
fcntl() operations that would at least let me make this setting per device
or per fd.
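The closest thing to a per-fd control that I know of is the O_SYNC open
flag, which (assuming the filesystem honors it) makes every write() block
until the data reaches the device.  That prevents the glut, but it also
throws away nearly all write-behind, so it is a blunt instrument rather
than a quota.  A sketch, with a made-up path:

/* Sketch: forcing synchronous writes on one descriptor with O_SYNC.
 * Each write() returns only after the data reaches the device, so dirty
 * buffers for this file never accumulate -- at the cost of losing almost
 * all write-behind.  The path is a placeholder. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[8192] = { 0 };
    int fd = open("/tmp/bigfile", O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);

    if (fd < 0) { perror("open"); return 1; }
    if (write(fd, buf, sizeof buf) != sizeof buf)   /* blocks until on disk */
        perror("write");
    close(fd);
    return 0;
}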

If nothing else can be done, I'm also devising some bizarre methods to get
around it.  One is to use a rotating set of child processes that each do
some I/O then fsync().  I just hope that an fsync() on an fd that is open
in several processes still allows the other processes to keep running, so
I can keep at least a small amount of data buffered and keep the disk
going at full speed.
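A sketch of that rotating-children scheme, with a made-up path, chunk
size, and child count; each child opens the file itself, so the
shared-offset question doesn't even come up:

/* Sketch of the rotating-children scheme: at most NKIDS children are in
 * flight at once; each opens the file on its own, seeks to its region,
 * writes it, fsync()s, and exits.  The parent reaps one finished child
 * before starting the next, so only a bounded amount of data is ever
 * sitting dirty.  Path, chunk size, and child count are placeholders. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NKIDS  4
#define CHUNK  (1024 * 1024)   /* 1 meg per child */
#define CHUNKS 64              /* 64 meg total for the sketch */

static void write_chunk(int i)
{
    static char buf[CHUNK];
    int fd = open("/tmp/bigfile", O_WRONLY | O_CREAT, 0644);

    if (fd < 0) { perror("open"); _exit(1); }
    memset(buf, 'x', sizeof buf);
    lseek(fd, (off_t)i * CHUNK, SEEK_SET);
    if (write(fd, buf, CHUNK) != CHUNK) { perror("write"); _exit(1); }
    fsync(fd);                 /* this child blocks; its siblings keep writing */
    close(fd);
    _exit(0);
}

int main(void)
{
    int live = 0;

    for (int i = 0; i < CHUNKS; i++) {
        if (live == NKIDS) { wait(NULL); live--; }  /* reap one finished child */
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); break; }
        if (pid == 0) write_chunk(i);               /* child never returns */
        live++;
    }
    while (live-- > 0)
        wait(NULL);
    return 0;
}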

I want this to apply to tape as well.  I have both IDE and SCSI disks, and
SCSI tape.  One tape drive is a QIC-150 and I like to keep it fully
streaming, and when Linux gets into thrashing from the overbuffering, it
often fails to sustain the I/O.

There are other problems associated with "buffer glut", so I am getting
really anxious to find a solution to this.

--
Phil Howard KA9WGN   +-------------------------------------------------------+
Linux Consultant     |  Linux installation, configuration, administration,   |
Milepost Services    |  monitoring, maintenance, and diagnostic services.    |

 
 
 

Kernel buffer glut

Post by Andrew E. Miles » Tue, 05 Nov 1996 04:00:00


: There are other problems associated with "buffer glut", so I am getting
: really anxious to find a solution to this.

Read the manual page for kswapd (update). You can change how buffers
are expired.

Note: Some early kernels (pre-2.0.0 I think) have a problem with buffers.
      These bugs have of course been fixed.

--

Linux Plug-and-Play Kernel Project http://www.redhat.com/linux-info/pnp/
XFree86 Matrox Team http://www.bf.rmit.edu.au/~ajv/xf86-matrox.html

 
 
 

Kernel buffer glut

Post by bill davids » Fri, 08 Nov 1996 04:00:00





| : There are other problems associated with "buffer glut", so I am getting
| : really anxious to find a solution to this.
|
| Read the manual page for kswapd (update). You can change how buffers
| are expired.

I think you mean "how often," unless there's a newer page and
totally different semantics. I would like to be able to tune:
1 - the interval at which pages are checked
2 - the age at which a page is queued to be swapped
        The sum of these is the age of the oldest unqueued page

3 - the max space used to queue any single device
        to prevent the buffer glut in cases where it becomes a
        problem.

I would like to see dirty pages written ASAP to avoid backlog, and I
think I know how to do it (but it's a bit of work). If the kernel
kept multiple queues for each device, pages being written by sync(),
fsync(), or because of age would go on the "normal" queue, which
works as it does today. However, every dirty page could go on the
"if you get to it" queue, which would be processed after the normal
queue was empty.  It would help keep memory clean.

The advantage to doing this is that it avoids scanning memory, since the
pages are already on the queue.  The disadvantage is that the spare-time
queue should probably be ordered by age, while the normal queue
should be ordered so that elevator access (lowest to highest
cylinders) takes place, for performance.
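To make that concrete, a toy user-space sketch of the two queues (this is
not real kernel code; every name in it is made up for illustration):

/* Each device keeps a "normal" queue, held in block order so the elevator
 * can sweep it, and a "spare-time" queue of merely-dirty pages held in
 * age order.  The driver drains the normal queue first and only then
 * picks up spare-time work. */
struct bufreq {
    unsigned long  block;      /* position on the device (for elevator order) */
    unsigned long  age;        /* how long the page has been dirty */
    struct bufreq *next;
};

struct devqueue {
    struct bufreq *normal;     /* sync()/fsync()/aged pages, sorted by block */
    struct bufreq *spare;      /* "if you get to it" pages, sorted by age    */
};

/* Insert into the normal queue keeping ascending block order, so a
 * low-to-high elevator sweep falls straight out of list order. */
static void queue_normal(struct devqueue *q, struct bufreq *r)
{
    struct bufreq **p = &q->normal;
    while (*p && (*p)->block < r->block)
        p = &(*p)->next;
    r->next = *p;
    *p = r;
}

/* Insert into the spare-time queue keeping oldest-first order. */
static void queue_spare(struct devqueue *q, struct bufreq *r)
{
    struct bufreq **p = &q->spare;
    while (*p && (*p)->age >= r->age)
        p = &(*p)->next;
    r->next = *p;
    *p = r;
}

/* Pick the next request to write: normal work first, spare-time work
 * only when the normal queue is empty. */
static struct bufreq *next_request(struct devqueue *q)
{
    struct bufreq **p = q->normal ? &q->normal : &q->spare;
    struct bufreq *r = *p;

    if (r)
        *p = r->next;
    return r;
}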

However it's done, there are advantages to starting writes to the
device as soon as there are dirty pages.
--

  What do you mean I shouldn't do things like that at my age?
  At my age if you don't do things like that you might die of natural causes!

 
 
 

Kernel buffer glut

Post by Phil Howard » Mon, 11 Nov 1996 04:00:00






| | : There are other problems associated with "buffer glut", so I am getting
| | : really anxious to find a solution to this.
| |
| | Read the manual page for kswapd (update). You can change how buffers
| | are expired.
|
| I think you mean "how often," unless there's a newer page and
| totally different semantics. I would like to be able to tune:
| 1 - the interval at which pages are checked
| 2 - the age at which a page is queued to be swapped
|       The sum of these is the age of the oldest unqueued page
|
| 3 - the max space used to queue any single device
|       to prevent the buffer glut in cases where it becomes a
|       problem.

Number 3 is the one that is currently weakest (apparently non-existent)
in Linux.  I'd like to be able to specify it more finely:

3.1 - max space overall for I/O buffers (as opposed to swap pages)
      this could be expressed as a percentage of that portion of RAM
      that could be used for buffer and swap.  Linux currently seems
      to operate as though it were 100%.  I'd personally set this to
      about 40% on my 32meg system.

3.2 - max space per device (common default)
      this could be expressed as a percentage of the space assigned
      by 3.1 above; 100% would allow a single device to hog all
      the buffer space (but not the swap space).

3.3 - max space per device (individually by device)
      same as 3.2 but separately settable override on each device

Also, a setting like 3.1 should apply to swap space itself as well.  These
two settings MUST add up to at least 100% (else you aren't using some of
the RAM) and may add up to as much as 200% (in which case all of RAM could
be used for either I/O buffering or process swapping).
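Just to make the proposal concrete, a hypothetical sketch of the knobs
(nothing like this exists in the kernel; all the names are invented).
With the 32meg example, 3.1 at 40% and 3.2 at 50%, a single device could
hold roughly 6.4 meg of dirty buffers before its writers blocked:

/* Hypothetical interface for the proposed settings -- purely illustrative. */
struct buf_quota {
    unsigned long ram_pages;       /* pages usable for buffers + swap          */
    unsigned int  pct_buffers;     /* 3.1: overall cap, percent of the above   */
    unsigned int  pct_per_device;  /* 3.2: default device cap, percent of 3.1  */
};

/* Dirty-page limit for one device.  An override_pct of zero means "use
 * the common default"; a nonzero value is the per-device override (3.3). */
static unsigned long device_limit(const struct buf_quota *q,
                                  unsigned int override_pct)
{
    unsigned long overall = q->ram_pages * q->pct_buffers / 100;
    unsigned int  pct     = override_pct ? override_pct : q->pct_per_device;

    return overall * pct / 100;
}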

| I would like to see dirty pages written ASAP to avoid backlog, and I
| think I know how to do it (but it's a bit of work). If the kernel
| kept multiple queues for each device, pages being written by sync(),
| fsync(), or because of age would go on the "normal" queue, which
| works as it does today. However, every dirty page could go on the
| "if you get to it" queue, which would be processed after the normal
| queue was empty.  It would help keep memory clean.

I agree.  I don't mind a 1 or 2 second delay to allow a little bit of
elevator optimization right at the start.  But when you have a lot of
buffers in the queue, you certainly have the opportunity to apply the
elevator optimizing.

| The advantage to doing this is that it avoids scanning memory, since the
| pages are already on the queue.  The disadvantage is that the spare-time
| queue should probably be ordered by age, while the normal queue
| should be ordered so that elevator access (lowest to highest
| cylinders) takes place, for performance.

Agreed.  Does Linux do a one-way elevator (low to high, then jump back to
low) or a two-way elevator (low to high, then high to low)?
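Just to be explicit about the two orderings I mean, a toy illustration
(the cylinder numbers and head position are made up):

/* One-way vs. two-way elevator over a sorted list of pending cylinders. */
#include <stdio.h>

/* One-way: serve everything at or above the head in ascending order,
 * then jump back to the lowest pending cylinder and sweep up again. */
static void one_way(const int *cyl, int n, int head)
{
    for (int i = 0; i < n; i++)
        if (cyl[i] >= head) printf("%d ", cyl[i]);
    for (int i = 0; i < n; i++)
        if (cyl[i] < head) printf("%d ", cyl[i]);
    printf("\n");
}

/* Two-way: sweep up from the head to the top, then back down through
 * the requests that were below the head. */
static void two_way(const int *cyl, int n, int head)
{
    for (int i = 0; i < n; i++)
        if (cyl[i] >= head) printf("%d ", cyl[i]);
    for (int i = n - 1; i >= 0; i--)
        if (cyl[i] < head) printf("%d ", cyl[i]);
    printf("\n");
}

int main(void)
{
    int cyl[] = { 10, 35, 70, 120, 400, 750 };  /* sorted pending requests */
    int n = sizeof cyl / sizeof cyl[0];

    one_way(cyl, n, 100);   /* prints: 120 400 750 10 35 70 */
    two_way(cyl, n, 100);   /* prints: 120 400 750 70 35 10 */
    return 0;
}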

| However it's done, there are advantage to starting writes to the
| device as soon as there are dirty pages.

And there are advantages to blocking the process that has a massive amount
to write, to pace it down to the physical I/O bandwidth well before RAM is
all taken up by buffers.  Elevator optimizing does NOT need 24 meg to do
its job.  This might all be moot on an 8 meg machine, but right now I
wouldn't get much advantage at all from upgrading my current 32meg to 64
or 128.

And as mentioned before, I have the same problem with Solaris (2.4 and 2.5.1).
So it obviously has not been all that well addressed in the industry and
may actually be a tricky thing to do.

--
Phil Howard KA9WGN   +-------------------------------------------------------+
Linux Consultant     |  Linux installation, configuration, administration,   |
Milepost Services    |  monitoring, maintenance, and diagnostic services.    |

 
 
 
