Hi, I have a 2.4.16 based SUSE system, which is running the
default SUSE 2.4.16 SMP-64GB kernel build, with:
- 756M physical RAM
- SMP (2x PIII)
- Oracle I/O mostly through raw partitions on IDE drives
- Oracle configured to buffer heavily
Every night, an Oracle job runs for about two hours that hammers
the drives and memory quite thoroughly. Oracle finishes and the
box mostly quiesces at that point.
For the last several nights, I've noticed swap usage steadily going
up. Thinking that Oracle was just holding buffers open, I cycled all
the Oracle processes, but that did not fix it. In fact, according to
top and ps, nothing very memory intensive was running on the box, yet
about 250M of non-buffer, non-cache memory was in use. By this figure
I mean used memory - buffers - cache, i.e. the number that reads
111,176KB in the report below:
             total       used       free     shared    buffers     cached
Mem:        772920     660940     111980          0       6972     542792
-/+ buffers/cache:     111176     661744
Swap:       774640          0     774640
(In this report I see a lot of cache being used; I'm fine with that, I
just did a lot of file I/O).
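
To spell out the arithmetic behind that figure, here is a minimal Python sketch that recomputes "used - buffers - cache" from the Mem: line of the report above. The function name is my own, and it assumes the six-column Mem: layout shown in this report (total, used, free, shared, buffers, cached); other versions of free may print different columns.

```python
def used_minus_buffers_cache(free_output):
    """Return used - buffers - cached (in KB) from the text printed by free."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            # Assumed column order: total used free shared buffers cached
            total, used, free, shared, buffers, cached = map(int, fields[1:7])
            return used - buffers - cached
    raise ValueError("no 'Mem:' line found in free output")


# The report quoted above, copied verbatim.
report = """\
             total       used       free     shared    buffers     cached
Mem:        772920     660940     111980          0       6972     542792
-/+ buffers/cache:     111176     661744
Swap:       774640          0     774640
"""

print(used_minus_buffers_cache(report))  # 660940 - 6972 - 542792 = 111176
```

The result matches the "used" column of the "-/+ buffers/cache" line, which is the figure I am tracking.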
So I got on my serial console and did an "init 1", which shut off all user
processes except for a single bash. I also shut off swap. I did a "free",
and the "used" figure (used - buffers - cache) was still over 200MB,
even though the box wasn't doing a darn thing! I wish I had cut and
pasted that report. The box had been up about 10 days at this point.
Several hundred MB were used for cache, but that doesn't bother me as
much as the massive usage of memory to run next to nothing.
Is this abnormal?
Well, I went back to init 5 and rebuilt my kernel with the same config,
except that I turned off 64GB support (PAE), as I don't need it with only
756M. I installed the new kernel and rebooted...
Just to experiment, I rebooted into run level 1, and observed the
"used - buffer - cache" to be about 9MB. Nice and lean, sounds good.
I then bounced over to run level 5, built a kernel and hammered at
Oracle, then went back to run level 1 and observed the "used - buffer -
cache" figure climb by several MB. Each repetition of the above steps
added several more MB to that figure, even though only one
shell was running.
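
For anyone who wants to repeat the measurement without eyeballing free each time, the same figure could be read from /proc/meminfo. This is only a sketch: the function names are mine, and it assumes /proc/meminfo contains "Name: value kB" lines for MemTotal, MemFree, Buffers and Cached. The sample text simply reuses the numbers from the report above; it is not a capture from the box.

```python
def parse_meminfo(text):
    """Collect 'Name: <number> kB' lines from /proc/meminfo text into a dict."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only lines shaped like 'Name: 12345 kB'; skip any other layout.
        if sep and len(fields) == 2 and fields[1] == "kB" and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def non_cache_used_kb(text):
    """Return used - buffers - cached in KB, where used = MemTotal - MemFree."""
    m = parse_meminfo(text)
    used = m["MemTotal"] - m["MemFree"]
    return used - m["Buffers"] - m["Cached"]


def read_meminfo(path="/proc/meminfo"):
    """Read the live file; call this at run level 1 after each test cycle."""
    with open(path) as f:
        return f.read()


# Sample input built from the numbers in the free report above.
sample = """\
MemTotal:       772920 kB
MemFree:        111980 kB
Buffers:          6972 kB
Cached:         542792 kB
"""

print(non_cache_used_kb(sample))  # 111176
```

Running non_cache_used_kb(read_meminfo()) after each return to run level 1 and comparing successive values would show the per-cycle growth described above.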
Is the kernel somehow leaking memory? I understand using the buffer
cache aggressively as a policy decision, but reporting memory used
when in fact it can't be accounted for seems wrong.
Is this a known problem, or a problem with my understanding of what is going on?