Hi,
Last week the NetWare server in our office crashed after 42 days
of uptime. I thought my Linux server box at home could beat that
any time, especially with the later kernels.
Alas, today after 16 days the system locked up completely. A ping
from another box got no response, so I had to hit the Red Button :-(
At the time of the crash the system was heavily loaded with povray
rendering a complex image. A lot of daemons ran in the background
and xearth was running in the root window. Several modules were
loaded (at least the DOSEMU modules, CDROMs, network card and ISDN).
I was typing something into a shell command line when it happened.
One hour earlier I had a big problem with a 'make' command running
wild. After it had spawned 122 copies of itself, the system ran
out of virtual memory and I could not even start 'ls' or 'ps'
because they could not load libc. (Here's some good advice: Always
have statically linked versions of the fileutils around!)
I solved the problem by using a statically linked 'mv' to move
/usr/bin/make out of the way. Could this have made the system
unstable?
I definitely think it wasn't a hardware failure. It couldn't be
a power failure either, because the system has a UPS. The CPU fan
(AMD486DX-100) is also working correctly.
What can be done to find the cause? My logfiles don't contain
anything. I think I'll just run povray for extended periods to
test the possibility of a CPU bug. I can also write a program that
exhausts all virtual memory, to see what happens then.
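A minimal sketch of what such a memory-exhausting test could look
like in C. The names eat_memory, release_memory and struct chunk are
made up for illustration, and the max_chunks limit is an added
safety net so the routine can also be tried without taking the whole
box down. Every chunk is written to, on the assumption that memory
which is only allocated but never touched may not put any real
pressure on the system.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* List node placed at the start of every chunk so it stays reachable. */
struct chunk {
    struct chunk *next;
};

/*
 * Allocate chunks of chunk_size bytes until malloc() fails or
 * max_chunks chunks have been obtained. Each chunk is filled with a
 * pattern so the kernel has to back it with RAM or swap.
 * Returns the number of chunks allocated; *head receives the list.
 */
size_t eat_memory(size_t chunk_size, size_t max_chunks, struct chunk **head)
{
    size_t count = 0;

    assert(head != NULL);
    *head = NULL;

    /* A chunk must at least hold its own list node. */
    if (chunk_size < sizeof(struct chunk))
        chunk_size = sizeof(struct chunk);

    while (count < max_chunks) {
        struct chunk *c = malloc(chunk_size);
        if (c == NULL)
            break;                      /* out of memory: stop here */
        memset(c, 0xA5, chunk_size);    /* touch every byte */
        c->next = *head;                /* link after filling */
        *head = c;
        count++;
    }
    return count;
}

/* Give everything back, walking the list built by eat_memory(). */
void release_memory(struct chunk *head)
{
    while (head != NULL) {
        struct chunk *next = head->next;
        free(head);
        head = next;
    }
}
```

Calling eat_memory(1024 * 1024, (size_t)-1, &head) from a small
main() would keep going in 1 MB steps until malloc() gives up. Whether
the system then returns an error cleanly, starts killing processes,
or locks up is exactly what the test is meant to show.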
Hans-Joachim
--
Uncle Ed's Rule of Thumb: Never use your thumb for a rule.
You'll either hit it with a hammer or get a splinter in it.