Hi there,
I updated my kernel from 2.0.29 to 2.0.33 some time ago. At first it
was fairly stable, but then it began crashing more and more often. The
first crashes seemed to be in the filesystem code. Then I got messages
about corrupted wait queues. Sometimes the kernel crashes completely,
and sometimes only one task is killed. I have had several more crashes
since then, but I could not see the kernel dumps, because the machine
is mostly used remotely and the screen had blanked. I then updated to
2.0.34, hoping the bug had been fixed, but it is unstable, too. Now I
have seen one new crash.
The dump shows a stack failure in the zap_page_range function. The
failure has its root in a wrong page directory address in the memory
management structure (current->mm). It seems as if the pgd entry in
current->mm is sometimes overwritten. The crash occurs mainly when the
system is under high load. Maybe this is also the reason for the other
crashes. Perhaps the corrupted address is more or less random and can
lie inside kernel space, thus not causing an exception. That could make
the kernel overwrite arbitrary data in kernel space, such as wait
queues and inodes!
The call to zap_page_range comes from do_munmap(), which in turn is
called from a munmap() syscall.
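
For anyone who wants to exercise the munmap() path from user space, a
loop like the following might be a starting point. This is only a
minimal sketch and not a confirmed reproducer: the helper name
churn_mappings() is made up for illustration, and it assumes that
anonymous mappings (MAP_ANONYMOUS) are available and that mapping,
touching and unmapping pages repeatedly is enough to reach the code in
question.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS with strict -std= modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the kernel or any library):
 * map an anonymous region of `pages` pages, touch every page, and
 * unmap it again, `cycles` times in a row.
 * Returns the number of map/unmap cycles that completed successfully.
 */
long churn_mappings(long cycles, size_t pages)
{
    long page_size = sysconf(_SC_PAGESIZE);
    size_t len;
    long done = 0;
    long i;

    if (page_size <= 0 || pages == 0)
        return 0;
    len = pages * (size_t)page_size;

    for (i = 0; i < cycles; i++) {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            break;

        /* Write to every page so the unmap has real pages to release. */
        memset(p, 0xA5, len);

        if (munmap(p, len) != 0)
            break;
        done++;
    }
    return done;
}
```

Calling churn_mappings(100000, 64) from several processes at once,
while the machine is otherwise busy, would be one way to try this under
load. A return value smaller than the requested cycle count means that
an mmap() or munmap() call failed.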
I have these problems on two Linux 2.0.33 (now 2.0.34) boxes.
Does anyone know this problem and can help me, or perhaps supply me
with their own information (crash dumps, ...)?
Also, does anyone know whether an update to 2.1.0 might help and, if
so, where to find an "old" 2.1.0 version? Most mirrors are deleting the
older versions of 2.1.x. Or perhaps someone knows of a newer stable
version of 2.1.x?
Thanks in advance,
Ralf Gerlich