*very* serious kernel bug (2.0.30)


Post by James William » Sun, 14 Sep 1997 04:00:00



I've discovered a very serious bug in the 2.0.30 kernel that will
allow any user to crash the system.  I've tested this on two different
Linux systems, both running Redhat 4.2, and each with a custom built
kernel (2.0.30).  The problem seems to happen when one process is
creating a lot of locks, and another views the /proc/locks file.

To see the problem for yourself, compile the attached C program and
run it in an empty directory.  It creates a new file and puts a lock
on it every 1/10th of a second.  While it's running, switch to
another window or virtual console and enter "cat /proc/locks", which
shows the active locks on the system.  Run "cat /proc/locks" several
more times.  Pretty soon you will start to have problems ranging from
segmentation faults and kernel messages (the ones I get say "Unable
to handle kernel paging request", along with a dump of system
registers, stack, etc.) to a complete system lockup.

What I need to know is whether a patch is available to fix this.
I've seen people who say they are running pre-2.0.31 releases.
Will these pre-2.0.31 kernels fix this problem?  If so, where can I
find a pre-2.0.31 kernel?  I'm not willing to use a development
kernel (2.1.xx) due to possible instability (the machine I'm
running needs to be as stable as possible).  I'd appreciate any help
in this matter.  Thanks in advance.
--------------------------------------------------------------------
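#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/file.h>   /* flock(), LOCK_EX */

int main(void)
{
        int fd, i;
        char filename[256];

        /* Create a new file and lock it every 1/10th of a second.
           The descriptors stay open on purpose so the locks pile up. */
        for (i = 0;; i++) {
                sprintf(filename, "%i", i);
                /* O_CREAT needs an access mode and a permission argument */
                fd = open(filename, O_CREAT | O_RDWR, 0600);
                if (fd < 0) {
                        perror("open");
                        return 1;
                }
                flock(fd, LOCK_EX);
                usleep(100000);
        }
}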

--------------------------------------------------------------------
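The other half of the test, running "cat /proc/locks" over and over,
can also be scripted. What follows is only a rough sketch and was not
part of the original post: the helper names read_whole_file() and
poll_file() are made up for illustration, and it assumes that simply
reading the file from start to end exercises the same kernel path as
cat does.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original post): read `path` once
 * from start to end.  Returns the number of bytes read, or -1 if the
 * file could not be opened or a read error occurred. */
long read_whole_file(const char *path)
{
        char buf[4096];
        size_t n;
        long total = 0;
        FILE *fp = fopen(path, "r");

        if (fp == NULL)
                return -1;
        while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
                total += (long)n;
        if (ferror(fp)) {
                fclose(fp);
                return -1;
        }
        fclose(fp);
        return total;
}

/* Read `path` `count` times in a row, like repeating "cat" by hand.
 * Returns how many of the reads succeeded. */
int poll_file(const char *path, int count)
{
        int i, ok = 0;

        for (i = 0; i < count; i++)
                if (read_whole_file(path) >= 0)
                        ok++;
        return ok;
}
```

Calling poll_file("/proc/locks", 1000) from a small main() while the
lock program above is running should stand in for the repeated cat
commands. Only try it on a machine you can afford to crash.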

--
           /   AIX     | James Williams

PA_RISC __\ \  Sco     |
  Sparc \ \\// OSF     |
PowerPC  \///  Linux   |
 RS6000  ///\  FreeBSD |
  Alpha //\\_\ Ultrix  |
   MIPS  \ \   Xenix   |
MC68000   \/   Solaris |
          /    HPUX    |

 
 
 


Post by Henrik Storn » Mon, 15 Sep 1997 04:00:00



>I've discovered a very serious bug in the 2.0.30 kernel that will
>allow any user to crash the system.  I've tested this on two different
>Linux systems, both running Redhat 4.2, and each with a custom built
>kernel (2.0.30).  The problem seems to happen when one process is
>creating a lot of locks, and another views the /proc/locks file.

Tried this with the pre-9 version of 2.0.31. Doesn't do any harm on
my system (up-to-date RedHat 4.2).

>Will these pre-2.0.31 kernels fix this problem?  If so, where can I
>find a pre-2.0.31 kernel?

ftp://linux.kernel.org/pub/linux/kernel/testing/

--
Henrik Storner                               http://www.image.dk/~storner/
"The POP3 server service depends on the SMTP server service, which
 failed to start because of the following error:
 The operation completed successfully." -Windows NT Server v3.51

 
 
 


Post by James William » Mon, 15 Sep 1997 04:00:00


: Tried this with the pre-9 version of 2.0.31. Doesn't do any harm on
: my system (up-to-date RedHat 4.2).

: ftp://linux.kernel.org/pub/linux/kernel/testing/

Thanks.  I'll give it a try.


 
 
 


Post by James William » Mon, 15 Sep 1997 04:00:00


: Tried this with the pre-9 version of 2.0.31. Doesn't do any harm on
: my system (up-to-date RedHat 4.2).

Awesome!  I installed the patch on my system and the problem went away.
Thanks again!


 
 
 


Post by Renato Moutinho Silv » Thu, 18 Sep 1997 04:00:00




> : Tried this with the pre-9 version of 2.0.31. Doesn't do any harm on
> : my system (up-to-date RedHat 4.2).

> Awesome!  I installed the patch on my system and the problem went away.
> Thanks again!

  Cool!!! That's the Linux way!! Things always get fixed.

All the best,

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

AIX/SunOS/Solaris/Linux   Workstations Manager
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Powered by,

L  I  N  U  X  -   THE CHOICE OF A GNU GENERATION

"The software said it requires Windows 95 or better,
  so I installed Linux"

 
 
 

1. Is this the 2.0.30 kernel buffer bug?

After a fair amount of occasional strangeness with my 2.0.30 (Dell PPro 200,
32MB etc.) system, I have finally seen something clear enough to describe.
It sounds like the much described buffer problem with 2.0.30, but I offer a
description either for additional information, or perhaps for advice if
this has been seen and fixed before.

After doing quite a lot of work with the machine, I left a couple of things
running while I fetched a coffee.  When I returned some 10 mins later, the
X server (XFree86 3.3.1) had crashed and I was back at a login prompt.
When attempting to log in, I successfully entered username and password, but
was then returned to the login prompt.  Thinking that I might have some
disk corruption affecting my .profile et al. files, I successfully logged
in as root on another virtual console, and renamed them all.  Still no
login was possible to my own username on any VC.

This really left only the shell as culprit, since the init/login stuff
appeared to be working normally.  It so happens that I keep two shells -
/bin/sh, a link to /bin/bash 2.01.1, for root and startup, and
/usr/bin/bash (also 2.01.1) for everything else.  So I mv'd /usr/bin/bash
to something else and cp'd /bin/sh to /usr/bin/bash.  Success!  A normal
login resumed.

So what was wrong with /usr/bin/bash?  I cmp -l'd the original and its
working replacement, and they were the same.  Ownership and permissions
were also identical.  So I removed the replacement shell and mv'd the shell
that failed back to /usr/bin/bash.  Sure enough, the login failure returned
too.  So I suspected inode association here.  I mv'd the /usr/bin/bash to
something else and cp'd that back to /usr/bin/bash.  Success!  This shell
worked correctly.

Now I can only explain all this by some kernel cache being corrupt or in
some way marked unusable for the file associated with the inode my
/usr/bin/bash shell was on.  Hence I suspect the 2.0.30 buffer bug.
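
The inode reasoning above rests on the fact that mv within one filesystem
keeps a file's inode, while cp writes the same bytes into a new one.  As a
rough sketch (not something from the original post; the function names
same_inode() and same_contents() are invented here), the two checks could
be made from C like this:

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Returns 1 if both paths refer to the same inode on the same device,
 * 0 if they do not, and -1 if either stat() call fails. */
int same_inode(const char *a, const char *b)
{
        struct stat sa, sb;

        if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
                return -1;
        return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Byte-for-byte comparison, roughly what cmp checks.  Returns 1 if the
 * files are identical, 0 if they differ, -1 if either cannot be opened. */
int same_contents(const char *a, const char *b)
{
        FILE *fa = fopen(a, "rb");
        FILE *fb = fopen(b, "rb");
        int ca, cb, result = 1;

        if (fa == NULL || fb == NULL) {
                if (fa != NULL)
                        fclose(fa);
                if (fb != NULL)
                        fclose(fb);
                return -1;
        }
        do {
                ca = getc(fa);
                cb = getc(fb);
                if (ca != cb) {         /* also catches one file ending first */
                        result = 0;
                        break;
                }
        } while (ca != EOF);
        fclose(fa);
        fclose(fb);
        return result;
}
```

For the failing shell and its cp'd copy, same_contents() would be expected
to return 1 and same_inode() to return 0, which matches the observation
that identical bytes behaved differently depending on which inode held
them.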

I have had odd segv's (and occasional machine hangs) relatively frequently
in the past, but the best I have been able to do in debugging the core is
to get to a jump table in the startup code, which I assume is dynamic
linkage vectoring.  I have never been able to determine just which routine
the vector should have been addressing from there - the address is always
outside the process' address space.  However, if a dynamic linkage fetch
had also been perturbed by a buffer cache error, then this would explain
both the segv and my inability to debug beyond that point.

If there is any information that I might be able to gather on a repeat
event (should it happen!) that would be useful in diagnosing this problem
further, please let me know and I'll see what I can do.
--
