2.4.20-pre7-ac2 - kernel BUG at spinlock.h:186!

Post by Jeff Dik » Tue, 22 Oct 2002 05:40:07



We had a process die on a UML server with the oops in $SUBJECT.  

uname -a says

        Linux zaphod.stearns.org 2.4.20-pre7-ac2 #1 SMP Wed Sep 18 18:06:15 EDT 2002 i686 unknown

ksymoops says

Oct 19 15:31:01 zaphod kernel: kernel BUG at /usr/src/linux-2.4.20/include/asm/spinlock.h:186!
Oct 19 15:31:01 zaphod kernel: invalid operand: 0000
Oct 19 15:31:01 zaphod kernel: CPU:    0
Oct 19 15:31:02 zaphod kernel: EIP:    0010:[<c0324e00>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Oct 19 15:31:02 zaphod kernel: EFLAGS: 00010217
Oct 19 15:31:02 zaphod kernel: eax: ecae9348   ebx: eca69f04   ecx: ecae9040   edx: 00000000
Oct 19 15:31:02 zaphod kernel: esi: eca69f4c   edi: 00000000   ebp: eca69edc   esp: eca69ea0
Oct 19 15:31:02 zaphod kernel: ds: 0018   es: 0018   ss: 0018
Oct 19 15:31:02 zaphod kernel: Process jcorrodo (pid: 17283, stackpage=eca69000)
Oct 19 15:31:02 zaphod kernel: Stack: eca69ed8 c0108941 a024fc70 a024fcc8 00000000 00000000 ecae9040 ffffff95
Oct 19 15:31:02 zaphod kernel:        eca69fc4 f1183148 0000000e f1183148 eca69f04 eca69f4c 00000000 eca69f30
Oct 19 15:31:02 zaphod kernel:        c02b36c7 ec84cd00 eca69f4c 00000001 eca69f04 eca68000 40000000 eca68000
Oct 19 15:31:02 zaphod kernel: Call Trace:    [<c0108941>] [<c02b36c7>] [<c02b38b5>] [<c01488c6>] [<c01092cb>]
Oct 19 15:31:02 zaphod kernel: Code: 0f 0b ba 00 e0 6b 33 c0 f0 83 28 01 0f 88 60 12 00 00 8b 45

>>EIP; c0324e00 <unix_stream_sendmsg+70/370>   <=====

Trace; c0108941 <setup_frame+111/210>
Trace; c02b36c7 <sock_sendmsg+67/90>
Trace; c02b38b5 <sock_write+95/a0>
Trace; c01488c6 <sys_write+96/190>
Trace; c01092cb <system_call+33/38>
Code;  c0324e00 <unix_stream_sendmsg+70/370>
00000000 <_EIP>:
Code;  c0324e00 <unix_stream_sendmsg+70/370>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0324e02 <unix_stream_sendmsg+72/370>
   2:   ba 00 e0 6b 33            mov    $0x336be000,%edx
Code;  c0324e07 <unix_stream_sendmsg+77/370>
   7:   c0                        (bad)  
Code;  c0324e08 <unix_stream_sendmsg+78/370>
   8:   f0 83 28 01               lock subl $0x1,(%eax)
Code;  c0324e0c <unix_stream_sendmsg+7c/370>
   c:   0f 88 60 12 00 00         js     1272 <_EIP+0x1272> c0326072 <.text.lock.af_unix+18d/27b>
Code;  c0324e12 <unix_stream_sendmsg+82/370>
  12:   8b 45 00                  mov    0x0(%ebp),%eax
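
The "(bad)" instruction in the disassembly above is probably not code.  As
a minimal sketch, assuming this kernel's i386 BUG() emits ud2a followed by
a 16-bit line number and a 32-bit pointer to the file name string (the
usual 2.4 layout, not confirmed in this post), the bytes after 0f 0b can
be decoded from the Code: line.  The helper name decode_bug_record is
hypothetical.

```python
import struct


def decode_bug_record(code_bytes: bytes):
    """Decode 'ud2a; .word line; .long file_ptr' (assumed 2.4 i386 BUG() layout)."""
    if code_bytes[:2] != b"\x0f\x0b":
        raise ValueError("code does not start with ud2a (0f 0b)")
    # Little-endian: 16-bit line number, then 32-bit pointer to the file name.
    line, file_ptr = struct.unpack_from("<HI", code_bytes, 2)
    return line, file_ptr


# Bytes copied from the "Code:" line of the oops.
code = bytes.fromhex("0f 0b ba 00 e0 6b 33 c0 f0 83 28 01 0f 88 60 12 00 00 8b 45")
line, file_ptr = decode_bug_record(code)
print(line, hex(file_ptr))  # 186 0xc0336be0
```

The decoded line number, 186, matches the spinlock.h:186 reported in the
BUG message, which supports the assumed layout.  Under that reading, real
code resumes at the lock subl at c0324e08.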

Contact me or Bill Stearns (wstearns at pobox dot com) if anyone needs more
information.

                                Jeff

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
 
 

1. Process hangs in 2.4.19, RH7.latest, and 2.4.20-pre7-ac2

I (and other people) have seen process hangs on stock 2.4.19, 2.4.20-pre7-ac2,
and (iirc) the latest RH 7.x kernel.  Any process that does the moral
equivalent of ps hangs.  The machine quickly becomes unusable, and needs to
be crashed.

It's been seen most often under heavy UML load.  I've seen it most often
doing UML development inside UML (stock 2.4.19).  It's been seen on
2.4.20-pre7-ac2 on a UML server.  However, I have had it happen with no
UMLs in sight.

We finally got sysrq information on this.  The hung processes all look like
this:
        Proc;  ps
        >>EIP; e352bef4 <_end+2307b320/386c042c>   <=====
        Trace; c032a955 <rwsem_down_read_failed+195/1c0>
        Trace; c016e3c0 <.text.lock.array+73/123>
        Trace; c016b340 <proc_info_read+50/110>
        Trace; c0148736 <sys_read+96/190>
        Trace; c0147fb3 <sys_open+53/b0>
        Trace; c01092cb <system_call+33/38>

The lock in question is the mmap_sem being acquired in proc_pid_stat.  There
should be a sleeping process which is holding the semaphore, but I haven't
spotted it among the multitudes that were running at the time.
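
One way to narrow the search is to drop every process that is itself
waiting on an rwsem and look only at the rest.  The following is a rough
sketch, assuming the full sysrq-t output uses the same "Proc;" / "Trace;"
line format as the excerpts in this post.  The function names and the
sample text are illustrative, and a process that survives the filter is
only a candidate, not a proven holder.

```python
import re

PROC_RE = re.compile(r"^\s*Proc;\s+(\S+)")
FRAME_RE = re.compile(r"^\s*(?:Trace;|>>EIP;)\s+[0-9a-f]+\s+<([^+>]+)")

# rwsem slow paths in which a waiter sleeps.
WAIT_SYMS = {"rwsem_down_read_failed", "rwsem_down_write_failed"}


def parse_stacks(text):
    """Return a list of (process name, [symbols]) in file order."""
    stacks = []
    for raw in text.splitlines():
        m = PROC_RE.match(raw)
        if m:
            stacks.append((m.group(1), []))
            continue
        m = FRAME_RE.match(raw)
        if m and stacks:
            stacks[-1][1].append(m.group(1))
    return stacks


def non_waiters(text):
    """Names of processes whose stack has no rwsem wait frame."""
    return [name for name, frames in parse_stacks(text)
            if not WAIT_SYMS.intersection(frames)]


# Sample assembled from the traces quoted in this thread.
SAMPLE = """\
Proc;  ps
>>EIP; e352bef4 <_end+2307b320/386c042c>   <=====
Trace; c032a955 <rwsem_down_read_failed+195/1c0>
Trace; c016e3c0 <.text.lock.array+73/123>
Proc;  grep
Trace; c0118120 <do_page_fault+0/438>
Proc;  killall
Trace; c0130b72 <__vma_link+62/c0>
Trace; c032a955 <rwsem_down_read_failed+195/1c0>
Proc;  init
Trace; c013f5e3 <__get_free_pages+13/30>
"""

print(non_waiters(SAMPLE))  # ['grep', 'init']
```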

The full ksymoops-ed sysrq-t output is available at
        http://www.stearns.org/slartibartfast/sym_stacks

I'm not including it here because it's too large.

There should be one process which started this by grabbing the mmap_sem and
sleeping forever, and I would think its stack would be different from all
the others.  There are a few processes whose deepest IPs are unique:

Proc;  grep
Trace; c0118120 <do_page_fault+0/438>

Proc;  killall
Trace; c0130b72 <__vma_link+62/c0>
Trace; c032a955 <rwsem_down_read_failed+195/1c0>
Trace; c016e3c0 <.text.lock.array+73/123>
Trace; c016b340 <proc_info_read+50/110>

Proc;  init
Trace; c013f5e3 <__get_free_pages+13/30>

None of these looks like the culprit.  init is probably innocent, the grep
was processing the output of a hung ps, so it started too late to be the
cause, and the killall is itself hung.
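
The "unique deepest IP" check can be automated rather than done by eye.
This is a sketch only: it assumes the sysrq output follows the "Proc;" /
"Trace;" format quoted here and treats the first Trace line after each
Proc line as the deepest frame.  The function names are hypothetical, and
the second ps entry in the sample is duplicated for illustration.

```python
import re
from collections import Counter

PROC_RE = re.compile(r"^\s*Proc;\s+(\S+)")
TRACE_RE = re.compile(r"^\s*Trace;\s+[0-9a-f]+\s+<([^+>]+)")


def deepest_frames(text):
    """Return (process, symbol of first Trace line) pairs in file order."""
    result = []
    current = None
    for line in text.splitlines():
        m = PROC_RE.match(line)
        if m:
            current = m.group(1)
            continue
        m = TRACE_RE.match(line)
        if m and current is not None:
            result.append((current, m.group(1)))
            current = None  # only the first Trace line per process counts
    return result


def unique_deepest(text):
    """Processes whose deepest frame is shared with no other process."""
    pairs = deepest_frames(text)
    counts = Counter(sym for _, sym in pairs)
    return [(proc, sym) for proc, sym in pairs if counts[sym] == 1]


SAMPLE = """\
Proc;  ps
>>EIP; e352bef4 <_end+2307b320/386c042c>   <=====
Trace; c032a955 <rwsem_down_read_failed+195/1c0>
Trace; c016e3c0 <.text.lock.array+73/123>
Proc;  ps
Trace; c032a955 <rwsem_down_read_failed+195/1c0>
Proc;  grep
Trace; c0118120 <do_page_fault+0/438>
Proc;  killall
Trace; c0130b72 <__vma_link+62/c0>
Trace; c032a955 <rwsem_down_read_failed+195/1c0>
Proc;  init
Trace; c013f5e3 <__get_free_pages+13/30>
"""

for proc, sym in unique_deepest(SAMPLE):
    print(proc, sym)
```

On the sample this prints grep, killall and init, the same three
processes picked out by hand above.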

I'd appreciate any clues about what's going on here.  If anyone needs more
info than what's in the sysrq output at the URL above, contact me or Bill
Stearns (wstearns at pobox dot com).

                                Jeff


2. Swap size / Memory upgrade

3. Compile problem w/ 2.4.20-pre7-ac2

4. Floppyless CD-ROM installation

5. alan@redhat.com: Linux 2.4.20-pre7-ac2 - How's the IDE thing coming?

6. Exit statu and loop question

7. 2.4.20-pre7-ac2 ide-scsi

8. sunisdn S Bus cards

9. Linux 2.4.20-pre7-ac2

10. Ooops with 2.4.20-pre7-ac2

11. kernel bug report [2.4.20-pre7]

12. BUG() in e1000 driver (2.4.20-pre7)

13. comp bug in 2.4.20-rc1-ac2???