Panic from 2.4.19-pre9-aa2

Post by Martin J. Bligh » Sat, 08 Jun 2002 05:50:10



Panic below - crashed on third kernel compile since boot.
Worked fine on -pre8-aa2

M.

-------------------------------------------------------

Unable to handle kernel paging request at virtual address fffff85e
c648ff38
*pde = 00005063
Oops: 0000
CPU:    3
EIP:    0060:[<c648ff38>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: c648e000
eax: 00000000   ebx: c623a000   ecx: fffff83e   edx: c623a380
esi: 00000001   edi: c0297520   ebp: c0117bf6   esp: c648ff00
ds: 0018   es: 0018   ss: 0018
Process cpp (pid: 21583, stackpage=c648f000)
Stack: c648e000 c63473a0 c634740c 00000000 c01163f8 bfffeed4 c649e000 c648e000
       00000040 c648e000 00000002 c62b75e0 c4ad2f20 c648e000 c648ff60 c0147dad
       00001000 c4ba54e0 c63473a0 000415b4 00000000 c648e000 00000000 00000000
Call Trace: [<c01163f8>] [<c0147dad>] [<c0148180>] [<c013e308>] [<c013e937>]
   [<c0108a7b>]
Code: 60 ff 48 c6 ad 7d 14 c0 00 10 00 00 e0 54 ba c4 a0 73 34 c6

>>EIP; c648ff38 <END_OF_CODE+6196040/????>   <=====

Trace; c01163f8 <do_page_fault+0/670>
Trace; c0147dac <pipe_wait+7c/a4>
Trace; c0148180 <pipe_write+1cc/294>
Trace; c013e308 <filp_close+9c/a8>
Trace; c013e936 <sys_write+8e/100>
Trace; c0108a7a <system_call+2e/34>
Code;  c648ff38 <END_OF_CODE+6196040/????>
00000000 <_EIP>:
Code;  c648ff38 <END_OF_CODE+6196040/????>
   0:   60                        pusha  
Code;  c648ff38 <END_OF_CODE+6196040/????>   <=====
   1:   ff 48 c6                  decl   0xffffffc6(%eax)   <=====
Code;  c648ff3c <END_OF_CODE+6196044/????>
   4:   ad                        lods   %ds:(%esi),%eax
Code;  c648ff3c <END_OF_CODE+6196044/????>
   5:   7d 14                     jge    1b <_EIP+0x1b> c648ff52 <END_OF_CODE+619605a/????>
Code;  c648ff3e <END_OF_CODE+6196046/????>
   7:   c0 00 10                  rolb   $0x10,(%eax)
Code;  c648ff42 <END_OF_CODE+619604a/????>
   a:   00 00                     add    %al,(%eax)
Code;  c648ff44 <END_OF_CODE+619604c/????>
   c:   e0 54                     loopne 62 <_EIP+0x62> c648ff9a <END_OF_CODE+61960a2/????>
Code;  c648ff46 <END_OF_CODE+619604e/????>
   e:   ba c4 a0 73 34            mov    $0x3473a0c4,%edx
Code;  c648ff4a <END_OF_CODE+6196052/????>
  13:   c6 00 00                  movb   $0x0,(%eax)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
 
 

Post by Andrea Arcangel » Sat, 08 Jun 2002 06:30:09



> Panic below - crashed on third kernel compile since boot.
> Worked fine on -pre8-aa2

> M.

> -------------------------------------------------------

> Unable to handle kernel paging request at virtual address fffff85e
> c648ff38
> *pde = 00005063
> Oops: 0000
> CPU:    3
> EIP:    0060:[<c648ff38>]    Not tainted
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: c648e000
> eax: 00000000   ebx: c623a000   ecx: fffff83e   edx: c623a380
> esi: 00000001   edi: c0297520   ebp: c0117bf6   esp: c648ff00
> ds: 0018   es: 0018   ss: 0018
> Process cpp (pid: 21583, stackpage=c648f000)
> Stack: c648e000 c63473a0 c634740c 00000000 c01163f8 bfffeed4 c649e000 c648e000
>        00000040 c648e000 00000002 c62b75e0 c4ad2f20 c648e000 c648ff60 c0147dad
>        00001000 c4ba54e0 c63473a0 000415b4 00000000 c648e000 00000000 00000000
> Call Trace: [<c01163f8>] [<c0147dad>] [<c0148180>] [<c013e308>] [<c013e937>]
>    [<c0108a7b>]
> Code: 60 ff 48 c6 ad 7d 14 c0 00 10 00 00 e0 54 ba c4 a0 73 34 c6

> >>EIP; c648ff38 <END_OF_CODE+6196040/????>   <=====
> Trace; c01163f8 <do_page_fault+0/670>
> Trace; c0147dac <pipe_wait+7c/a4>

OK, so the crash is at pipe_wait+7c. Can you disassemble pipe_wait?
(It shouldn't be very big; I use gcc 3.1.1, so my assembly wouldn't match.)
Apparently part of the inode got corrupted, and something is reading at
offset 0x20 of a structure inside the inode.
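
The 0x20 figure can be cross-checked against the register dump: the faulting address is fffff85e and ecx holds fffff83e. A minimal sketch of that subtraction follows, assuming ecx was the base register of the faulting access (the oops itself does not say so, and `fault_offset` is a hypothetical helper name):

```c
#include <stdint.h>

/* Hypothetical helper: distance between the faulting virtual address and a
 * register value from the oops, using 32-bit wraparound arithmetic as on i386.
 * Whether that register really was the base of the access is an assumption. */
uint32_t fault_offset(uint32_t fault_addr, uint32_t reg_value)
{
    return fault_addr - reg_value;
}
```

With the values from the oops, `fault_offset(0xfffff85e, 0xfffff83e)` evaluates to 0x20.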

Not really sure what the problem could be; it would be interesting to
see if you can reproduce it. Also, if for example you enabled NUMA-Q, you
may want to try disabling it to see whether the problem goes away without
discontigmem. If we could isolate it to a config option, it would help a lot.

> Trace; c0148180 <pipe_write+1cc/294>
> Trace; c013e308 <filp_close+9c/a8>
> Trace; c013e936 <sys_write+8e/100>
> Trace; c0108a7a <system_call+2e/34>
> Code;  c648ff38 <END_OF_CODE+6196040/????>
> 00000000 <_EIP>:
> Code;  c648ff38 <END_OF_CODE+6196040/????>
>    0:   60                        pusha  
> Code;  c648ff38 <END_OF_CODE+6196040/????>   <=====
>    1:   ff 48 c6                  decl   0xffffffc6(%eax)   <=====
> Code;  c648ff3c <END_OF_CODE+6196044/????>
>    4:   ad                        lods   %ds:(%esi),%eax
> Code;  c648ff3c <END_OF_CODE+6196044/????>
>    5:   7d 14                     jge    1b <_EIP+0x1b> c648ff52 <END_OF_CODE+619605a/????>
> Code;  c648ff3e <END_OF_CODE+6196046/????>
>    7:   c0 00 10                  rolb   $0x10,(%eax)
> Code;  c648ff42 <END_OF_CODE+619604a/????>
>    a:   00 00                     add    %al,(%eax)
> Code;  c648ff44 <END_OF_CODE+619604c/????>
>    c:   e0 54                     loopne 62 <_EIP+0x62> c648ff9a <END_OF_CODE+61960a2/????>
> Code;  c648ff46 <END_OF_CODE+619604e/????>
>    e:   ba c4 a0 73 34            mov    $0x3473a0c4,%edx
> Code;  c648ff4a <END_OF_CODE+6196052/????>
>   13:   c6 00 00                  movb   $0x0,(%eax)

Andrea

Post by Martin J. Bligh » Sat, 08 Jun 2002 07:00:09


>> Unable to handle kernel paging request at virtual address fffff85e
>> c648ff38
>> *pde = 00005063
>> Oops: 0000
>> CPU:    3
>> EIP:    0060:[<c648ff38>]    Not tainted
>> Using defaults from ksymoops -t elf32-i386 -a i386
>> EFLAGS: c648e000
>> eax: 00000000   ebx: c623a000   ecx: fffff83e   edx: c623a380
>> esi: 00000001   edi: c0297520   ebp: c0117bf6   esp: c648ff00
>> ds: 0018   es: 0018   ss: 0018
>> Process cpp (pid: 21583, stackpage=c648f000)
>> Stack: c648e000 c63473a0 c634740c 00000000 c01163f8 bfffeed4 c649e000 c648e000
>>        00000040 c648e000 00000002 c62b75e0 c4ad2f20 c648e000 c648ff60 c0147dad
>>        00001000 c4ba54e0 c63473a0 000415b4 00000000 c648e000 00000000 00000000
>> Call Trace: [<c01163f8>] [<c0147dad>] [<c0148180>] [<c013e308>] [<c013e937>]
>>    [<c0108a7b>]
>> Code: 60 ff 48 c6 ad 7d 14 c0 00 10 00 00 e0 54 ba c4 a0 73 34 c6

>> >> EIP; c648ff38 <END_OF_CODE+6196040/????>   <=====
>> Trace; c01163f8 <do_page_fault+0/670>
>> Trace; c0147dac <pipe_wait+7c/a4>

> ok, so the crash is at pipe_wait+7c. Can you disassemble pipe_wait?
> (shouldn't be very big) (i use gcc 3.1.1 so my assembly wouldn't match)
> apparently a part of the inode got corrupted, and somebody is reading at
> offset 0x20 of a structure inside the inode.

(gdb) disassemble pipe_wait
Dump of assembler code for function pipe_wait:
0xc0147d30 <pipe_wait>: sub    $0x20,%esp
0xc0147d33 <pipe_wait+3>:       push   %ebp
0xc0147d34 <pipe_wait+4>:       push   %edi
0xc0147d35 <pipe_wait+5>:       push   %esi
0xc0147d36 <pipe_wait+6>:       push   %ebx
0xc0147d37 <pipe_wait+7>:       mov    $0xffffe000,%ebx
0xc0147d3c <pipe_wait+12>:      and    %esp,%ebx
0xc0147d3e <pipe_wait+14>:      lea    0x20(%esp,1),%ebp
0xc0147d42 <pipe_wait+18>:      mov    %ebp,%edx
0xc0147d44 <pipe_wait+20>:      mov    0x34(%esp,1),%esi
0xc0147d48 <pipe_wait+24>:      movl   $0x0,0x10(%esp,1)
0xc0147d50 <pipe_wait+32>:      movl   $0x0,0x14(%esp,1)
0xc0147d58 <pipe_wait+40>:      movl   $0x0,0x18(%esp,1)
0xc0147d60 <pipe_wait+48>:      movl   $0x0,0x1c(%esp,1)
0xc0147d68 <pipe_wait+56>:      mov    %ebx,0x14(%esp,1)
0xc0147d6c <pipe_wait+60>:      movl   $0x0,0x20(%esp,1)
0xc0147d74 <pipe_wait+68>:      mov    %ebx,0x24(%esp,1)
0xc0147d78 <pipe_wait+72>:      movl   $0x0,0x28(%esp,1)
0xc0147d80 <pipe_wait+80>:      movl   $0x0,0x2c(%esp,1)
0xc0147d88 <pipe_wait+88>:      movl   $0x1,(%ebx)
0xc0147d8e <pipe_wait+94>:      mov    0xf8(%esi),%eax
0xc0147d94 <pipe_wait+100>:     call   0xc01199c0 <add_wait_queue>
0xc0147d99 <pipe_wait+105>:     lea    0x6c(%esi),%edi
0xc0147d9c <pipe_wait+108>:     mov    %edi,%ecx
0xc0147d9e <pipe_wait+110>:     lock incl 0x6c(%esi)
0xc0147da2 <pipe_wait+114>:     jle    0xc014891b <.text.lock.pipe>
0xc0147da8 <pipe_wait+120>:     call   0xc0117ae8 <schedule>
0xc0147dad <pipe_wait+125>:     mov    0xf8(%esi),%eax
0xc0147db3 <pipe_wait+131>:     mov    %ebp,%edx
0xc0147db5 <pipe_wait+133>:     call   0xc0119a28 <remove_wait_queue>
0xc0147dba <pipe_wait+138>:     movl   $0x0,(%ebx)
0xc0147dc0 <pipe_wait+144>:     mov    %edi,%ecx
0xc0147dc2 <pipe_wait+146>:     lock decl 0x6c(%esi)
0xc0147dc6 <pipe_wait+150>:     js     0xc0148925 <.text.lock.pipe+10>
0xc0147dcc <pipe_wait+156>:     pop    %ebx
0xc0147dcd <pipe_wait+157>:     pop    %esi
0xc0147dce <pipe_wait+158>:     pop    %edi
0xc0147dcf <pipe_wait+159>:     pop    %ebp
0xc0147dd0 <pipe_wait+160>:     add    $0x20,%esp
0xc0147dd3 <pipe_wait+163>:     ret    
End of assembler dump.

> not really sure what could be the problem, it would be interesting to
> see if you can reproduce it. Also if for example you enabled numa-q you
> may want to try to disable it and see if w/o discontigmem the problem
> goes away, if we could isolate it to a config option, it would help a lot.

OK, I'll play around some more and try to build up a pattern.

Not sure why ksymoops is printing c0147dac from the trace, whilst
the stack says c0147dad, which seems to be the schedule call -
would make sense, as that's what you just changed?

M.


Post by Andrea Arcangel » Sat, 08 Jun 2002 08:20:07


On Thu, Jun 06, 2002 at 02:53:40PM -0700, Martin J. Bligh wrote:
> >> Unable to handle kernel paging request at virtual address fffff85e
> >> c648ff38
> >> *pde = 00005063
> >> Oops: 0000
> >> CPU:    3
> >> EIP:    0060:[<c648ff38>]    Not tainted
> >> Using defaults from ksymoops -t elf32-i386 -a i386
> >> EFLAGS: c648e000
> >> eax: 00000000   ebx: c623a000   ecx: fffff83e   edx: c623a380
> >> esi: 00000001   edi: c0297520   ebp: c0117bf6   esp: c648ff00
> >> ds: 0018   es: 0018   ss: 0018
> >> Process cpp (pid: 21583, stackpage=c648f000)
> >> Stack: c648e000 c63473a0 c634740c 00000000 c01163f8 bfffeed4 c649e000 c648e000
> >>        00000040 c648e000 00000002 c62b75e0 c4ad2f20 c648e000 c648ff60 c0147dad
> >>        00001000 c4ba54e0 c63473a0 000415b4 00000000 c648e000 00000000 00000000
> >> Call Trace: [<c01163f8>] [<c0147dad>] [<c0148180>] [<c013e308>] [<c013e937>]
                                ^^^^^^^^
> >>    [<c0108a7b>]
> >> Code: 60 ff 48 c6 ad 7d 14 c0 00 10 00 00 e0 54 ba c4 a0 73 34 c6

> >> >> EIP; c648ff38 <END_OF_CODE+6196040/????>   <=====
> >> Trace; c01163f8 <do_page_fault+0/670>
> >> Trace; c0147dac <pipe_wait+7c/a4>
            ^^^^^^^^

> > ok, so the crash is at pipe_wait+7c. Can you disassemble pipe_wait?
> > (shouldn't be very big) (i use gcc 3.1.1 so my assembly wouldn't match)
> > apparently a part of the inode got corrupted, and somebody is reading at
> > offset 0x20 of a structure inside the inode.

> (gdb) disassemble pipe_wait
> Dump of assembler code for function pipe_wait:
> 0xc0147d30 <pipe_wait>: sub    $0x20,%esp
                                 ^^^^^ this should be 0x30!!!!!! not 0x20
> 0xc0147d33 <pipe_wait+3>:       push   %ebp
> 0xc0147d34 <pipe_wait+4>:       push   %edi
> 0xc0147d35 <pipe_wait+5>:       push   %esi
> 0xc0147d36 <pipe_wait+6>:       push   %ebx
> 0xc0147d37 <pipe_wait+7>:       mov    $0xffffe000,%ebx
> 0xc0147d3c <pipe_wait+12>:      and    %esp,%ebx
> 0xc0147d3e <pipe_wait+14>:      lea    0x20(%esp,1),%ebp
> 0xc0147d42 <pipe_wait+18>:      mov    %ebp,%edx
> 0xc0147d44 <pipe_wait+20>:      mov    0x34(%esp,1),%esi
> 0xc0147d48 <pipe_wait+24>:      movl   $0x0,0x10(%esp,1)
> 0xc0147d50 <pipe_wait+32>:      movl   $0x0,0x14(%esp,1)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^ (what's this?)
> 0xc0147d58 <pipe_wait+40>:      movl   $0x0,0x18(%esp,1)
> 0xc0147d60 <pipe_wait+48>:      movl   $0x0,0x1c(%esp,1)
> 0xc0147d68 <pipe_wait+56>:      mov    %ebx,0x14(%esp,1)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^
> 0xc0147d6c <pipe_wait+60>:      movl   $0x0,0x20(%esp,1)
> 0xc0147d74 <pipe_wait+68>:      mov    %ebx,0x24(%esp,1)
> 0xc0147d78 <pipe_wait+72>:      movl   $0x0,0x28(%esp,1)
> 0xc0147d80 <pipe_wait+80>:      movl   $0x0,0x2c(%esp,1)
> 0xc0147d88 <pipe_wait+88>:      movl   $0x1,(%ebx)
> 0xc0147d8e <pipe_wait+94>:      mov    0xf8(%esi),%eax
> 0xc0147d94 <pipe_wait+100>:     call   0xc01199c0 <add_wait_queue>
> 0xc0147d99 <pipe_wait+105>:     lea    0x6c(%esi),%edi
> 0xc0147d9c <pipe_wait+108>:     mov    %edi,%ecx
> 0xc0147d9e <pipe_wait+110>:     lock incl 0x6c(%esi)
> 0xc0147da2 <pipe_wait+114>:     jle    0xc014891b <.text.lock.pipe>
> 0xc0147da8 <pipe_wait+120>:     call   0xc0117ae8 <schedule>
> 0xc0147dad <pipe_wait+125>:     mov    0xf8(%esi),%eax
  ^^^^^^^^^^

At first glance this seems to be a miscompilation, a compiler bug, not a
bug in 2.4.19pre9aa2 (this clearly explains why you're the only one
reproducing this weird oops). It even sounds like ksymoops is buggy:
ksymoops should have said c0147dad (+7d), not c0147dac and +7c (maybe you
compiled ksymoops with the same compiler as the kernel? If not, Keith
should have a look here).

Besides the stupid zeroing of 0x14(%esp) (my compiler isn't doing that),
the initial sub seems wrong. pipe_wait has just one argument, and that's
read at offset 0x34, so it should be sub 0x30, not sub 0x20, or we will
corrupt the underlying stack and we also won't read the inode at all.
Hence the oops: it wasn't the inode that was corrupted, as I guessed in
the previous email. The problem is one step earlier: we use random memory
as a pointer to the inode structure, so we oops while trying to read the
inode contents.

Of course the code reads the inode pointer at offset 0x34, but the inode
pointer isn't at 0x34; something random is there, because the prologue did
sub 0x20, so the inode pointer was at 0x24, not 0x34! The prologue clearly
had to do sub 0x30 instead (that's the miscompilation).
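
The offset arithmetic above can be double-checked mechanically. The sketch below is only an accounting aid: `first_arg_offset` is a hypothetical helper, and it assumes the argument is passed on the stack with a 4-byte return address directly below it and 4-byte pushes, as on i386.

```c
#include <stdint.h>

/* Hypothetical helper: offset from %esp, once the prologue has run, of the
 * first stack-passed argument.
 *   sub_bytes - immediate of the initial "sub $N,%esp"
 *   pushes    - number of registers pushed after it
 * Assumes 4-byte stack slots and a 4-byte return address just below the
 * argument (plain i386 stack calling convention). */
uint32_t first_arg_offset(uint32_t sub_bytes, uint32_t pushes)
{
    const uint32_t word = 4;                    /* i386 stack slot size */
    return sub_bytes + pushes * word + word;    /* locals + saved regs + return address */
}
```

For `sub $0x20` followed by four pushes this returns 0x34, and for `sub $0x30` it returns 0x44. Comparing those values with the `mov 0x34(%esp,1),%esi` in the listing is a useful cross-check before settling on which prologue is the correct one.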

What compiler are you using? Maybe 2.96?

3.1.1 20020530 works fine for me with the kernel, as did the previous
gcc 3.1; I never had a single problem with the kernel in the whole
development cycle of 3.0 and 3.1, and now with 3.1.1. If you need to play
it safe with the kernel on x86 you should use only 2.95 or egcs 1.1.2;
however, I can reassure people that gcc 3.1.1 seems rock solid, even if
I wouldn't use it for mission-critical work yet.

I CC'ed Honza (x86-64/x86 gcc guru) and Keith, in case I misread something.

Honza, this is the pipe_wait C code:

void pipe_wait(struct inode * inode)
{
        DECLARE_WAITQUEUE(wait, current);
        current->state = TASK_INTERRUPTIBLE;
        add_wait_queue(PIPE_WAIT(*inode), &wait);
                       ^^^^^^^^^^^^^^^^^ we bug here while dereferencing inode->i_pipe
        up(PIPE_SEM(*inode));
        schedule();
        remove_wait_queue(PIPE_WAIT(*inode), &wait);
        current->state = TASK_RUNNING;
        down(PIPE_SEM(*inode));

}

Note that wait is at offset 0 of i_pipe, and i_pipe is at offset 0xf8 of
the inode. So it is indeed doing inode->i_pipe when it oopses, because the
inode address passed on the stack (the first and only argument) was at
0x24, not 0x34.
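
To make the offsets concrete: `mov 0xf8(%esi),%eax` is what a load of inode->i_pipe looks like when i_pipe sits 0xf8 bytes into the inode. The sketch below uses mock structures (every name prefixed `mock_` is invented here, and the padding array stands in for the real fields, which the thread does not list) just to show how the C expression relates to the two offsets mentioned.

```c
#include <stddef.h>

/* Mock layouts for illustration only; the real 2.4 structures have many more
 * fields. The padding is chosen so i_pipe lands at 0xf8, as stated above. */
struct mock_wait_queue_head {
    void *first;
};

struct mock_pipe_inode_info {
    struct mock_wait_queue_head wait;       /* offset 0, as stated above */
};

struct mock_inode {
    unsigned char other_fields[0xf8];       /* stand-in for everything before i_pipe */
    struct mock_pipe_inode_info *i_pipe;    /* offset 0xf8 */
};

/* Roughly what PIPE_WAIT(*inode) is described as yielding in the thread:
 * load inode->i_pipe, then take the address of its first member. Because
 * wait is at offset 0, the result equals the loaded i_pipe pointer. */
struct mock_wait_queue_head *mock_pipe_wait_head(struct mock_inode *inode)
{
    return &inode->i_pipe->wait;
}
```

If the pointer passed in as `inode` is garbage, the load of `inode->i_pipe` is the first access that can fault, which is the failure mode described above.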

> 0xc0147db3 <pipe_wait+131>:     mov    %ebp,%edx
> 0xc0147db5 <pipe_wait+133>:     call   0xc0119a28 <remove_wait_queue>
> 0xc0147dba <pipe_wait+138>:     movl   $0x0,(%ebx)
> 0xc0147dc0 <pipe_wait+144>:     mov    %edi,%ecx
> 0xc0147dc2 <pipe_wait+146>:     lock decl 0x6c(%esi)
> 0xc0147dc6 <pipe_wait+150>:     js     0xc0148925 <.text.lock.pipe+10>
> 0xc0147dcc <pipe_wait+156>:     pop    %ebx
> 0xc0147dcd <pipe_wait+157>:     pop    %esi
> 0xc0147dce <pipe_wait+158>:     pop    %edi
> 0xc0147dcf <pipe_wait+159>:     pop    %ebp
> 0xc0147dd0 <pipe_wait+160>:     add    $0x20,%esp
> 0xc0147dd3 <pipe_wait+163>:     ret    
> End of assembler dump.

> > not really sure what could be the problem, it would be interesting to
> > see if you can reproduce it. Also if for example you enabled numa-q you
> > may want to try to disable it and see if w/o discontigmem the problem
> > goes away, if we could isolate it to a config option, it would help a lot.

> OK, I'll play around some more and try to build up a pattern.

> Not sure why ksymoops is printing c0147dac from the trace, whilst
> the stack says c0147dad, which seems to be the schedule call -
> would make sense, as that's what you just changed?

Yes, that's wrong, but that is a ksymoops mistake not related to the
original oops (possibly due to the same broken compiler, but maybe not).

> M.

Andrea

Post by Martin J. Bligh » Sat, 08 Jun 2002 08:30:08


> not really sure what could be the problem, it would be interesting to
> see if you can reproduce it.

Yup, do 2 or 3 kernel compiles and it crashes again. Here's a slightly
different oops:

Unable to handle kernel NULL pointer dereference at virtual address 00000282
c0117feb
*pde = 00000000
Oops: 0000
CPU:    6
EIP:    0010:[<c0117feb>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010046
eax: c6369f6c   ebx: 00000282   ecx: c029a488   edx: c4ff5b24
esi: c4ff5b20   edi: 00000282   ebp: c6227f70   esp: c6227f54
ds: 0018   es: 0018   ss: 0018
Process cpp (pid: 16679, stackpage=c6227000)
Stack: 00001000 c4ff5b20 c5773180 00000001 c4ff5b24 00000282 00000001 000526a9
       c0148311 00000000 ffffffea c5eab160 000536a9 c6526000 c6226000 c57731ec
       00001000 00001000 c013ead7 c5eab160 4011000c 000536a9 c5eab180 c6226000
Call Trace: [<c0148311>] [<c013ead7>] [<c0108a7b>]
Code: 8b 3b 0f 18 07 3b 5d f4 75 d0 c6 06 01 ff 75 f8 9d 8d 74 26

>>EIP; c0117fea <__wake_up+5a/7c>   <=====

Trace; c0148310 <pipe_write+1bc/294>
Trace; c013ead6 <sys_write+8e/100>
Trace; c0108a7a <system_call+2e/34>
Code;  c0117fea <__wake_up+5a/7c>
00000000 <_EIP>:
Code;  c0117fea <__wake_up+5a/7c>   <=====
   0:   8b 3b                     mov    (%ebx),%edi   <=====
Code;  c0117fec <__wake_up+5c/7c>
   2:   0f 18 07                  prefetchnta (%edi)
Code;  c0117fee <__wake_up+5e/7c>
   5:   3b 5d f4                  cmp    0xfffffff4(%ebp),%ebx
Code;  c0117ff2 <__wake_up+62/7c>
   8:   75 d0                     jne    ffffffda <_EIP+0xffffffda> c0117fc4 <__wake_up+34/7c>
Code;  c0117ff4 <__wake_up+64/7c>
   a:   c6 06 01                  movb   $0x1,(%esi)
Code;  c0117ff6 <__wake_up+66/7c>
   d:   ff 75 f8                  pushl  0xfffffff8(%ebp)
Code;  c0117ffa <__wake_up+6a/7c>
  10:   9d                        popf  
Code;  c0117ffa <__wake_up+6a/7c>
  11:   8d 74 26 00               lea    0x0(%esi,1),%esi

> Also if for example you enabled numa-q you
> may want to try to disable it and see if w/o discontigmem the problem
> goes away, if we could isolate it to a config option, it would help a lot.

OK, will see if I can do that - I'm out for a few days, so it may be next
Tuesday before I can do this

M.

Post by Hugh Dickin » Sat, 08 Jun 2002 08:40:04



> Not sure why ksymoops is printing c0147dac from the trace, whilst
> the stack says c0147dad, which seems to be the schedule call -

Bug in ksymoops (had a misinitialized truncate_mask, which
removed the low bit by mistake): fixed in ksymoops 2.4.4.
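
The effect of such a mask is easy to reproduce. This is a minimal sketch, not ksymoops code: only the name truncate_mask comes from the thread, while the two mask values and the helper `apply_truncate_mask` are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical mask values. The thread only says the mask was misinitialized
 * so that the low bit was removed; the actual contents are not shown. */
#define MASK_KEEP_ALL_BITS  0xffffffffu   /* intended behaviour on a 32-bit target */
#define MASK_DROPS_LOW_BIT  0xfffffffeu   /* behaviour consistent with the bug report */

/* Hypothetical helper: apply an address mask the way a symbol-decoding tool
 * might before looking an address up. */
uint32_t apply_truncate_mask(uint32_t addr, uint32_t truncate_mask)
{
    return addr & truncate_mask;
}
```

Applied to the raw Call Trace values, a mask that clears bit 0 turns c0147dad into c0147dac, c013e937 into c013e936 and c0108a7b into c0108a7a, while even addresses such as c0148180 are unchanged. That matches the Trace; lines ksymoops printed.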

Hugh


Post by Andrea Arcangel » Sat, 08 Jun 2002 08:40:07



> > not really sure what could be the problem, it would be interesting to
> > see if you can reproduce it.

> Yup, do 2 or 3 kernel compiles and it crashes again. Here's a slightly
> different oops:

> Unable to handle kernel NULL pointer dereference at virtual address 00000282
> c0117feb
> *pde = 00000000
> Oops: 0000
> CPU:    6
> EIP:    0010:[<c0117feb>]    Not tainted
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010046
> eax: c6369f6c   ebx: 00000282   ecx: c029a488   edx: c4ff5b24
> esi: c4ff5b20   edi: 00000282   ebp: c6227f70   esp: c6227f54
> ds: 0018   es: 0018   ss: 0018
> Process cpp (pid: 16679, stackpage=c6227000)
> Stack: 00001000 c4ff5b20 c5773180 00000001 c4ff5b24 00000282 00000001 000526a9
>        c0148311 00000000 ffffffea c5eab160 000536a9 c6526000 c6226000 c57731ec
>        00001000 00001000 c013ead7 c5eab160 4011000c 000536a9 c5eab180 c6226000
> Call Trace: [<c0148311>] [<c013ead7>] [<c0108a7b>]
> Code: 8b 3b 0f 18 07 3b 5d f4 75 d0 c6 06 01 ff 75 f8 9d 8d 74 26

> >>EIP; c0117fea <__wake_up+5a/7c>   <=====
> Trace; c0148310 <pipe_write+1bc/294>

No doubt it crashes again here: the pipe_write stack gets corrupted by
pipe_wait. Actually we were very lucky that it previously crashed in the
buggy place, so you showed me the buggy assembler immediately. If it had
crashed in __wake_up the first time, maybe __wake_up wouldn't have been
miscompiled and it would have been much harder to guess that it was not a
kernel mistake... :)

> Trace; c013ead6 <sys_write+8e/100>
> Trace; c0108a7a <system_call+2e/34>
> Code;  c0117fea <__wake_up+5a/7c>
> 00000000 <_EIP>:
> Code;  c0117fea <__wake_up+5a/7c>   <=====
>    0:   8b 3b                     mov    (%ebx),%edi   <=====
> Code;  c0117fec <__wake_up+5c/7c>
>    2:   0f 18 07                  prefetchnta (%edi)
> Code;  c0117fee <__wake_up+5e/7c>
>    5:   3b 5d f4                  cmp    0xfffffff4(%ebp),%ebx
> Code;  c0117ff2 <__wake_up+62/7c>
>    8:   75 d0                     jne    ffffffda <_EIP+0xffffffda> c0117fc4 <__wake_up+34/7c>
> Code;  c0117ff4 <__wake_up+64/7c>
>    a:   c6 06 01                  movb   $0x1,(%esi)
> Code;  c0117ff6 <__wake_up+66/7c>
>    d:   ff 75 f8                  pushl  0xfffffff8(%ebp)
> Code;  c0117ffa <__wake_up+6a/7c>
>   10:   9d                        popf  
> Code;  c0117ffa <__wake_up+6a/7c>
>   11:   8d 74 26 00               lea    0x0(%esi,1),%esi

> > Also if for example you enabled numa-q you
> > may want to try to disable it and see if w/o discontigmem the problem
> > goes away, if we could isolate it to a config option, it would help a lot.

> OK, will see if I can do that - I'm out for a few days, so it may be next
> Tuesday before I can do this

> M.

Andrea

Post by Martin J. Bligh » Sat, 08 Jun 2002 08:50:04


> At first glance this seems a miscompilation, a compiler bug, not bug in
> 2.4.19pre9aa2 (this clearly explains why you're the only one reproducing
> this weird oops). it even sounds like ksymoops is buggy, ksymoops had to
> say c0147dad (+7d), not c0147dac and +7c (maybe you compiled ksymoops
> with the same compiler of the kernel? If not Keith should have a look
> here).

> What compiler are you using? Maybe 2.96?

Errm ....  Redhat 6.2 default ... egcs-2.91.66 .... time to upgrade ?? ;-) ;-)
Pah ... reinstalling these machines is a pain in the ass .... ;-)

M.


Post by Andrea Arcangel » Sat, 08 Jun 2002 09:00:09



> > At first glance this seems a miscompilation, a compiler bug, not bug in
> > 2.4.19pre9aa2 (this clearly explains why you're the only one reproducing
> > this weird oops). it even sounds like ksymoops is buggy, ksymoops had to
> > say c0147dad (+7d), not c0147dac and +7c (maybe you compiled ksymoops
> > with the same compiler of the kernel? If not Keith should have a look
> > here).

> > What compiler are you using? Maybe 2.96?

> Errm ....  Redhat 6.2 default ... egcs-2.91.66 .... time to upgrade ?? ;-) ;-)

Hmm, that's bad news. That's egcs 1.1.2; strange, it was supposed to be
safe. Oh well. OTOH I'm not too surprised nobody noticed, because I doubt
many people still compile 2.4 with egcs.

> Pah ... reinstalling these machines is a pain in the ass .... ;-)

Could you try compiling on another machine with gcc 2.95 and see if
you can still reproduce it? If it's a race condition and a real kernel
bug, it should be easily reproducible no matter the compiler.

Andrea

Post by Keith Owen » Sat, 08 Jun 2002 10:40:07


On Thu, 06 Jun 2002 14:53:40 -0700,

>Not sure why ksymoops is printing c0147dac from the trace, whilst
>the stack says c0147dad, which seems to be the schedule call -
>would make sense, as that's what you just changed?

Truncate mask bug, fixed in ksymoops 2.4.4.  Current is 2.4.5.
