kernel 2.4.17 crashes on SCSI-errors

Post by R.Oeh.. » Fri, 04 Jan 2002 21:10:11



Hi, List

Just now I tried the new kernel 2.4.17, hoping that
the SCSI system would be usable again.
But NO! It crashed immediately, like the last few kernels before it.

In the meantime I'm really getting into problems with
our product, because I expect SuSE to launch their next
release soon with an unstable "stable" kernel.

Doesn't anybody recognize that this bug is serious?
3.5" MO drives report blank sectors as "SCSI-Hardware-Error".
This kind of sense code also appears for errors that
are much more common than blank sectors.
Any flaw in a SCSI disk will crash the kernel.
Please don't rely on modern hardware being so perfect that
errors will never occur. If that were true, you could just as well
remove the complete error-handling code.
That would at least prevent the crashes...
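
To make the sense-code point concrete: in fixed-format SCSI sense data the sense key is the low nibble of byte 2, and the SCSI standard defines separate keys for MEDIUM ERROR (0x3), HARDWARE ERROR (0x4) and BLANK CHECK (0x8). Below is a minimal user-space sketch of that decoding. The function names are made up for illustration and this is not the kernel's own code.

```c
#include <stddef.h>
#include <stdint.h>

/* Sense keys as defined by the SCSI standard (subset). */
enum {
    SK_NO_SENSE       = 0x0,
    SK_MEDIUM_ERROR   = 0x3,
    SK_HARDWARE_ERROR = 0x4,
    SK_BLANK_CHECK    = 0x8
};

/*
 * Return the sense key from fixed-format sense data (response code
 * 0x70 or 0x71), or -1 if the buffer is too short or uses another format.
 */
static int sense_key(const uint8_t *sense, size_t len)
{
    if (sense == NULL || len < 3)
        return -1;
    if ((sense[0] & 0x7f) != 0x70 && (sense[0] & 0x7f) != 0x71)
        return -1;
    return sense[2] & 0x0f;
}

/* Human-readable name for the keys listed above. */
static const char *sense_key_name(int key)
{
    switch (key) {
    case SK_NO_SENSE:       return "NO SENSE";
    case SK_MEDIUM_ERROR:   return "MEDIUM ERROR";
    case SK_HARDWARE_ERROR: return "HARDWARE ERROR";
    case SK_BLANK_CHECK:    return "BLANK CHECK";
    default:                return "other";
    }
}
```

A drive that answers a read of an erased sector with key 0x4 instead of 0x8 is indistinguishable, at this level, from one with a genuine hardware fault, so the error path has to cope with it either way.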

Here is a simple procedure to reliably trigger the BUG:

1) I compiled the SCSI-stuff as modules.
2) I put an erased MO-Medium in a MO-SCSI-drive.
3) I connected the drive to the computer.
4) I typed "modprobe sd_mod"
5) Crash! Serial console said:

Welcome to SuSE Linux 7.3 (i386) - Kernel 2.4.17 (ttyS0).

tick login: invalid operand: 0000
CPU:    0
EIP:    0010:[<d0851735>]    Not tainted
EFLAGS: 00010082
eax: 00000042   ebx: ce3dc070   ecx: c0224080   edx: 0000270d
esi: c009e018   edi: 00000018   ebp: c009e000   esp: c0237dd4
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c0237000)
Stack: d0867340 00000093 cf95b9ac cfb6de00 c0237e2c 00000000 66656400 00000006
       cfb6de10 00000002 00000003 00000282 41000031 c0220002 ce434a00 d0851346
       cfb6de00 ce468ecc 00000293 ce434ab8 ce434a00 cf4f416c 00000092 d083466a
Call Trace: [<d0867340>] [<d0851346>] [<d083466a>] [<d0834df8>] [<d083baaf>]
   [<d084e880>] [<d083b10e>] [<d083b2b3>] [<d083b318>] [<d083b7a0>] [<d084cce8>]
   [<d08351f7>] [<d0835099>] [<c01176a2>] [<c01175d9>] [<c01173ca>] [<c0107f8d>]
   [<c0105150>] [<c0105150>] [<c0105173>] [<c01051d7>] [<c0105000>] [<c0105027>]

Code: 0f 0b 83 c4 08 83 3e 00 74 13 8b 06 05 00 00 00 40 89 46 0c
 <0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing

Again I offer my time and my hardware for testing purposes.
I cannot fix the bug in the kernel myself, but I can test patches
and provide resulting stack traces.

Regards,
        Ralf

 -----------------------------------------------------------------
|  Ralf Oehler
|  GDI - Gesellschaft fuer Digitale Informationstechnik mbH
|

|  Tel.:        +49 6182-9271-23
|  Fax.:        +49 6182-25035          
|  Mail:        GDI, Bensbruchstraße 11, D-63533 Mainhausen
|  HTTP:        www.GDImbH.com
 -----------------------------------------------------------------

time is a funny concept

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
 
 

Post by Jens Axbo » Fri, 04 Jan 2002 21:20:08



> tick login: invalid operand: 0000
> CPU:    0
> EIP:    0010:[<d0851735>]    Not tainted
> EFLAGS: 00010082
> eax: 00000042   ebx: ce3dc070   ecx: c0224080   edx: 0000270d
> esi: c009e018   edi: 00000018   ebp: c009e000   esp: c0237dd4
> ds: 0018   es: 0018   ss: 0018
> Process swapper (pid: 0, stackpage=c0237000)
> Stack: d0867340 00000093 cf95b9ac cfb6de00 c0237e2c 00000000 66656400 00000006
>        cfb6de10 00000002 00000003 00000282 41000031 c0220002 ce434a00 d0851346
>        cfb6de00 ce468ecc 00000293 ce434ab8 ce434a00 cf4f416c 00000092 d083466a
> Call Trace: [<d0867340>] [<d0851346>] [<d083466a>] [<d0834df8>] [<d083baaf>]
>    [<d084e880>] [<d083b10e>] [<d083b2b3>] [<d083b318>] [<d083b7a0>] [<d084cce8>]
>    [<d08351f7>] [<d0835099>] [<c01176a2>] [<c01175d9>] [<c01173ca>] [<c0107f8d>]
>    [<c0105150>] [<c0105150>] [<c0105173>] [<c01051d7>] [<c0105000>] [<c0105027>]

> Code: 0f 0b 83 c4 08 83 3e 00 74 13 8b 06 05 00 00 00 40 89 46 0c
>  <0>Kernel panic: Aiee, killing interrupt handler!
> In interrupt handler - not syncing

Please ksymoops this oops.

--
Jens Axboe


 
 
 

Post by Alan Co » Fri, 04 Jan 2002 21:40:09


> Isn't anybody recognizing, that this bug is serious?

My 2.4.9-ac kernel tree here seems to be behaving.

> 1) I compiled the SCSI-stuff as modules.
> 2) I put an erased MO-Medium in a MO-SCSI-drive.

[erased and formatted I assume ?]

> 3) I connected the drive to the computer.
> 4) I typed "modprobe sd_mod"
> 5) Crash! Serial console said:

> tick login: invalid operand: 0000

BUG trap. Turn on verbose bug reporting, and also run the oops you then
get through ksymoops so that it's actually readable by others. List what
SCSI controller you use too.
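
The "BUG trap" reading follows from the Code: line of the oops. It begins with the bytes 0f 0b, the x86 `ud2` instruction, which the i386 BUG() macro emits and which the kernel reports as "invalid operand". Below is a small user-space sketch of that check. The helper name is hypothetical, and it only looks at the first two hex bytes after "Code:".

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if an oops "Code:" line starts with 0f 0b (x86 ud2),
 * 0 if it starts with other bytes, -1 if the line cannot be parsed.
 * In a 2.4 oops the byte dump starts at the faulting EIP.
 */
static int code_starts_with_ud2(const char *line)
{
    unsigned int b0, b1;
    const char *p = strstr(line, "Code:");

    if (p == NULL)
        return -1;
    if (sscanf(p + 5, "%2x %2x", &b0, &b1) != 2)
        return -1;
    return b0 == 0x0f && b1 == 0x0b;
}
```

This only tells you that some BUG() fired. Which one it was still needs verbose bug reporting or a ksymoops decode of the EIP.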

The RH tree I'm running backed out a couple of SCSI error handling changes
because we saw strange deadlocks. I don't think those are in Marcelo's tree
because I never had time to work out why they had to be reverted.

Alan

 
 
 

Post by Jens Axbo » Fri, 04 Jan 2002 21:40:11


Seeing an older post on linux-scsi, you might want to retry your test
with the aic7xxx nseg bug fixed. This patch is against 2.4.17; I haven't
checked whether it's applied in 2.4.18-pre1. If not, Marcelo, please apply.

--- drivers/scsi/aic7xxx/aic7xxx_linux.c~       Thu Jan  3 13:32:33 2002

                               cmd->request_buffer,
                               cmd->request_bufflen,
                               scsi_to_pci_dma_dir(cmd->sc_data_direction));
+                       scb->sg_count = 0;
                        scb->sg_count = ahc_linux_map_seg(ahc, scb,
                                                          sg, addr,
                                                          cmd->request_bufflen);
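
One possible reading of the patch, stated as an assumption and not taken from the driver source: if the segment-mapping helper checks the scb's running segment count before it adds a segment, a count left over from an earlier command on a recycled scb could make that check fail or miscount. The toy model below uses invented names (toy_scb, toy_map_seg, TOY_NSEG) purely to illustrate that ordering problem.

```c
#define TOY_NSEG 4          /* hypothetical per-command segment limit */

struct toy_scb {
    int sg_count;           /* segments mapped for the current command */
};

/*
 * Toy stand-in for a segment-mapping helper: refuses to add a segment
 * once the running count has reached the limit. Returns the number of
 * segments consumed (1), or -1 when the limit check fails.
 */
static int toy_map_seg(const struct toy_scb *scb)
{
    if (scb->sg_count + 1 > TOY_NSEG)
        return -1;
    return 1;
}

/* Single-buffer path: optionally reset the count first, as the patch does. */
static int toy_map_single_buffer(struct toy_scb *scb, int reset_first)
{
    if (reset_first)
        scb->sg_count = 0;
    scb->sg_count = toy_map_seg(scb);
    return scb->sg_count;
}
```

With a stale count equal to the limit, the unpatched ordering fails in this model, and the reset makes the single-buffer case succeed. Whether the real driver fails the same way is not established by this sketch.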

--
Jens Axboe


 
 
 

Post by Jens Axbo » Sat, 05 Jan 2002 17:50:10




> > On Thu, 03 Jan 2002 14:39:02 +0100 (MET),

> >>Ksymoops was not possible, because after rebooting the
> >>memory/module-layout had changed. (Or is there a trick
> >>I don't know?)

> > /var/log/ksymoops.  man insmod, look for ksymoops assistance.

> Thanks a lot, I'll try it for the next crash.
> But for now, I think the output of the SGI debugger I sent
> to the list shows the same.

> kernel BUG at /usr/src/linux-2.4.17-Dbg/include/asm/pci.h:147!
> from [aic7xxx]ahc_linux_run_device_queue+0x39d

aic7xxx is calling pci_map_sg on either an uninitialized scatterlist, or
maybe just specifying too many segments. Try adding a printk to print
'i' before the BUG() at line 147 in include/asm-i386/pci.h.
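
A user-space sketch of the kind of check being described, with the suggested print of `i` added. The struct and function are simplified stand-ins (toy_sg, toy_check_sg), not the contents of include/asm-i386/pci.h. The assumption is that each scatterlist entry must have exactly one of a virtual address or a page set, so zeroed or doubly-set entries trip the check.

```c
#include <stdio.h>
#include <stddef.h>

struct toy_sg {
    void *address;   /* kernel virtual address, or NULL */
    void *page;      /* page pointer, or NULL */
};

/*
 * Return the index of the first inconsistent entry, or -1 if all
 * nents entries look valid. Printing i shows whether the bad entry is
 * the first one (uninitialized list) or lies past the entries the
 * caller actually filled in (too many segments requested).
 */
static int toy_check_sg(const struct toy_sg *sg, int nents)
{
    int i;

    for (i = 0; i < nents; i++) {
        int has_addr = sg[i].address != NULL;
        int has_page = sg[i].page != NULL;

        if (has_addr == has_page) {
            fprintf(stderr, "bad sg entry: i=%d nents=%d\n", i, nents);
            return i;
        }
    }
    return -1;
}
```

In the kernel the equivalent would be a printk of `i` (and ideally the segment count) just before the BUG(), so the value appears on the serial console ahead of the oops.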

--
Jens Axboe


 
 
 
