IDE status errors (IDE/SCSI conflict or buggy ide driver?)

Post by Wayne Bradn » Mon, 25 Mar 2002 08:31:02



Summary:

I've recently added a (brand new) Maxtor ATA/100 card (Promise
Ultra100) and a (brand new) 100GB Western Digital ATA/100 drive to my
system. I ran e2fsck to check for bad blocks and it found none.
I immediately began to see "status errors" on ide2, so I did a little
detective work, below, to try to get my head around the issue.
Fearing that my new WD drive was DOA, I swapped the 100GB drive for a
(less than one year old) Quantum 30GB ATA/66 drive, and got different
errors (see below).
I have a SCSI drive and controller in there that have worked just fine
for about a year now.

--------------------------------------------------------------
Exhibit 1: (/proc/interrupts)

           CPU0       CPU1      
  0:     181978     203371    IO-APIC-edge  timer
  1:       1102       1338    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  3:          3          3    IO-APIC-edge  serial
  5:         25         23   IO-APIC-level  AM53C974
  8:          0          1    IO-APIC-edge  rtc
 10:    6970093    6969779   IO-APIC-level  ide2, sym53c8xx
 11:        239        207   IO-APIC-level  eth0
 12:       1140       1034    IO-APIC-edge  PS/2 Mouse
 14:       2115       2210    IO-APIC-edge  ide0
 15:       9890      10936    IO-APIC-edge  ide1
NMI:          0          0
LOC:     385273     385272
ERR:          0
MIS:         30

-------------------------------------------------------------
Exhibit 2: (from /proc/pci)

PCI devices found:
  Bus  0, device   7, function  1:
    IDE interface: Intel Corp. 82371AB PIIX4 IDE (rev 1).
      Master Capable.  Latency=64.  
      I/O at 0xffa0 [0xffaf].
  Bus  0, device  13, function  0:
    SCSI storage controller: LSI Logic / Symbios Logic (formerly NCR)
53c895 (rev 1).
      IRQ 10.
      Master Capable.  Latency=64.  Min Gnt=30.Max Lat=64.
      I/O at 0xe800 [0xe8ff].
      Non-prefetchable 32 bit memory at 0xfebfff00 [0xfebfffff].
      Non-prefetchable 32 bit memory at 0xfebfe000 [0xfebfefff].
  Bus  0, device  18, function  0:
    Unknown mass storage controller: Promise Technology, Inc. 20267
(rev 2).
      IRQ 10.
      Master Capable.  Latency=64.  
      I/O at 0xeff0 [0xeff7].
      I/O at 0xefe4 [0xefe7].
      I/O at 0xefa8 [0xefaf].
      I/O at 0xefe0 [0xefe3].
      I/O at 0xef00 [0xef3f].
      Non-prefetchable 32 bit memory at 0xfebc0000 [0xfebdffff].

---------------------------------------------------------------
Exhibit 3: (from /var/log/messages)

[This is what happens when I try to copy ~6GB of data from /dev/sda1
to the WD drive (as /dev/hde1)]
command: find . | cpio -mpu /newdrive

Mar 23 15:57:02 gromit kernel: hde: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Mar 23 15:57:02 gromit kernel: hde: drive not ready for command
Mar 23 15:57:02 gromit kernel: hde: status timeout: status=0xd0 { Busy }
Mar 23 15:57:02 gromit kernel: hde: drive not ready for command
Mar 23 15:57:02 gromit kernel: ide2: reset: success
Mar 23 15:59:12 gromit kernel: hde: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Mar 23 15:59:12 gromit kernel: hde: drive not ready for command
Mar 23 15:59:12 gromit kernel: hde: status timeout: status=0xd0 { Busy }
Mar 23 15:59:12 gromit kernel: hde: drive not ready for command
Mar 23 15:59:12 gromit kernel: ide2: reset: success
Mar 23 16:07:44 gromit kernel: hde: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Mar 23 16:07:44 gromit kernel: hde: drive not ready for command
Mar 23 16:07:44 gromit kernel: hde: status timeout: status=0xd0 { Busy }
Mar 23 16:07:44 gromit kernel: hde: drive not ready for command
Mar 23 16:07:44 gromit kernel: ide2: reset: success

I've repeated this test about six times, and each time there have been
exactly three resets and they come at about the same time during the
copy (about a minute in).

Other than these errors, the data _appears_ to copy over intact.
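
For anyone reading the status values in these logs: the byte after
"status=" is the ATA status register, and the names in braces are just
its bits spelled out. Here is a minimal sketch of that decoding in
Python. The bit table is the standard ATA layout; the flag names mimic
what the kernel prints, and the rule of showing only "Busy" when bit
0x80 is set is my assumption about the driver's behaviour, inferred
from the 0xd0 lines above. decode_status is a made-up helper name.

```python
# Standard ATA status register bits, labelled the way the log prints them.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(status):
    """Return the flag names set in an ATA status byte (hypothetical helper)."""
    if status & 0x80:
        # Assumption: while Busy is set the other bits are not meaningful,
        # so only "Busy" is reported (matches the 0xd0 lines in the log).
        return ["Busy"]
    return [name for mask, name in STATUS_BITS if status & mask]


if __name__ == "__main__":
    for value in (0x58, 0xD0):
        flags = " ".join(decode_status(value))
        print(f"status=0x{value:02x} {{ {flags} }}")
```

So 0x58 is 0x40 + 0x10 + 0x08: the drive says it is ready and has data
to transfer, yet the driver did not expect that state at that moment.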

-------------------------------------------------------------------
Exhibit 4: (from /var/log/messages)

[This is what happens when I try to copy ~6GB of data from /dev/sda1
to the Quantum drive (as /dev/hde1)]
command: find . | cpio -mpu /newdrive

Mar 23 16:37:09 gromit kernel: hde: irq timeout: status=0xd0 { Busy }
Mar 23 16:37:09 gromit kernel: ide2: reset: success
Mar 23 16:47:42 gromit kernel: hde: irq timeout: status=0xd0 { Busy }
Mar 23 16:47:42 gromit kernel: ide2: reset: success
Mar 23 16:57:37 gromit kernel: hde: irq timeout: status=0xd0 { Busy }
Mar 23 16:57:37 gromit kernel: ide2: reset: success

I've repeated this test about six times, and each time there have been
exactly three resets and they come at about the same time during the
copy (about a minute in).

Other than these errors, the data _appears_ to copy over intact.
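
To make "repeatable" checkable rather than eyeballed, here is a rough
Python sketch that pulls the "reset: success" lines out of a log
excerpt and prints the gaps between them. The sample text is copied
from Exhibit 4; the function names are invented for illustration, and
the year is an assumption because syslog timestamps do not carry one.

```python
from datetime import datetime

# Excerpt copied from Exhibit 4.
LOG = """\
Mar 23 16:37:09 gromit kernel: hde: irq timeout: status=0xd0 { Busy }
Mar 23 16:37:09 gromit kernel: ide2: reset: success
Mar 23 16:47:42 gromit kernel: hde: irq timeout: status=0xd0 { Busy }
Mar 23 16:47:42 gromit kernel: ide2: reset: success
Mar 23 16:57:37 gromit kernel: hde: irq timeout: status=0xd0 { Busy }
Mar 23 16:57:37 gromit kernel: ide2: reset: success
"""


def reset_times(text, year=2002):
    """Return a datetime for every 'reset: success' line.

    The year is an assumption: syslog lines omit it.
    """
    times = []
    for line in text.splitlines():
        if "reset: success" not in line:
            continue
        stamp = " ".join(line.split()[:3])  # e.g. "Mar 23 16:37:09"
        times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Return the seconds elapsed between consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    times = reset_times(LOG)
    print(len(times), "resets, gaps (s):", gaps_in_seconds(times))
```

Running the same thing over the log from each test run would show
whether the resets really land at the same points in the copy.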

------------------------------------------------------------------
Exhibit 5: (testing copy IDE1 -> IDE2)

When I try to copy ~4GB of data from /dev/hdc1 to the WD drive (as
/dev/hde1), I get no errors in the log, and the data appears to copy
over intact.

------------------------------------------------------------------
Exhibit 6: (testing copy IDE1 -> IDE2)

When I try to copy ~4GB of data from /dev/hdc1 to the Quantum drive
(as /dev/hde1), I get no errors in the log, and the data appears to
copy over intact.

------------------------------------------------------------------
Observations:

1. The new Ultra100 takes the same IRQ (10) as the SCSI host adapter.
2. The errors only occur when copying a large amount of data from the
SCSI drive to the ATA/100 drive, and not when copying from an ATA/33
drive to the ATA/100 drive.
3. The errors all seem to be very predictable and repeatable.
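
Observation 1 can be checked mechanically. Below is a small Python
sketch that scans /proc/interrupts-style text and reports any IRQ line
with more than one device attached. It assumes the 2.4-era layout shown
in Exhibit 1 (one count column per CPU, then a single-word controller
type, then a comma-separated device list); other kernels lay this file
out differently, and shared_irqs is an invented name.

```python
# Subset of Exhibit 1, copied as-is.
SAMPLE = """\
           CPU0       CPU1
  5:         25         23   IO-APIC-level  AM53C974
 10:    6970093    6969779   IO-APIC-level  ide2, sym53c8xx
 12:       1140       1034    IO-APIC-edge  PS/2 Mouse
 14:       2115       2210    IO-APIC-edge  ide0
NMI:          0          0
"""


def shared_irqs(text):
    """Map IRQ number -> device list for lines with more than one device.

    Assumes the layout from Exhibit 1: per-CPU counts, one-word
    controller type, then devices separated by commas.
    """
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row names one column per CPU
    shared = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # skip NMI/LOC/ERR/MIS and malformed lines
        fields = rest.split(None, ncpu + 1)
        if len(fields) < ncpu + 2:
            continue  # no device column on this line
        devices = [d.strip() for d in fields[ncpu + 1].split(",")]
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared


if __name__ == "__main__":
    for irq, devices in shared_irqs(SAMPLE).items():
        print(f"IRQ {irq} shared by: {', '.join(devices)}")
```

On a live system the text would come from reading /proc/interrupts
instead of the embedded sample.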

-------------------------------------------------------------------
Conclusions:

1. Both the new ATA/100 WD and the ATA/66 Quantum are fine (likely?)
OR both of them are identically broken (unlikely?).
2. The Ultra100 is broken (possible) OR it doesn't play nice when on
the same interrupt as another controller (possible). (Can the IRQ be
manually assigned easily?)
3. The Linux IDE driver is buggy (likely?)

Are there any hard drive experts out there who can help me out here?
Since the error is so repeatable, I'm happy to run any more detailed
tests for the kernel/driver guys if it helps fix this...

Thanks in advance,
WMB

 
 
 

Post by Wayne Bradn » Mon, 25 Mar 2002 11:19:15


OK, in answer to my own question, it seems that the shared IRQ between
the Promise and SCSI adapters _is_ a problem. After moving the Promise
to another PCI slot, I've been able to perform the same copy twice now
without any errors, to the original 100GB disk. Let's hope this is a
permanent fix.

Does anyone know whether sharing an IRQ between IDE and SCSI
hosts is a definite no-no?

 
 
 

Post by M. Buchenried » Mon, 25 Mar 2002 18:12:56


[...]

> 10:    6970093    6969779   IO-APIC-level  ide2, sym53c8xx

[...]

This is most likely the answer. Both the SCSI card and the Promise
card are sharing the same IRQ. While this works, it is not a very
good idea to put two (or more) fast IRQ devices on the same
IRQ line (especially if one of these is a Promise controller).
The Promise card is very lousy as far as IRQ sharing is concerned,
and it is generally recommended to put it on an IRQ of its own.
Note that this applies to all operating systems; it's not
limited to Linux (or UN*X) at all.

Michael
--

          Lumber Cartel Unit #456 (TINLC) & Official Netscum
    Note: If you want me to send you email, don't munge your address.

 
 
 

Post by Wayne Bradn » Tue, 26 Mar 2002 00:30:14


Thanks, Michael. I figured that might be the case. Now I have the
Promise all on its own at IRQ 5, and the two SCSI hosts (both
Tekram/Symbios) share IRQ 10. With this setup, I've been able to
crunch all this at once, without missing a beat:

- Copy 6GB from the HD on the Ultra/Wide SCSI bus to the HD on the
Ultra100
- Copy 600MB from a CDROM on the Fast SCSI bus to the HD on the
Ultra100
- Copy 4GB from a HD on the internal ATA/33 bus to the HD on the
Ultra100

So I'm pretty happy at this point, and I'll go ahead and start filling
up that 100GB monster! I guess the two Tekram controllers don't mind
sharing an IRQ.

Regards,
WMB




 
 
 

1. e-ide driver 6.30 uses same interrupt on IDE-0 and IDE-1

I have a 10 GB Seagate disk that works fine under Windows 95, but not under Linux.

With W95, the Control Panel + Device Drivers + Hard Disk controllers says
it is using "standard double driver PCI-IDE", and its resources are

   I/O address --- 1F0-1F7 --- 170-177
   IRQ         ---      14 ---      15

With Linux, when Caldera eDesktop 2.4 starts, it logs

  Uniform multi-platform e-ide driver revision 6.30
    IDE-0 at 0x1F0-0x1F7 on IRQ 14.
    IDE-1 at 0x170-0x177 on IRQ 14.

I guess that "IRQ 14" appearing twice makes this disk unreachable
(any access to it ends with "lost interrupt").

How can I tell the "e-ide" driver to use IRQ 15 on IDE-1?

Thanks a lot. Sebastian.

2. kconfig update

3. 2 pcmcia ethernet cards with redhat. How?

4. 2.5.1-pre2 compile error in ide-scsi.o ide-scsi.c

5. Word Processor for Linux

6. disabling ide interface on my sb16pnp : conflict with integrated IDE : kernel message : lost interrupt

7. Fix alpha NR_SYSCALLS

8. IDE & SCSI (IDE vs. SCSI)?

9. Conflicting semantics in ide and scsi tape drivers.

10. PATCH: support for IDE devices in ide-scsi with devfs

11. IDE CDRW and ide-scsi fun?

12. ide-cd and ide-scsi modules