SCSI bus resets w/Linux 2.4.x and Mylex DAC960 RAID


Post by thatseattle.. » Tue, 05 Jul 2005 04:15:37



A client has a 3-disk SCSI RAID array attached to an old Mylex DAC960
controller. About six months ago, they lost one of the disks, which was
fine, except there was no alarm and they didn't notice... Last
Wednesday at 9:03AM, they lost a second disk - and believe me, *that*
got their attention. :}

They replaced the two failed drives (older 10K IBM models) with newer
Seagates in the same drive caddies, leaving the remaining working IBM
drive in place (don't ask...). All drives are 18GB SCSI. After the
array was rebuilt and yours truly resurrected things from backup tapes,
we noticed that the system was slow and there were huge "gaps" in
activity where for 5-8 seconds seemingly nothing was getting processed
and no disk activity was happening. Long story short, we found that the
SCSI bus was undergoing a full reset every minute or so (see dump
below).

So here's the question: assuming termination and cables are all fine
(as the old drive caddies were reused in their same positions, and they
look OK), could the fact that we've got two different types of drives
in the same array be causing the resets? I've heard tell that the
DAC960 is pretty finicky about the types of drives it likes to talk
to. And I've never tried to do hardware RAID with dissimilar drives (of
different generations, no less) before.

Thoughts/ideas welcome.

-jr-

$ more /proc/rd/c0/current_status
***** DAC960 RAID Driver Version 2.4.11 of 11 October 2001 *****
Configuring Mylex DAC960PG PCI RAID Controller
  Firmware Version: 4.06-0-08, Channels: 2, Memory Size: 8MB
  PCI Bus: 0, Device: 11, Function: 1, I/O Address: Unassigned
  PCI Address: 0xF4104000 mapped at 0xD0800000, IRQ Channel: 18
  Controller Queue Depth: 64, Maximum Blocks per Command: 128
  Driver Queue Depth: 63, Scatter/Gather Limit: 33 of 33 Segments
  Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
  Physical Devices:
    0:0  Vendor: SEAGATE   Model: ST318406LW        Revision: 0108
         Serial Number: 3FE0HNRH00007222F4Q5
         Disk Status: Online, 35842048 blocks, 730 resets
    0:1  Vendor: IBM       Model: DDYS-T18350N      Revision: S80D
         Serial Number:         VEL7C714
         Disk Status: Online, 35842048 blocks, 730 resets
    0:2  Vendor: SEAGATE   Model: ST318406LW        Revision: 0108
         Serial Number: 3FE00FL000007222A9KS
         Disk Status: Online, 35842048 blocks, 730 resets
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 35844096 blocks, Write Thru
    /dev/rd/c0d1: RAID-5, Online, 35840000 blocks, Write Thru
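
For anyone wanting to watch the reset counters over time, here is a minimal Python sketch that pulls the per-drive "resets" figure out of text in the format shown above. The function name parse_resets and the regular expressions are my own assumptions, inferred only from this one dump; other driver or firmware versions may format the file differently.

```python
import re

# Patterns inferred from the current_status dump above (an assumption,
# not a documented format).
DEVICE_RE = re.compile(r"^\s*(\d+:\d+)\s+Vendor:\s*(\S+)\s+Model:\s*(\S+)")
STATUS_RE = re.compile(r"Disk Status:\s*(\w+),\s*(\d+) blocks,\s*(\d+) resets")

# Physical Devices section copied from the dump above.
SAMPLE = """\
  Physical Devices:
    0:0  Vendor: SEAGATE   Model: ST318406LW        Revision: 0108
         Serial Number: 3FE0HNRH00007222F4Q5
         Disk Status: Online, 35842048 blocks, 730 resets
    0:1  Vendor: IBM       Model: DDYS-T18350N      Revision: S80D
         Serial Number:         VEL7C714
         Disk Status: Online, 35842048 blocks, 730 resets
    0:2  Vendor: SEAGATE   Model: ST318406LW        Revision: 0108
         Serial Number: 3FE00FL000007222A9KS
         Disk Status: Online, 35842048 blocks, 730 resets
"""


def parse_resets(status_text):
    """Return {"channel:target": {...}} with vendor, model, state, blocks, resets."""
    devices = {}
    current = None
    for line in status_text.splitlines():
        dev = DEVICE_RE.match(line)
        if dev:
            current = dev.group(1)
            devices[current] = {"vendor": dev.group(2), "model": dev.group(3)}
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            devices[current].update(
                state=status.group(1),
                blocks=int(status.group(2)),
                resets=int(status.group(3)),
            )
    return devices


for address, info in sorted(parse_resets(SAMPLE).items()):
    print(address, info["vendor"], info["resets"])
```

On a live system one would pass in the contents of /proc/rd/c0/current_status and compare successive readings. Note that in this dump all three drives report the same count (730), so the counter by itself does not single out one drive.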

 
 
 


Post by kermi » Tue, 05 Jul 2005 05:48:55



> A client has a 3-disk SCSI RAID array attached to an old Mylex DAC960
> controller. About six months ago, they lost one of the disks, which was
> fine, except there was no alarm and they didn't notice... Last
> Wednesday at 9:03AM, they lost a second disk - and believe me, *that*
> got their attention. :}

> They replaced the two failed drives (older 10K IBM models) with newer
> Seagates in the same drive caddies, leaving the remaining working IBM
> drive in place (don't ask...). All drives are 18GB SCSI. After the
> array was rebuilt and yours truly resurrected things from backup tapes,
> we noticed that the system was slow and there were huge "gaps" in
> activity where for 5-8 seconds seemingly nothing was getting processed
> and no disk activity was happening. Long story short we found that the
> SCSI bus was undergoing a full reset every minute or so (see dump
> below).

> So here's the question: assuming termination and cables are all fine
> (as the old drive caddies were reused in their same positions, and they
> look OK), could the fact that we've got two different types of drives
> in the same array be causing the resets?

It is not so much different types of drives; rather, it is just different
drives. We had a case where a customer could not (or did not want to)
wait for delivery of qualified drives and bought the same drive type
from another supplier; the only difference was the firmware. The result:
every time they tried to replace these drives online, they got a corrupted
database :) Another customer brought the whole system down (all drives
"defect" and LUNs offline) attempting to add drives online in a similar case.

As long as the drives are qualified for a given controller, it should be
possible to mix them; unfortunately, you rarely (if ever) get open
information about compatibility.

You may try removing the IBM drive to see whether running on the Seagates
alone still shows the same issue. If yes, it is just that the 960 has some
problems with them.

Of course, you cannot rule out a bad drive among the new ones.

=arvi=



 
 
 


Post by Ulrich.Teich.. » Wed, 06 Jul 2005 03:59:47



>A client has a 3-disk SCSI RAID array attached to an old Mylex DAC960
>controller. About six months ago, they lost one of the disks, which was
>fine, except there was no alarm and they didn't notice... Last
>Wednesday at 9:03AM, they lost a second disk - and believe me, *that*
>got their attention. :}
>They replaced the two failed drives (older 10K IBM models) with newer
>Seagates in the same drive caddies, leaving the remaining working IBM
>drive in place (don't ask...). All drives are 18GB SCSI. After the
>array was rebuilt and yours truly resurrected things from backup tapes,
>we noticed that the system was slow and there were huge "gaps" in
>activity where for 5-8 seconds seemingly nothing was getting processed
>and no disk activity was happening. Long story short we found that the
>SCSI bus was undergoing a full reset every minute or so (see dump
>below).

That's not a dump of a SCSI bus reset. A real one would reveal which drive
caused the reset (reset on behalf of....).

>So here's the question: assuming termination and cables are all fine
>(as the old drive caddies were reused in their same positions, and they
>look OK), could the fact that we've got two different types of drives
>in the same array be causing the resets? I've heard tale tell the
>DAC960 is pretty finicky  about the types of drives it likes to talk
>to. And I've never tried to do hardware RAID with dissimilar drives (of
>different generations, no less)  before.

Well, all hardware RAID controllers are picky about drive firmware and
about mixing drives of different types. It would be interesting to see
what really collided in the log.

[del]

>  Logical Drives:
>    /dev/rd/c0d0: RAID-5, Online, 35844096 blocks, Write Thru
>    /dev/rd/c0d1: RAID-5, Online, 35840000 blocks, Write Thru

2 RAID 5's? On 3 drives? Are you sure that's right?
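
A quick arithmetic check on the block counts in the dump suggests the layout is at least self-consistent. The sketch below assumes the physical and logical block counts use the same units and that the controller can carve more than one logical drive out of a single three-disk group; neither is confirmed anywhere in this thread.

```python
# Figures copied from the current_status dump in the original post.
DISK_BLOCKS = 35842048
NUM_DISKS = 3
LOGICAL_DRIVES = {
    "/dev/rd/c0d0": 35844096,
    "/dev/rd/c0d1": 35840000,
}


def raid5_usable_blocks(num_disks, blocks_per_disk):
    """RAID-5 gives up one disk's worth of capacity to parity."""
    return (num_disks - 1) * blocks_per_disk


usable = raid5_usable_blocks(NUM_DISKS, DISK_BLOCKS)
allocated = sum(LOGICAL_DRIVES.values())

print(usable)               # 71684096
print(allocated)            # 71684096
print(usable == allocated)  # True
```

The two logical drives add up exactly to the usable RAID-5 capacity of three such disks, which would fit two RAID-5 logical drives striped across the same three spindles rather than a misreport.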

HTH,
Uli
--

Stormweg 24               |listening to: Suicide Drive (The Deep Eynde)
24539 Neumuenster, Germany|Public * (Interpol) Cauchemar (Opération S)

 
 
 
