Please help: aacraid panic "sg list too long"

Post by David Trus » Thu, 20 Feb 2003 12:36:37

Hi,

We have a Red Hat 7.3 system running kernel 2.4.18-19.7.xsmp.

We are getting kernel panics with messages like this one:

aacraid: panic: length of sg list is too long

It seems to happen after we read a large amount of data from a RAID
device.

The disks (an entire RAID container) were moved from a system
running 2.4.18-10smp.

Here are the contents of /proc/scsi/sg/debug:

dev_max(currently)=11 max_active_device=5 (origin 1)
 scsi_dma_free_sectors=3872 sg_pool_secs_aval=320 def_reserved_size=32768
 >>> device=sg0 scsi0 chan=0 id=0 lun=0   em=0 sg_tablesize=16 excl=0
   FD(1): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(2): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
 >>> device=sg1 scsi0 chan=0 id=2 lun=0   em=0 sg_tablesize=16 excl=0
   FD(1): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(2): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
 >>> device=sg2 scsi2 chan=0 id=2 lun=0   em=0 sg_tablesize=128 excl=0
   FD(1): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(2): timeout=300000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(3): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(4): timeout=300000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(5): timeout=300000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
 >>> device=sg3 scsi2 chan=0 id=3 lun=0   em=0 sg_tablesize=128 excl=0
   FD(1): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active
   FD(2): timeout=60000ms bufflen=32768 (res)sgat=0 low_dma=0
   cmd_q=0 f_packid=0 k_orphan=0 closed=0
     No requests active

Here are the remaining files in /proc/scsi/sg/:

# cat def_reserved_size
32768

# cat device_hdr
host    chan    id      lun     type    opens   qdepth  busy    online

# cat devices
0       0       0       0       0       8       10      0       1
0       0       2       0       0       3       10      0       1
2       0       2       0       0       6       253     0       1
2       0       3       0       0       3       253     0       1
2       0       4       0       0       1       253     0       1

# cat device_strs
DELL            PERCRAID Mirror         V1.0
DELL            PERCRAID RAID5          V1.0
IFT             SR2000                  0312
IFT             SR2000                  0312
IFT             SR2000                  0312

# cat host_hdr
uid     busy    cpl     scatg   isa     emul

# cat hosts
0       0       512     16      0       0
0       0       2       128     0       0
1       0       2       128     0       0
2       0       2       128     0       0
1280    0       63      26      0       0

# cat host_strs
percraid
Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8         <Adaptec aic7899 Ultra160 SCSI adapter>         aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8         <Adaptec 3960D Ultra160 SCSI adapter>         aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8         <Adaptec 3960D Ultra160 SCSI adapter>         aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
LSI Logic MegaRAID 161N 254 commands 15 targs 7 chans 7 luns

# cat version
30124   Version: 3.1.24 (20020505)

# cat allow_dio
0

This is a real show-stopper.

Any ideas?

David