BIG BUG: Solaris2.1x86 SCSI-driver bombs (Adaptec 1742)

Post by Arno Hah » Mon, 04 Jul 1994 10:00:53



There appear to be one or more severe bugs in the Adaptec AHA-1742A EISA
driver for Solaris 2.1x86. Below I describe the problems that have been
encountered so far.

1. There is a magneto-optical SCSI drive connected to
SCSI-target-1. Whenever this particular device is
referenced, the following error messages will appear on the
console four times in a rapid sequence:


        Error for command 'mode sense', Error Level: Fatal
        Requested Block: -1, Error Block: -1
        Sense Key: Illegal Request
        Vendor 'RICOH  ': ASC = 0x24 (invalid cdb), ASCQ = 0x0, FRU = 0x0

After this, the drive appears to work normally. For example, mounting
the drive produces four of the above error messages, and then the
mounted drive works... almost.

The message itself makes no sense. What block -1? Drives start their
block numbering from 0, so why does the driver request block -1?

2. The SCSI-driver causes a kernel panic, if the disk is not
sliced with 4096 sectors reserved for alternate sectors.
First I tried to slice the MO-disk as follows (contents of
the device stanza listed):

* /dev/rdsk/c0t1d0p0 default partition map
*
* Dimensions:
*     512 bytes/sector
*      32 sectors/track
*      64 tracks/cylinder
*     278 cylinders
*     278 accessible cylinders
*                               Note: 1024 kbytes/cylinder
* Flags:
*   1:  unmountable
*  10:  read-only
*

* Partition    Tag     Flag         First Sector    Sector Count
    0           2       00              2048            0
    2           0       00              0               569344
    6           8       00              2048            565248
    9           9       01              567296          2048

Note especially the last line. It means that I reserved only one
cylinder (2048 sectors) for alternate sector mapping; one megabyte is
more than enough for that purpose. However, Solaris by default wants
to allocate two cylinders, i.e. 4096 sectors. Problem:
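
The slice arithmetic above can be checked with a short script. This is
only a sketch: the geometry and slice values are copied from the
partition map, while the function name slice_report and the dictionary
keys are made up for illustration.

```python
# Geometry as reported in the partition map above.
BYTES_PER_SECTOR = 512
SECTORS_PER_TRACK = 32
TRACKS_PER_CYLINDER = 64
CYLINDERS = 278

sectors_per_cylinder = SECTORS_PER_TRACK * TRACKS_PER_CYLINDER
bytes_per_cylinder = sectors_per_cylinder * BYTES_PER_SECTOR
total_sectors = sectors_per_cylinder * CYLINDERS

# (slice number, first sector, sector count), copied from the table.
SLICES = [(0, 2048, 0), (2, 0, 569344), (6, 2048, 565248), (9, 567296, 2048)]


def slice_report(slices, per_cylinder, total):
    """Return, per slice, its end sector, size in cylinders, and whether it fits."""
    rows = []
    for number, first, count in slices:
        end = first + count
        rows.append({
            "slice": number,
            "end": end,
            "cylinders": count / per_cylinder,
            "fits": end <= total,
        })
    return rows


report = slice_report(SLICES, sectors_per_cylinder, total_sectors)
print("sectors per cylinder:", sectors_per_cylinder)
print("bytes per cylinder:  ", bytes_per_cylinder)
print("total sectors:       ", total_sectors)
for row in report:
    print(row)
```

With these numbers, slice 9 covers exactly one cylinder and ends on the
last sector of the disk, and the Solaris default of 4096 sectors
corresponds to two cylinders.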

Whenever the disk with the above slicing is written to or read from,
a kernel panic results. The panic messages vary: once it was a
segmentation fault, once an illegal trap, and another time something
else, but always memory related. Once tar reported a segmentation
fault, then sched reported the same, and then the kernel panicked.

3. I resliced the drive with 4096 sectors for bad sector mapping, and
the problem went away... almost. Such a disk is usable if you write
only small amounts of data to it. Whenever more than 5 to 10 megabytes
are written to the disk, the kernel panics with some kind of
memory-related message (trap, segmentation fault, etc.). This problem
can be reproduced with 100% certainty by extracting a tape or tarfile
onto the MO disk.

4. Whenever one attempts to read large amounts of data from any hard
disk, the kernel panics with a memory-related message. For instance, I
tried to read the table of contents of a 240 megabyte tarfile using
"tar tfv tarfile.tar". After reading some tens of megabytes, the system
panics and reboots.

I would appreciate it if someone could tell me how to work around
these problems. A patch would be nice...


ArNO
    2

 
 
 

Post by Richard Mathe » Thu, 07 Jul 1994 08:45:11




>    Error for command 'mode sense', Error Level: Fatal
>    Requested Block: -1, Error Block: -1
>    Sense Key: Illegal Request
>    Vendor 'RICOH  ': ASC = 0x24 (invalid cdb), ASCQ = 0x0, FRU = 0x0

This warning just means that the device does not support the mode-sense
command (which is an optional command).  This was fixed in 2.4 in time
for EA2 (but not EA1).  The fix was just to turn off these warnings when
the driver issues mode-sense commands.  The workaround is to ignore the
warnings.
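
For reference, the ASC/ASCQ pair in the quoted message can be looked up
in the standard SCSI additional sense code table. Below is a minimal
sketch with only three entries from that table; the names ASC_ASCQ and
describe_sense are hypothetical and not part of any Solaris tool. In
that table 0x24/0x00 is "invalid field in CDB", while an unimplemented
opcode is normally reported as 0x20/0x00, so the drive may be objecting
to a field of the mode-sense request rather than to the command as a
whole. That reading is an inference from the code table, not something
confirmed in this thread.

```python
import re

# Small excerpt of the SCSI additional sense code table (ASC, ASCQ).
ASC_ASCQ = {
    (0x20, 0x00): "invalid command operation code",
    (0x24, 0x00): "invalid field in CDB",
    (0x26, 0x00): "invalid field in parameter list",
}

# Console line from the report, joined onto one line.
CONSOLE_LINE = "Vendor 'RICOH  ': ASC = 0x24 (invalid cdb), ASCQ = 0x0, FRU = 0x0"


def describe_sense(line):
    """Extract ASC and ASCQ from a console line and look them up."""
    match = re.search(r"ASC = (0x[0-9a-fA-F]+).*ASCQ =\s*(0x[0-9a-fA-F]+)", line)
    if match is None:
        return None
    asc, ascq = (int(group, 16) for group in match.groups())
    meaning = ASC_ASCQ.get((asc, ascq), "not in this excerpt of the table")
    return asc, ascq, meaning


print(describe_sense(CONSOLE_LINE))
```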

As for your panics, I can't do much without more information (such as
a kernel stack trace from the panic).

     Richard M. Mathews                 F oster
                                         E stonian-Latvian-Lithuanian

                                           F reedom!

 
 
 

Post by Arno Hah » Fri, 08 Jul 1994 19:39:04





>the driver issues mode-sense commands.  The workaround is to ignore the
>warnings.

That is what I did; the drive in question appears to work despite the
warnings. I only have to remember to delete /var/adm/messages.0
occasionally, as that file grows very fast with the messages appended
to it.

>As for your panics, I can't do much without more information (such as
>a kernel stack trace from the panic).

Could you tell me how to save that information? The kernel dumps
core somewhere (to the swap device?), but I can't recover it
after the machine has rebooted. How can I access the kernel
core dump?

Anyway, I wrote down today's panic messages. I have had at least 10
panics so far today; a couple hung the machine so badly that it would
not reboot by itself, so I had time to copy down the screen...

The first bad panic came during ftp, while I was uploading a large
tarfile:

BAD TRAP
panic: segmentation fault

pid=267, addr=0xf2037b40, pte=0x0, pc=0xf2037b40,
sp=0xfc1d25a1, eflags=0x10206

eip(f2037b40),ebp(f8223b58),uesp(fc1d25a1),esp(f8223ad4),
eax(fc25ca68),ebx(fc25ca00),ecx(fc3be800),edx(e1df4001),
esi(200),edi(0),cr0(8005003b),cr2(f2037b40),cr3(245000),
cs(158),ds(160),ss(ca68),fs(1a8),gs(1b0)

01366 static sysmap pages
00037 dynamic kernel data pages
00263 kernel-pageable pages
00000 segkmap kernel pages
00108 current user process pages
01774 total pages (1774 chunks)

dumping to vp fc26b904, offset 53368

The other panic came while doing "cp -r" from one filesystem to
another:

BAD TRAP
cp: page fault

pid=334, pc=0xf804255, sp=0x48000, eflags=0x10206

eip(f804255f), ebp(f8250794), uesp(48000),
esp(f8250768), eax(6002), ebx(48000),
ecx(8006e000), edx(6), esi(fc48be70),
edi(8), cr0(80050033), cr2(48000),
cr3(245000), cs(158), ds(160),
ss(0), es(160), fs(1a8), gs(1b0)

1418 static
37 dynamic
194 kernel-pageable
2 segkmap
0 segvn
131 current user
1782 total pages

dumping to vp fc26b904, offset 53304.
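
The two register dumps can be compared mechanically. On x86, cr2 holds
the address whose access caused the page fault, so comparing it with
eip gives a rough hint about what kind of access failed. The following
is a sketch only: the helper names parse_registers and classify_fault
are invented here, only the dump lines that carry eip and cr2 are
copied in, and the classification is a heuristic, not a diagnosis.

```python
import re

# Register lines copied from the first (ftp) panic above.
PANIC_FTP = (
    "eip(f2037b40),ebp(f8223b58),uesp(fc1d25a1),esp(f8223ad4),"
    "esi(200),edi(0),cr0(8005003b),cr2(f2037b40),cr3(245000)"
)

# Register lines copied from the second (cp -r) panic above.
PANIC_CP = (
    "eip(f804255f), ebp(f8250794), uesp(48000), "
    "edi(8), cr0(80050033), cr2(48000)"
)


def parse_registers(text):
    """Turn 'name(hexvalue)' pairs into a dict of integers."""
    pairs = re.findall(r"([a-z0-9]+)\(([0-9a-f]+)\)", text)
    return {name: int(value, 16) for name, value in pairs}


def classify_fault(regs):
    """Heuristic: a fault address equal to eip points at an instruction fetch."""
    if regs["cr2"] == regs["eip"]:
        return "instruction fetch from an unmapped address"
    return "data access to an unmapped address"


for label, dump in (("ftp", PANIC_FTP), ("cp -r", PANIC_CP)):
    regs = parse_registers(dump)
    print(f"{label}: eip={regs['eip']:#x} cr2={regs['cr2']:#x} "
          f"-> {classify_fault(regs)}")
```

Under this heuristic the ftp panic looks like the kernel jumping to an
address with no mapping (consistent with pte=0x0 in the dump), while
the cp panic looks like kernel code touching address 0x48000.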

All the panics I have had so far (tens, if not a hundred) have been
related to disk I/O. It is very annoying when the machine panics
whenever you start something heavier than an editor or telnet. Starting
OpenWindows has about a 50% chance of bringing the system down, and
trying to untar and compile something is out of the question.


ArNO
    2

 
 
 

1. BUG: Solaris2.1x86 elx0 driver bombs (3Com Elink III)

I have run extensive diagnostics on a 3Com Etherlink III EISA 3C579
ethernet card, and the diagnostics found the card fully operational.
When the card is used with Solaris 2.1x86, the following message
appears on the console:

WARNING: Adapter failed: fifo diag 2000<RXU>

After that, the only way to get the network functioning again is to
reboot the machine. Normally this situation occurs fairly randomly,
but it can always be triggered by ftp'ing to certain sites and trying
to get something from them.

Is this a known bug with a patch available? What does the error
message indicate, and where should I start looking if this is just a
configuration error (which I doubt)?

Is there a mail address at SunSoft for reporting bugs like this? There
are a couple of other serious bugs that cause kernel panics; more on
them later...


ArNO
    2

2. Printer woes..

3. problem: Adaptec 1742 /Maxtor SCSI, more info...

4. hp unix question

5. problem: Adaptec 1742 not recognizing Maxtor SCSI drive.

6. Fault tolerant ideas?

7. 9.1 GB SCSI drive & Linux & Adaptec 1742

8. Fix NFS dentry lookup behaviour

9. problem: Adaptec 1742 /Maxtor SCSI, EVEN more info...

10. Adaptec AHA-1742 with IBM DPES-31080 harddisk

11. Adaptec 1742 & Linux?

12. Problem with adaptec 1742 and Seagate Hawk XL 2

13. Adaptec 1742 EISA and LINUX