Real DEC people help! Mysterious intermittent crashes...

Post by Chester L » Thu, 10 Aug 1995 04:00:00



Hi, sorry I have to post this here, but the DEC support people are
giving me the run-around.

Our AlphaStation 400 4/233 crashes intermittently, every few hours or
days, even when no jobs are running and no one is logged on.  When it
crashes, it prints the following messages on the screen.

vmunix: Retrying I/O (err 5) on block device 8,0
vmunix: Retrying I/O (err 5) on block device 8,6

This seems like a hardware problem to me, but the hardware support
guy told me over the phone that he knows nothing about UNIX systems
and is unwilling to come and look at it.  The software guy says we
don't have a software support contract (we have campus licensing),
so they can't help us.

Can anyone tell me what the error means, what I can do, and maybe walk
me through a diagnosis, at least so I know which piece of hardware,
if any, needs to be replaced?

Much appreciated,
Chester Liu



Post by sto » Thu, 10 Aug 1995 04:00:00




>vmunix: Retrying I/O (err 5) on block device 8,0
>vmunix: Retrying I/O (err 5) on block device 8,6



You may have already figured this out but...

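Block devices 8,0 and 8,6 (8 is the major device number; 0 and 6 are
the minor numbers) are /dev/rz0a and /dev/rz0g, partitions a and g of
disk rz0.  (/dev/rrz0a and /dev/rrz0g are the raw, character-special
names for the same partitions.)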
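If you want to confirm that mapping on your own machine rather than take it on trust, `ls -l` prints the major and minor numbers of a device node where it would normally print the file size. The helper below is a minimal sketch (the function name `find_blockdev` is hypothetical, not a system command) that picks out the block-special node matching the numbers from the vmunix message. It assumes the usual `ls -l` column order of permissions, link count, owner, group, then "major, minor".

```shell
# find_blockdev MAJOR MINOR -- hypothetical helper, not a system command.
# Reads `ls -l` output on stdin and prints the name of every block-special
# node whose major,minor pair matches the arguments.
# Assumes the column order: perms, links, owner, group, "major, minor", ...
find_blockdev() {
    awk -v want="$1,$2" '
        /^b/ {                          # block-special entries only
            line = $0
            gsub(/, +/, ",", line)      # "8,  6" -> "8,6" so it is one field
            n = split(line, f, / +/)    # split on runs of spaces
            if (f[5] == want)
                print f[n]              # last field is the path
        }'
}

# Usage on the affected machine:
#   ls -l /dev/rz0* | find_blockdev 8 6
```

Filtering on a leading `b` matters because the raw `rrz` nodes are character devices with their own numbering, and the vmunix message explicitly says "block device".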

It sounds like one of two things (I am no hardware whiz, so take this
with a grain of salt).

Either disk rz0 or the SCSI bus is fried.  If you have more than one
disk in there, then it is probably disk rz0, since a bad bus would
probably produce complaints about all of the disks.  Make sure the
connections are tight.
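To apply that reasoning on a machine with several disks, you can tally the retry messages by device: if they all name rz0's partitions while the other disks stay quiet, the disk itself is the likelier culprit; retries spread across several disks point more toward the bus or cabling. A minimal sketch, assuming the kernel messages are also captured by syslog in a log file (the path `/var/adm/messages` in the usage line and the function name `tally_retries` are both assumptions):

```shell
# tally_retries -- hypothetical helper, not a system command.
# Reads log text on stdin, extracts the "major,minor" pair from each
# "Retrying I/O (err N) on block device M,m" line, and prints one
# count per device, most frequent first.
tally_retries() {
    sed -n 's/.*Retrying I\/O (err [0-9]*) on block device \([0-9]*,[0-9]*\).*/\1/p' |
        sort | uniq -c | sort -rn
}

# Usage (log path is an assumption; adjust for your syslog setup):
#   tally_retries < /var/adm/messages
```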

'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'
Aristos Koyanis                    LGRC(lowrise) A103
Assistant Unix Systems             Office of Information Technologies
Administrator & Programmer         University of Massachusetts, Amherst

_________________________________________________________________________


Post by Brian Saunde » Fri, 11 Aug 1995 04:00:00




>Either disk rz0 or the bus is fried.  If have more than one disk in
>there then it is probably disk rz0 since there would probably be
>complaints about both disks.  Make sure the connections are tight.

I had a machine which ran for a long while and then crashed with these
errors.  When it came down, it never got back up beyond the start-up
tests.  I simply unplugged and reseated the power and SCSI cables, and
it worked again.

Incidentally, I also discovered we were using the disk slot that is not
recommended when you have only one hard disk.  It never seemed to
matter, but I ended up moving the disk to the other slot anyway, just
for fun.

--


Post by ful.. » Sat, 12 Aug 1995 04:00:00



>vmunix: Retrying I/O (err 5) on block device 8,0
>vmunix: Retrying I/O (err 5) on block device 8,6

Well, start with:

        grep -w 5 /usr/include/errno.h

to find that error 5 is EIO, "I/O error".  Block devices 8,0 and 8,6 are
partitions of whatever device is known as "/dev/rz0?" (rz0a and rz0g).
Reviewing the system startup record, either with uerf or by watching the
console at boot time, will reveal what type of device is at rz0.
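A grep for the bare number can still turn up unrelated lines, so a slightly more precise lookup matches only the value column of each `#define`. A minimal sketch (the function name `errno_lookup` is hypothetical, not a system command); it assumes the header lays each code out as `#define NAME NUMBER /* description */`, the same layout the grep above relies on. On some other systems (Linux, for example) the numeric definitions live in a different header that errno.h pulls in.

```shell
# errno_lookup NUMBER -- hypothetical helper, not a system command.
# Reads a C header on stdin and prints "NAME: description" for each
# "#define NAME NUMBER /* description */" line whose value is NUMBER.
errno_lookup() {
    awk -v n="$1" '$1 == "#define" && $3 == n {
        name = $2
        $1 = $2 = $3 = ""       # drop "#define NAME NUMBER"
        sub(/^ +/, "")          # trim the separators left behind
        sub(/^\/\* */, "")      # strip the opening comment marker
        sub(/ *\*\/$/, "")      # strip the closing comment marker
        print name ": " $0
    }'
}

# Usage:
#   errno_lookup 5 < /usr/include/errno.h
```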

To see the startup record in the error log:

        /usr/sbin/uerf -R -r 300 | more

To see disk errors in the error log:

        /usr/sbin/uerf -R -r 199 -o full | more
(assuming you're using CAM to access the disks)

+===================+========================+================================+
|      /  /   /   / +========================+================================+
| /___/  /_  /___/\ |     Opinions expressed here are mine, and mine alone    |
+===================+=========================================================+