Disk Errors (hard ecc error)

Post by Karra V. Red » Thu, 25 Nov 1993 02:51:34



I am getting these errors from the system accounting. We can't seem to figure out what these errors correspond to.

These are the error messages we receive:

Nov 22 01:00:21 h vmunix: xy1g: read retry (hard ecc error) -- blk #898, abs blk #648118
Nov 22 01:00:22 h vmunix: xy1g: read retry (hard ecc error) -- blk #898, abs blk #648118
Nov 22 01:00:22 h vmunix: xy1g: read failed (hard ecc error) -- blk #898, abs blk #648118
Nov 22 01:00:22 h vmunix: xy1g: read retry (hard ecc error) -- blk #898, abs blk #648118
Nov 22 01:00:22 h vmunix: xy1g: read retry (hard ecc error) -- blk #898, abs blk #648118
Nov 22 01:00:22 h vmunix: xy1g: read failed (hard ecc error) -- blk #898, abs bl

I would appreciate it if someone could explain the cause of these errors and
suggest a suitable remedy.

Thanks,
Karra V. Reddy
Systems Assistant
West Virginia Univ.


--
Karra V. Reddy
Graduate Systems Assistant

B2, Knapp Hall

 
 
 

Post by Mickey Bo » Wed, 01 Dec 1993 12:49:06



> I am getting these errors from the system accounting. We can't seem to figure out what these errors correspond to.
> These are the error messages we receive:
> Nov 22 01:00:21 h vmunix: xy1g: read retry (hard ecc error) -- blk #898, abs blk #648118
> [...]
> I would appreciate it if someone can explain cause of the errors and
> suitable remedy.

These are probably bad sectors, and probably just normal wear and tear (note
the excessive "probablys": it might be the controller, or weird interference on
the cables, or inadequate grounding, but probably not <grin>).  If you have
some that are being found, you might have others that are either not
frequently accessed or marginal.  You, er, probably want to reformat the
disk, and perform surface analysis for several passes.  Then, restore the
filesystems from backup tapes.
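
Before reformatting, it can help to see whether the errors all point at one
block or are spread across the disk.  Below is a minimal Python sketch that
tallies the kernel messages by device, absolute block, and outcome.  It
assumes each message sits on a single line in exactly the format quoted
above; the names PATTERN, SAMPLE, and tally_bad_blocks are made up for this
illustration, and the regular expression has only been matched against the
lines shown in this thread.

```python
import re
from collections import Counter

# Assumed message format, taken from the lines quoted in this thread:
# "... vmunix: xy1g: read retry (hard ecc error) -- blk #898, abs blk #648118"
PATTERN = re.compile(
    r"vmunix: (?P<dev>\w+): read (?P<kind>retry|failed) "
    r"\(hard ecc error\) -- blk #(?P<blk>\d+), abs blk #(?P<abs>\d+)"
)

# Sample input: the first three messages from the original post, unwrapped.
SAMPLE = [
    "Nov 22 01:00:21 h vmunix: xy1g: read retry (hard ecc error) -- blk #898, abs blk #648118",
    "Nov 22 01:00:22 h vmunix: xy1g: read retry (hard ecc error) -- blk #898, abs blk #648118",
    "Nov 22 01:00:22 h vmunix: xy1g: read failed (hard ecc error) -- blk #898, abs blk #648118",
]


def tally_bad_blocks(lines):
    """Count messages per (device, absolute block, 'retry' or 'failed')."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:  # lines in any other format are silently skipped
            key = (match.group("dev"), int(match.group("abs")), match.group("kind"))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    for (dev, abs_blk, kind), n in sorted(tally_bad_blocks(SAMPLE).items()):
        print(f"{dev} abs blk #{abs_blk}: {n} x read {kind}")
```

To run it against a real log, pass an open file object instead of SAMPLE.
If every message names the same absolute block, as in the excerpt posted,
that points to one bad sector rather than a failing controller or cable,
though it does not rule those out.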

One suggestion:  after reformatting, restore what backups you can from tar
tapes, not dumps (you should make a set of each, just to be safe).  The
reason for this is that with tar, you will end up with a fresh new filesystem
(that is, you will create new filesystems with newfs and write your files to
them).  With dump, you get some (or all) of the old lower level filesystems
back.  Tar works completely "above" the filesystem.  Filesystems can also get
a bit long in the tooth with age, and fsck cannot always find all the
problems.  Thus, since you are going to freshen up the lower level structure
of your disk, you might as well do the same to the higher level filesystem
(ah Xmas, a time for major system surgery).  This does not apply to system
partitions (/, /usr, /var, /export), as there are files within that cannot be
tarred (devices and FIFOs).  This would also be a good time to tweak partition
sizes, if needed.

I once had a bizarre filesystem problem on a large user partition that would
not go away, because I was restoring dumps to the newly created filesystem.
Using tar fixed it.  Fsck reported no errors either before or after I fixed
it (thus eroding my total confidence in fsck, which I miss. :-).

--
******************************************************************************
*                                Mickey Boyd                                 *
*                           Systems Administrator                            *
*              Florida State University Mathematics Department               *

******************************************************************************

 
 
 

Post by Per Hedela » Mon, 06 Dec 1993 04:12:08



>One suggestion:  after reformatting, restore what backups you can from tar
>tapes, not dumps (you should make a set of each, just to be safe).  The
>reason for this is that with tar, you will end up with a fresh new filesystem
>(that is, you will create new filesystems with newfs and write your files to
>them).  With dump, you get some (or all) of the old lower level filesystems
>back.

This is not correct - while dump "bypasses" the file system (though I
don't believe there is anything in the created dump file that isn't
available through the file system - the reason for bypassing it is
speed), restore works entirely within the file system, so your file
system will be just as fresh with dump/restore (assuming you newfs
before the restore, of course, but the same is true for the tar method).
Don't forget to remove the restoresymtable file, though...

--Per Hedeland


...uunet!erix.ericsson.se!per

 
 
 

1. iostat -E reporting hard errors and transport errors on sd1

Hello,

I just noticed that for the past month or so I have been getting a lot of
the following messages in /var/adm/messages:

...
May 13 09:48:33 host scsi: [ID 107833 kern.warning] WARNING:

May 13 09:48:33 host         Error for Command: read(10)    Error Level: Retryable
May 13 09:48:33 host scsi: [ID 107833 kern.notice]   Requested Block: 14313992                  Error Block: 14313998
May 13 09:48:33 host scsi: [ID 107833 kern.notice]   Vendor: IBM                          Serial Number: 4FY39463
May 13 09:48:33 host scsi: [ID 107833 kern.notice]   Sense Key: Media Error
May 13 09:48:33 host scsi: [ID 107833 kern.notice]   ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0x0
May 13 09:48:33 host scsi: [ID 107833 kern.warning] WARNING:

May 13 09:48:33 host         Error for Command: read(10)    Error Level: Fatal
May 13 09:48:33 host scsi: [ID 107833 kern.notice]   Requested Block: 14313992                  Error Block: 14313998
May 13 09:48:33 host scsi: [ID 107833 kern.notice]   Vendor: IBM                          Serial Number: 4FY39463
May 13 09:48:33 host scsi: [ID 107833 kern.notice]   Sense Key: Media Error
May 13 09:48:33 host scsi: [ID 107833 kern.notice]   ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0x0
May 13 09:48:33 host md_stripe: [ID 641072 kern.warning] WARNING: md: d0: read error on /dev/dsk/c0t1d0s6
...

iostat -E also reports the following:

sd1      Soft Errors: 0 Hard Errors: 70586 Transport Errors: 85265
Vendor: IBM      Product: DDYS-T36950M     Revision: S96H Serial No: 4FY39463
Size: 36.70GB <36700747776 bytes>
Media Error: 60501 Device Not Ready: 0 No Device: 10085 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

Btw: is there documentation somewhere that explains this output, such as
what a soft error, a hard error, a transport error, and so on are?

So my question now is: is this bad? Does it mean the drive will fail
soon? What do you guys recommend?

Regards

2. ACPI4LINUX

3. Large hard disk error

4. Cant telnet rh5.2 default install

5. Hard error writing to disk?

6. Debian and Linksys Blue Box

7. Logging soft errors / ECC L1/L2 cache statistics

8. passwords for linux and novell servers

9. Hard disk errors

10. write errors with seagate hard disk

11. Hard disk I/O error during booting

12. Need Help With Hard Drive Disk Errors

13. Repost: Help with "GRUB hard disk error"