Kernel 2.0.30 crashes regularly and destroys file systems

Post by Aurelio Hinarejos » Tue, 03 Feb 1998 04:00:00



Hello,

I have a problem with my Linux system. It is based on a Debian 1.3.1
distribution, although I built a 2.0.30 kernel specifically tailored for
my machine. However, I've experienced four system crashes that wreaked
havoc with the file systems.

The disk and IDE chipset characteristics are (according to Linux):

ide: i82371 PIIX (Triton) on PCI bus 0 function 57.
        ide0: BM-DMA at 0xf000-0xf007
hda: Conner Peripherals 1080 MB - CFS1081A, 1032 MB w/0KB Cache, LBA,
CHS=524/64/63
ide0 at 0x1f0-0x1f7, 0x3f6 on irq 14.
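
As a sanity check on the boot messages above, the CHS figures can be multiplied out to see whether they agree with the capacity the driver printed. This is only a sketch: it assumes 512-byte sectors and that the driver's "MB" means 2^20 bytes, neither of which is stated in the messages, and the function name is made up for illustration.

```python
SECTOR_BYTES = 512  # assumption: standard IDE sector size


def chs_capacity(cylinders, heads, sectors_per_track):
    """Return (total_sectors, total_bytes) for a CHS geometry."""
    total_sectors = cylinders * heads * sectors_per_track
    return total_sectors, total_sectors * SECTOR_BYTES


# Geometry reported for hda: CHS=524/64/63
total_sectors, total_bytes = chs_capacity(524, 64, 63)

print("sectors:", total_sectors)
print("MB (2^20 bytes):", round(total_bytes / 2**20))
print("MB (10^6 bytes):", round(total_bytes / 10**6))
```

Under those assumptions the geometry works out to 2112768 sectors, which rounds to the 1032 MB the driver reports, so the absolute sector numbers in the error messages (LBAsect values just above two million) would still fall inside the disk.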

First I receive several console messages, from the
hard disk driver, warning that something is wrong with the disk:

=====================================================================
hda: read_intr: status=0x59 {DriveReady SeekComplete DataRequest Error}
hda: read_intr: error=0x40 {Uncorrectable Error}, LBAsect=2075232,
sector = 30945
end_request: I/O Error, dev 03:08, sector 30945

hda: irq timeout: status=0xd0 {Busy}
ide0: reset: success

hda: write_intr: status=0x51 {DriveReady, Seek Complete Error }
hda: write_intr: error=0x10 {SectorIdNotFound}, LBA sect=2067428,
sector=23140
hda: status error: status=0x59 {DriveReady SeekComplete DataRequest
Error}

hda: no DRQ after issuing WRITE
ide0: reset: success
hda: irq timeout: status=0xd0 {Busy}
====================================================================
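
The hex values in the messages above are bit fields, and the words in braces are just the names of the bits that are set. A minimal sketch of the decoding follows, assuming the standard ATA status and error register layout. The bit names are approximations of the labels the driver prints (the names for error bits 0x20 and 0x08 are placeholders), and treating the other status bits as meaningless while Busy is set is an assumption that matches the "{Busy}" lines above.

```python
# Standard ATA status register bits (names approximate the driver's labels).
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Standard ATA error register bits (0x20 and 0x08 names are placeholders).
ERROR_BITS = [
    (0x80, "BadSector"),
    (0x40, "UncorrectableError"),
    (0x20, "MediaChanged"),
    (0x10, "SectorIdNotFound"),
    (0x08, "MediaChangeRequested"),
    (0x04, "DriveStatusError"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode_status(value):
    """Name the bits set in a status byte; report only Busy while busy."""
    if value & 0x80:
        return ["Busy"]
    return [name for mask, name in STATUS_BITS if value & mask]


def decode_error(value):
    """Name the bits set in an error register byte."""
    return [name for mask, name in ERROR_BITS if value & mask]


for status in (0x59, 0x51, 0xD0):
    print(hex(status), decode_status(status))
for error in (0x40, 0x10):
    print(hex(error), decode_error(error))
```

Read this way, 0x59 and 0x51 both have the Error bit set, and the two error values say the drive could not read a sector cleanly (0x40) and could not find a sector it was asked to write (0x10).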

Some days later, the showdown:

EXT2-fs warning (device 03:05): ext2_free_blocks: bit already cleared
for block 17
EXT2-fs warning (device 03:05): ext2_free_blocks: bit already cleared
for block 2046
EXT2-fs warning (device 03:05): ext2_free_blocks: bit already cleared
for block 2046
EXT2-fs warning (device 03:05): ext2_free_blocks: bit already cleared
for block 2038

EXT2-fs error (device 03:05): ext2_free_blocks: freeing blocks not in
datazone - block=12304384,count=1

kernel panic: EXT2-fs panic (device 03:05): load_block_bitmap:
block_group >= groups_count - block_group=524287, groups_count=13
======================================================================================
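
The "device 03:05" and "dev 03:08" fields in these messages are major:minor device numbers. A small sketch of how they map to device names follows, assuming the usual Linux numbering in which major 3 is the first IDE interface, minors 0-63 belong to hda and 64-127 to hdb, and the numbers are printed in hex. The function name is hypothetical.

```python
def ide0_device_name(dev):
    """Map a 'MM:mm' string (hex major:minor) to a /dev name.

    Only major 3 (assumed here to be the first IDE interface) is handled.
    """
    major_text, minor_text = dev.split(":")
    major = int(major_text, 16)
    minor = int(minor_text, 16)
    if major != 3:
        raise ValueError("only major 3 is handled in this sketch")
    if not 0 <= minor < 128:
        raise ValueError("minor out of range for the first IDE interface")
    drive = "hda" if minor < 64 else "hdb"
    partition = minor % 64  # 0 means the whole disk
    return f"/dev/{drive}{partition}" if partition else f"/dev/{drive}"


print(ide0_device_name("03:08"))  # device named in the I/O error
print(ide0_device_name("03:05"))  # device named in the EXT2-fs messages

# The block group number in the panic is 2^19 - 1, far beyond the 13
# groups the file system has, which suggests a corrupted value.
print(hex(524287))
```

On that reading the I/O errors were reported on /dev/hda8 and the ext2 warnings and panic on /dev/hda5, both partitions of the same Conner drive.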

The kernel crashed, leaving all the Linux partitions unusable. I tried to
fix them using fsck but it wasn't possible at all.

I've tried to find an explanation but my knowledge of the kernel is
limited, so I've not been able to reach a conclusion. I suspect that
maybe the problem lies in the hardware but I'm not sure.

I'm very active, in my work environment, in promoting Linux as a very
serious alternative to other operating systems, so I need to prove that the
crashes are not Linux's fault. If someone has any idea I'd be very grateful
for his/her help.

Thanks in advance.

--
Aurelio Hinarejos

 
 
 

Post by Bill Anderson » Tue, 03 Feb 1998 04:00:00



A friend of mine recently went through a similar situation with his IDE
setup. We have been running 2.0.30 for a long time with no problems like the
above. It eventually turned out (after going through, IIRC, 3 HDs) that the
IDE controller was going belly-up. You may want to check that out. I don't
know what you did to your kernel, so I can't comment on that.

Bill Anderson

 
 
 

Post by Eric J. Fenderson » Wed, 04 Feb 1998 04:00:00


<snip>

> I'm very active, in my work environment, promoting Linux as a very
> serious alternative to other operating systems so I need to prove that the
> crashes are not Linux's fault. If someone has any idea I'd be very grateful
> for his/her help.

By no means am I an expert, but this is what I'd try: swap out the
hardware that you suspect has a problem and replace it with a known-good
part. If it works, then it shouldn't be Linux's fault.
--
** Eric J. Fenderson ***
 
 
 

Post by James Woodwar » Fri, 06 Feb 1998 04:00:00




> :Hello,
> :
> :hda: Conner Peripherals 1080 MB - CFS1081A, 1032 MB w/0KB Cache, LBA,
> :CHS=524/64/63
> :ide0 at 0x1f0-0x1f7, 0x3f6 on irq 14.
> :
> :hda: read_intr: status=0x59 {DriveReady SeekComplete DataRequest Error}
> :hda: read_intr: error=0x40 {Uncorrectable Error}, LBAsect=2075232,
> :sector = 30945
> :end_request: I/O Error, dev 03:08, sector 30945

I have had this problem on my machine, both times with Conner drives...
dunno what it is with them, but they either a) don't like some controllers,
or b) will not operate reliably as either master or slave to a Quantum
(well, this is my experience).

Seagates and Quantums, no problem; I've had a number of each under Linux,
but the Conners I've tried I've had major problems with.

I'm presently running my Linux setup on a Quantum Bigfoot CT 4.3gig,
no problems with it so far...

As to fixing your problem, I'm not sure how. I managed to go sideways around
the problem using an ISA I/O board, but speed and performance dropped way off :)

www: http://jim.southcom.com.au

Laugh and the world laughs with you.  Cry, and someone yells, "Shut up!"

 
 
 

Post by Friedhelm Mehnert » Sat, 07 Feb 1998 04:00:00



: <snip>
:> I'm very active, in my work environment, promoting Linux as a very
:> serious alternative to other operating systems so I need to prove that the
:> crashes are not Linux's fault. If someone has any idea I'd be very grateful
:> for his/her help.

: By no means am I an expert, but this is what I'd try: swap out the
: hardware that you suspect has a problem and replace it with a known-good
: part. If it works, then it shouldn't be Linux's fault.
: --
: ** Eric J. Fenderson ***

It is either the drive or the controller starting to die.

In more than 90% of cases the problem is heat related!

Simply put, the drive electronics and/or the controller chip are getting too
hot. :-(

This *may* (just) work under Windoze and DOS, since with these
operating systems the drives are just sitting there, not being accessed
most of the time.

This is different with operating systems like Linux, Unix, NT. Those *do*
put a lot more stress on your drives.

Try to improve the cooling!

You don't overclock anything, do you?

Does a kernel compile give you SIG 11s as well?
If so, it may also be bad cache, bad memory, or too aggressive BIOS settings.

One last thing:
Have a look at hdparm and play with the settings a little bit.
Some EIDE features give problems with certain drives and/or controllers
(or combinations of the two). See the docs that come with it, as well as
the source of ide.c within the kernel sources, for more information on this.

Good Luck and Regards,
Friedhelm

--
Microsoft is NOT the answer. Microsoft is the Question.
The answer is: "NO!"
-------------------------------------------------------------------
Friedhelm Mehnert,  Berliner Allee 42,  22850 Norderstedt,  Germany

-------------------------------------------------------------------

 
 
 

1. 1.0.9 kernel crashes regularly 30 minutes after any floppy access

As the subject says:

When using kernel 1.0.9, about 30 minutes after accessing some floppies
(on an absolutely standard TEAC 1.44 MB floppy drive driven by an absolutely
standard WD 1003 HDD/FDD controller), regardless of whether tar,
mount, or mcopy is used for the floppy access, the kernel will no longer
allow any new process to be forked, but replies to any fork with
"segmentation violation" and dumps the registers. (Maybe some
NFS activity (or something else) is needed for this to happen; it
didn't happen in maintenance mode; but there is no unusual HW besides a
SCSI card; no CD, no sound.)

Booting the same system with a 0.99.15f kernel does not give this
result, besides giving somewhat more network throughput and none of the
many network errors 1.0.9 shows.

Just for the record: I'm using 0.99.15f again.

Peter

P.S.:
 - Anybody intending to repair yacc someday, so that

  $  < /dev/null ( cat )
  bash: syntax error near unexpected token `('

   will work again in bash? C News needs that sometimes.
 - Does anybody know why lpd nowadays *crashes* the net instead of
   *printing* over the net? (i.e. lpd does print, but nothing else
   will be transferred afterwards until an ifconfig eth0 down/up.)
--
                * * * * *  Alle Netze sind ein Netz  * * * * *
 Write to:  Peter Much * Koelnische Str. 22 * D-34117 Kassel * +49-561-774961

2. would like some basic firewall

3. new Kernel 2.0.30 crashes...SCSI problem.

4. Samba problem - can't see LINUX from Windows

5. diald crashing with 2.0.30 kernel

6. What's a segmentation fault ?

7. File system destroyed on kernel rebuild

8. remap_page_range hangs my machines

9. Reading UFS File System on Linux 2.0.30

10. What kind of HW needed for a 30 users systems + a system monitor tool ?

11. HELP: Compiling kernel problem. Kernel 2.0.30

12. Slakware /W kernel 2.0.30 to RH5 W Kernel 2.0.32

13. Slack 1.2.0 crashes regularly