Kernel 2.4.18 + ext3 = filesystem corruption

Post by Silva » Sun, 28 Apr 2002 00:08:10

I had a filesystem explosion about a month back (across-the-board corruption
on all ext3 partitions, brought to my attention by a rather * series of
EXT3-fs errors and an immediate crash).  I lost some data, and there were
numerous errors that had to be repaired.  I realized that I could not
remember ever seeing an fsck since moving to ext3, so I experimented with
tune2fs and dumpe2fs.  All of the partitions had a random, large negative
number set as the maximum mount count.  I have no idea whether that was
already the case before the crash.
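For anyone who wants to repeat that check: `dumpe2fs -h` prints the superblock, including a "Maximum mount count:" line, and `tune2fs -c` changes that value. As far as I understand e2fsprogs, a maximum of 0 or below means the mount count is ignored when deciding whether to force a check. That would fit with never having seen an fsck. The function below is a minimal sketch. Its name, `max_mount_status`, is made up for illustration. It reads `dumpe2fs -h` output on stdin and reports whether mount-count checks look enabled.

```shell
#!/bin/sh
# Hypothetical helper (not part of e2fsprogs): read `dumpe2fs -h DEVICE`
# output on stdin and report the maximum mount count.
# Prints "enabled N" if N > 0, "disabled N" if N <= 0 (assumed to mean the
# mount count is ignored), or "unknown" if the line is not found.
max_mount_status() {
    awk -F: '
        /^Maximum mount count:/ {
            gsub(/[ \t]/, "", $2)   # strip the padding around the value
            n = $2 + 0
            if (n > 0) print "enabled " n
            else        print "disabled " n
            found = 1
        }
        END { if (!found) print "unknown" }
    '
}
```

Usage, per partition: `dumpe2fs -h /dev/hde6 2>/dev/null | max_mount_status`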

I began to eliminate suspects.  I checked RAM by running memtest86 for 24
hours, with no errors.  I reverted to kernel 2.4.16, removed my journals and
mounted the partitions as ext2, and switched to conservative settings all
around (lower PCI bus speed, conservative hdparm options, etc.).
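Removing a journal is done with tune2fs on an unmounted partition. The following is a minimal sketch, assuming the partitions were cleanly unmounted first. The hypothetical helper `ext3_to_ext2_plan` only prints the commands, so they can be reviewed before anything touches the disk. Afterwards, the filesystem type in /etc/fstab still has to be changed from ext3 to ext2.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands that turn each listed
# ext3 partition back into plain ext2. Assumes every partition is unmounted.
ext3_to_ext2_plan() {
    for dev in "$@"; do
        echo "tune2fs -O '^has_journal' $dev"   # clear the has_journal feature
        echo "e2fsck -f $dev"                   # force a full check afterwards
    done
}
```

Usage: `ext3_to_ext2_plan /dev/hde6 /dev/hde9 > plan.sh`, look it over, then run `sh plan.sh` from a terminal so e2fsck can ask its questions interactively.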

I've been gradually bringing things back to their previous state, fscking
my partitions regularly as I've gone (daily at first, then gradually moving
to an interval of 5 mounts or 7 days).  I had no filesystem errors of any
kind after the last disaster, and then I decided to move back to 2.4.18.
There were no problems for a few days, so I installed new journals and moved
back to ext3.
Within four days, I had another filesystem explosion, though it was less
severe than the last, affecting only hde6 and hde9.  Files were missing,
entire chunks of directories were gone, and everything was pretty well
hosed.  When fsck ran during the subsequent boot, it found a great many
errors in the filesystem, and I've lost more data.

Both crashes were seemingly spontaneous, triggered by everyday activities,
probably brought on when I finally accessed a file that had a hole in it
and brought the whole house of cards tumbling down.  I have no record of
anything that happened, as nothing useful made its way into any of my logs.

I'm running Mandrake 8.1 with a generic kernel 2.4.18.  No patches.  2.4.16
and 2.4.18 were compiled with the same config, and I don't think there is
anything noteworthy there.  Everything works perfectly, except that I
sometimes experience video display corruption when switching virtual
terminals.  It is more of a problem with 2.4.16, but it happens with both kernels.
(A separate issue, and not the source of my concern.  I'll take video
corruption over data loss any day.)

My hardware:

AMD K7-1000 on ASUS A7V (VIA Apollo KT133a chipset, integrated Promise
ATA-100 controller), 256 MB RAM, Linksys 10/100E NIC, USR PCI Performance
Pro modem, SB PCI 128, Riva TNT2 AGP video (running at 4X in BIOS and in X),
CREATIVE CD-RW RW8439E, CD-950E/TKU, Maxtor 94610H6, generic PS/2 mouse,
generic FD, and a 104-key keyboard.

Kernel boots (LILO) with:  append=" devfs=nomount hda=ide-scsi

Partitions are all ext3.

I can think of nothing further to add.  I've done my best to isolate this,
but I haven't been able to reproduce it in a controlled way, so I can't
point to a specific event as the trigger for the corruption.  It seems that
the corruption builds up over time while 2.4.18 is running, and eventually I
stumble into a hole.  I personally suspect the journaling code, but that's
speculation born of almost pure ignorance.

Michael McIntyre  zone 6b in SW VA
Silvan Pagan
umount /mnt/windows;mke2fs /dev/hde1;tune2fs -j /dev/hde1