>Date: Fri, 7 Jul 2000 18:04:05 -0400
>Newsgroups: comp.os.linux.networking, comp.os.linux.hardware
>Subject: Re: Weird corruption problems
>> I'd be looking at the disk subsystem. What controllers are you using? Sounds
>> to me like they may be going bad. Anything in the log files? I've heard of
>> cases where extremely busy SCSI buses can corrupt data; might that be the
>> case here? Give us some more details about the hardware involved.
>Hmm, that's something along the lines of what I was thinking too. We use
>all IDE controllers on the server - This one has four hard drives,
>hd[abcd]. They're all Western Digital 13G UDMA 33 drives. The
>motherboard is ASUS, IDE controllers are Intel PIIX4. hda4 is the root
>partition, hd[bc]1 are RAID1'ed together, hdd2 is a separate disk for
>shell users' home directories. I can't imagine it has to do with the
>hard disks failing, since IDE drives do sector relocation automatically.
I tend to agree with you that the drives are not at fault, but WD does have a
tool that can run in non-destructive mode, which you can use to test each
drive. It depends on how much time you have and how desperate you are.
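Alongside the WD tool, one cheap extra check is to hash the same large file
several times and see whether the digests ever disagree; if they do, something
between the platter and RAM is flipping bits. Below is a rough Python sketch of
that idea, not a tested diagnostic. The function names are my own invention,
and it assumes the file is bigger than RAM (64M here), since otherwise the
later passes are likely served from the page cache instead of the disk.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at path, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def repeated_digests(path, passes=3):
    """Hash the same file several times; returns one digest per pass."""
    return [file_digest(path) for _ in range(passes)]


def reads_are_stable(digests):
    """True if every pass produced the same digest."""
    return len(set(digests)) == 1
```

Run it against something large on each of hda4, md0 and hdd2, along the lines of
reads_are_stable(repeated_digests("/path/to/big/file")) (the path is a
placeholder). A False on one disk but not the others would narrow things down.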
>Is there a chance it has to do with noise on the bus? Also, the location
>isn't known for having a particularly good electricity supply (it was as
>low as 90VAC once). I've seen this corruption happen on both /dev/hda4
>and /dev/md0, so I doubt it's a bug in the RAID code. (which, btw is
>kernel 2.2.11 with raid0145 patch)
My money is on the power, or lack thereof. Get a UPS NOW!! That will at
least prevent future power problems from affecting you. Since you have a T1,
I'm assuming you can afford a UPS; I think you can get them for less than
$100 now.
As to what part(s) the bad power may have damaged, it will be a process of
elimination to figure it out. I had a power supply go wacky because of power
problems; it would cause my workstation to reboot at random intervals. That
was extremely annoying. ;-) Coincidentally, a Cyrix system.
You've seen data corruption on both / and the RAID. What level of RAID? I'm
assuming one. You'll have to weigh the time vs. money factors, but I think I'd
start with a new motherboard and run the non-destructive disk tests. If you
still see problems, then examine the disk cables and power supply. You might
want to replace the power supply along with the motherboard: if the power
supply is in fact bad, it might damage the new motherboard.
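To get a feel for how far the damage on / and the RAID has spread, you could
scan your text config files for stray control characters like the one you
found in relay-domains. A minimal sketch in Python follows; the names are
hypothetical, and it assumes you point it only at plain-text directories,
because any binary file will light up with false hits.

```python
import os

# Control bytes treated as legitimate in plain text: tab, newline, carriage return.
ALLOWED_CONTROL = {0x09, 0x0A, 0x0D}


def find_control_bytes(data):
    """Return (line, column, byte_value) for each suspicious byte in data."""
    hits = []
    line, col = 1, 0
    for b in data:
        col += 1
        if b == 0x0A:
            line, col = line + 1, 0
        elif (b < 0x20 and b not in ALLOWED_CONTROL) or b == 0x7F:
            hits.append((line, col, b))
    return hits


def scan_tree(root):
    """Yield (path, hits) for each regular file under root with suspicious bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    data = f.read()
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
            hits = find_control_bytes(data)
            if hits:
                yield path, hits
```

Something like "for path, hits in scan_tree('/etc'): print(path, hits[:5])"
would list the files worth eyeballing. It won't catch a dropped line or a
flipped letter, only bytes that should never appear in a text file.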
You could also check the bug reports (if they exist) for the motherboard and
BIOS. You might find something, though probably not.
Summary
o - UPS
o - disk diag tests
o - new power supply & new motherboard
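On your question of how to make sure the data is all there and correct: the
only way I know is to compare against a copy you trust. Here is a hedged
sketch in Python that builds a checksum manifest of a directory tree and
diffs two manifests. The function names are made up for illustration, and it
assumes you have a known-good copy of the same tree on another machine (or on
backup media) to build the reference manifest from.

```python
import hashlib
import os


def build_manifest(root):
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            manifest[os.path.relpath(path, root)] = h.hexdigest()
    return manifest


def compare_manifests(known_good, suspect):
    """Return sorted (changed, missing, extra) lists of relative paths."""
    changed = sorted(p for p in known_good
                     if p in suspect and known_good[p] != suspect[p])
    missing = sorted(p for p in known_good if p not in suspect)
    extra = sorted(p for p in suspect if p not in known_good)
    return changed, missing, extra
```

Build one manifest on a healthy box and one on the server, then look at the
"changed" list first. Files that legitimately differ between machines will
show up too, so expect to filter the results by hand.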
>The server is a Cyrix 6x86MX 150, 64M of RAM, 128M of swap (swap is on
>hdd1). Network card is a 3Com 3C509B, video is a Bob's Generic Brand ISA
>VGA card... It's probably a trident 1 megger or something old like that.
It must be Bob's Generic ISA VGA. ;)
If I can help further, let me know. I'd like to know the root cause if you
determine it.
Regards,
Chad
>Thanks,
>Ross
>> >Date: Fri, 7 Jul 2000 16:49:38 -0400
>> >Newsgroups: comp.os.linux.networking, comp.os.linux.hardware
>> >Subject: Weird corruption problems
>> >Hi all,
>> > I've had a server running on a dedicated T1 for a long time and
>> >have had relatively few problems with it. It cooks along providing me
>> >nice bandwidth and responsive service. However, it has recently become
>> >plagued with a bizarre corruption problem.
>> > I first noticed it while uploading some scripts from other Debian
>> >boxen. The script would run fine on the local machines, and when FTPed to
>> >any other Debian box on the network. However, when FTPed to the remote
>> >one, the resulting file would be chock full o' errors. Weird stuff -
>> >sometimes a whole line would be left out of the middle, sometimes a few
>> >characters, and sometimes control characters would show up. I just redid
>> >the upload and it was fine again.
>> > I noticed it a second time when I received a "CRC error" report
>> >about my release 3 SlackReiser boot disk from an extremely helpful
>> >gentleman trying my software. He determined that the image on the remote
>> >machine was corrupt, and I have verified this fact. Hmm, now I realize
>> >something is fishy.
>> > Today I got a call from a business associate who uses this remote
>> >server for email, saying he couldn't relay mail from his new domain, could
>> >I add his new domain to our relay-domains file. I said sure, and opened
>> >it up. Much to my surprise, one of the characters in the file had been
>> >replaced with a control character. I fixed the error, but now I'm really
>> >scared about the rest of the data. What can I do to guarantee that it's
>> >all there and correct? What on earth could be causing such a bizarre
>> >problem? Situations 1 and 2 point to communication problems, but number 3
>> >involves the relay-domains file, which has never been sent over FTP, and
>> >that seems to say it's a filesystem/hardware problem. Where should I
>> >start looking?
>> >Thanks,
>> > Ross Vandegrift
>> > Seitz Technical Products Inc
>> --
>> _\|/_
>> (o o)
>> ----------------------------------------------oOO-(_)-OOo------
>> Packet filtering for Linux
>> http://www.packetfilter.dynip.com/
>> Now hosting IPChains mailing list v2
>> "...Unix, MS-DOS, and Windows NT (also known as the Good,
>> the Bad, and the Ugly)." (By Matt Welsh)
>> ---------------------------------------------------------------