It ran stably with 3 internal disks until I added a HighPoint
controller and 5 more disks. Then the problems started, so you could
say this configuration has been unstable from the start.
I have fans all around the disks, so heat shouldn't be a problem.
My current setup is as follows:
3 drives on ide0+1 in software RAID 0
2 drives on the onboard HPT controller and 5 drives on the plug-in HPT
controller in one LVM volume.
After it crashed last night, I did the following on reboot: I ran
hdparm -X66 on all drives so they run in UDMA2 instead of UDMA5. I also
used setpci to make the devices do more bursts.
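
For reference, the number after -X is a base value plus the mode number,
as described in the hdparm man page (8 for PIO, 32 for multiword DMA, 64
for UltraDMA). Below is a minimal Python sketch of how the values come
out. The function names and the device names in the example are
placeholders for illustration, not my actual drive list, and the script
only prints the commands instead of running them:

```python
# Base values for hdparm's -X option, per the hdparm man page.
XFER_BASE = {"pio": 8, "mdma": 32, "udma": 64}


def hdparm_xfer_value(kind, mode):
    """Return the numeric -X argument for a transfer mode, e.g. udma 2 -> 66."""
    if kind not in XFER_BASE:
        raise ValueError(f"unknown transfer kind: {kind!r}")
    if mode < 0:
        raise ValueError("mode number must be non-negative")
    return XFER_BASE[kind] + mode


def hdparm_commands(devices, kind="udma", mode=2):
    """Build (but do not run) one hdparm command line per device."""
    value = hdparm_xfer_value(kind, mode)
    return [f"hdparm -X{value} {dev}" for dev in devices]


if __name__ == "__main__":
    # Hypothetical device list; substitute the real drives.
    for cmd in hdparm_commands(["/dev/hda", "/dev/hdc"]):
        print(cmd)
```

Printing the commands first makes it easy to check them before running
anything as root.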
Seems the machine has survived the night and has been up a total of 24
hours as I write this. If this solution holds, I would say the problem
was the disks loading the PCI bus too much.
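
A rough back-of-the-envelope check on that guess, as a Python sketch.
The figures are nominal peak interface rates (33.3 MB/s per channel for
UDMA2, 100 MB/s for UDMA5, and 133 MB/s for a 32-bit/33 MHz PCI bus),
not measured throughput, and the number of channels busy at the same
time is an assumption for illustration:

```python
# Nominal peak rates in MB/s; real sustained disk throughput is lower.
UDMA_PEAK_MB_S = {2: 33.3, 4: 66.7, 5: 100.0}
PCI_PEAK_MB_S = 133.0  # 32-bit, 33 MHz PCI


def worst_case_bus_ratio(active_channels, udma_mode):
    """Ratio of combined peak IDE channel rate to peak PCI bandwidth.

    A value above 1.0 means the channels could, in the worst case,
    ask for more than the bus can carry.
    """
    if active_channels < 0:
        raise ValueError("active_channels must be non-negative")
    return active_channels * UDMA_PEAK_MB_S[udma_mode] / PCI_PEAK_MB_S


if __name__ == "__main__":
    # Assumed example: two channels bursting at once.
    for mode in (5, 2):
        ratio = worst_case_bus_ratio(2, mode)
        print(f"UDMA{mode}, 2 channels: {ratio:.2f} x PCI peak")
```

This only compares burst ceilings, so it shows that dropping to UDMA2
leaves more headroom on the bus. It does not prove that bus saturation
was the cause.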
Whether this is a fault of my Abit BX-133 RAID mainboard or of Linux I
really can't say, since I don't know enough about the problem.
But..... disks might not be as warm when running UDMA2(?)
> Mikael Svenson shorted the keyboard with drool and:
> > I have tried with both 2.4.19ac4 and the latest 2.4.20 kernels, and in
> > both cases the machine hangs after running for about 6-12 hours.
> > The machine is under heavy network load 24/7.
> Has this machine run stable previously? Was it up for months then suddenly
> went unstable? Or have you just built it, and this is the smoke test?
> > These are the messages I've gotten with different configurations on the
> > pci slot and kernel. I have also tried using an eepro100 nic instead.
> > eth0: IRQ 5 is physically blocked!
> So does it say that no matter what NIC you use? Have you looked at BIOS
> > ide_dmaproc: chipset supported ide_dma_lostirq func only
> > hdd: lost interrupt
> > -
> > hdh:dma_timer_expiry: dma status == 0x44
> > hdh: lost interrupt
> Yikes! Ouch! Have you tried tweaking things with hdparm? Is this a stupid
> machine that only works properly with ACPI compiled in [like my Sony
> lacktop]? What do the devices hdd and hdh have in common? [apart from being
> IDE drives...] like are they both next to a hot piece of hardware, or do
> they share a power connector?
> > So clearly there is some IRQ/DMA problem somewhere which only shows up
> > when the machine is under heavy disk/network load.
> But you said the network was heavily loaded all the time, so I guess that
> means it's never working ;-)
> > Any pointers on fixing the matter are appreciated. I can provide
> > pci/interrupt info if necessary.
> Seeing as it only shows up when heavily loaded, could it be a heat problem
> inside the case? Maybe a thermostatic fan comes on and draws too much power
> for the system? You could try underclocking it...
> Knives and guns are dangerous,
> They don't want to play with us