2x 120 GB IDE + NFS server = crash

Post by Steve Schmi » Fri, 24 Jan 2003 03:29:44



Hello all,

here's a weird one:

Plug two 120 GB IDE disks on the same IDE controller, mount and export them via
NFS, have two (or more) clients access them heavily. Crashes within 15
minutes, 100% of the time.
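For reference, the setup is essentially the following (device names, mount
points and export options here are only illustrative, not the exact ones we
use):

# on the server: both disks hang off the same IDE channel
mount /dev/hdc1 /export/disk1
mount /dev/hdd1 /export/disk2

# /etc/exports
/export/disk1   *(rw)
/export/disk2   *(rw)

exportfs -ra

# on each client
mount -t nfs server:/export/disk1 /mnt/disk1
mount -t nfs server:/export/disk2 /mnt/disk2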

Tested disks: 120 GB IDE by Maxtor and by Western Digital (two each), in
all (!) combinations, and a spare Maxtor just to be sure.

Tested IDE cables: 6 (six).

Tested NICs: 3COM 905 and Realtek 8139, in several different PCI slots.

Tested on two different boards (different models!), which are proven to
otherwise work just fine.

Weird things:
- all disks work perfectly as long as they're alone on a controller.
- 2x 120 GB on one controller works... as long as you're reading them LOCALLY
  (as opposed to via NFS).
- 120 GB + 40 GB on one controller works, even over NFS.

Workaround: buy an extra controller, plug each disk on its own one. So for four
disks, there's the two onboard controllers plus two on the extra PCI card.

My wild guess would be that this is a bug *very* deep in the NFS server; but
then why does it work as soon as each disk has its own controller?

If someone here knows enough to explain this, I would really appreciate
learning what the hey is going on here.

Cheers, Steve

 
 
 

2x 120 GB IDE + NFS server = crash

Post by Jens Zahne » Fri, 24 Jan 2003 06:11:17




[cut off]

Quote:> Workaround: buy an extra controller, plug each disk on its own one. So
> for four disks, there's the two onboard controllers plus two on the extra
> PCI card.

> My wild guess would be that this is a bug *very* deep in the NFS server; but
> then why does it work as soon as each disk has its own controller?

I don't think that this is a bug in the NFS server. What controller are you
using?
Have you tested it by producing heavy load on both hard drives from the same
machine (e.g. copying a lot of data from two other local drives to the drives
on that controller)?
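For example, something like this reads both drives flat out at the same time
(hdc and hdd are just placeholders for your two disks):

# read both disks in parallel and throw the data away
dd if=/dev/hdc of=/dev/null bs=1024k &
dd if=/dev/hdd of=/dev/null bs=1024k &
wait

If that survives for a while, the controller itself is probably not the
problem.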

Quote:> If someone here knows enough to explain this, I would really appreciate
> learning what the hey is going on here.

Greetings
Jens

 
 
 

2x 120 GB IDE + NFS server = crash

Post by Georg Ach » Fri, 24 Jan 2003 07:17:05




|> Hello all,
|>
|> here's a weird one:
|>
|> Plug two 120 GB IDE disks on the same IDE controller, mount and export them via
|> NFS, have two (or more) clients access them heavily. Crashes within 15
|> minutes, 100% of the time.

Which kernel/distribution? We've seen similar crashes with an (obviously
pre-release) version of SuSE 8.1 with 2.4.19. It got stuck during IDE accesses
(the LED was on), the last kernel messages (after killing klogd) were "hdx: lost
interrupt". A self-compiled kernel fixed it.

--

         http://wwwbode.in.tum.de/~acher
         "Oh no, not again !" The bowl of petunias

 
 
 

2x 120 GB IDE + NFS server = crash

Post by Steve Schmi » Sat, 25 Jan 2003 05:00:29



Quote:> what controller are you using?

The crashes were reproducible with both Gigabyte GA-7ZXE (VIA KT133A
chipset) and GA-7VAX boards (VIA KT333 chipset).

Quote:> Have you tested it, by producing heavy load on both harddrives from the same
> machine

Local tests consisted of two concurrent tasks, each reading ca. 70 GB
from a different disk. These tests always worked all right.

Remote tests consisted of two concurrent clients, each reading ca. 70
GB from a different disk over NFS. These tests always crashed the
machine.

Remote tests where both clients would read the same (and only) disk on a
controller worked all right, too.
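In case it matters, each client did essentially nothing more than this (mount
point and path are just examples):

# client 1; client 2 does the same against the other export
mount -t nfs server:/export/disk1 /mnt/scratch
find /mnt/scratch -type f | xargs cat > /dev/null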

Cheers, Steve

 
 
 

2x 120 GB IDE + NFS server = crash

Post by Steve Schmi » Sat, 25 Jan 2003 05:09:18



> Which kernel/distribution? We've seen similar crashes with an (obviously
> pre-release) version of SuSE 8.1 with 2.4.19. It got stuck during IDE accesses
> (the LED was on), the last kernel messages (after killing klogd) were "hdx:
> lost interrupt". A self-compiled kernel fixed it.

SuSE 8.0 release kernel, i.e. 2.4.18 with all the SuSE patches (2.4.18-58).  We
also tried a self-compiled 2.4.19 from kernel.org, to no avail.

Also, we got no messages whatsoever, anywhere. The machine would just freeze, and
often (though not always) the caps and scroll lock LEDs would blink in sync. We
also tried the self-compiled 2.4.19 with magic SysRq enabled, but to no avail.

Cheers, Steve

 
 
 

2x 120 GB IDE + NFS server = crash

Post by Georg Ach » Sat, 25 Jan 2003 07:58:22




|> Also, we got no messages whatsoever, anywhere. The machine would just freeze,
|> and often (though not always) the caps and scroll lock LEDs would blink in
|> sync. We also tried the self-compiled 2.4.19 with magic SysRq enabled, but
|> to no avail.

Ok, if it blinks that's a good sign: it's a hidden Oops ;-) Try the following on
the console:

killall klogd
klogconsole -r 0 -l 9

And let it crash again. Then you should see the oops, write down the stack trace
and search for the nearest function addresses in /proc/ksyms.

(The problem with klogd is that it buffers the messages to the console, so in
case of a crash you may miss a few of the last messages. By killing klogd, you
get the direct "realtime" output.)

--

         http://wwwbode.in.tum.de/~acher
         "Oh no, not again !" The bowl of petunias

 
 
 

2x 120 GB IDE + NFS server = crash

Post by Steve Schmi » Sat, 25 Jan 2003 19:52:53


Quote:> Ok, if it blinks that's a good sign: it's a hidden Oops ;-)

Aha! I was wondering about this in-sync blinking on other
occasions.

Quote:> Try the following on the console:

> killall klogd
> klogconsole -r 0 -l 9

> And let it crash again. Then you should see the oops, write down the
> stack trace and search for the nearest function addresses in
> /proc/ksyms.

Well I'm really glad to have it working now, and furthermore several
scientists will hunt me down and kill me if I crash this machine,
but thanks anyway, now I know how to examine this kind of behaviour
on future occasions.

BTW, do you know why kernel Oops messages are buffered anyway? I mean,
when the kernel goes Oops I have other things to worry about than an
unbuffered write to the console.

Cheers, Steve

 
 
 

2x 120 GB IDE + NFS server = crash

Post by Georg Ach » Sat, 25 Jan 2003 22:56:19




|> > And let it crash again. Then you should see the oops, write down the
|> > stack trace and search for the nearest function addresses in
|> > /proc/ksyms.
|>
|> Well I'm really glad to have it working now, and furthermore several
|> scientists will hunt me down and kill me if I crash this machine,
|> but thanks anyway, now I know how to examine this kind of behaviour
|> on future occasions.

You have to cultivate your BOFH ;-)

|> BTW, do you know why kernel Oops messages are buffered anyway? I mean,
|> when the kernel goes Oops I have other things to worry about than an
|> unbuffered write to the console.

Without klogd the output is unbuffered, but only visible on the console. klogd
intercepts this "port" and can redirect it to a file or to the syslog daemon,
filter it, find symbols for oopses (but only for the harmless ones, not the
"Aieee, scheduling in interrupt" kind) and so on. And since it is an extra
process, it may not be alive anymore after a hard oops...
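By the way, you can also check and raise the console loglevel directly via
proc (which is roughly what klogconsole -l does):

cat /proc/sys/kernel/printk       # the first number is the console loglevel
echo 8 > /proc/sys/kernel/printk  # let everything through to the console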

--

         http://wwwbode.in.tum.de/~acher
         "Oh no, not again !" The bowl of petunias

 
 
 

2x 120 GB IDE + NFS server = crash

Post by Mykroft Holmes I » Mon, 27 Jan 2003 13:53:13




>> what controller are you using?

> The crashes were reproducible with both Gigabyte GA-7ZXE (VIA KT133A
> chipset) and GA-7VAX boards (VIA KT333 chipset).

>> Have you tested it, by producing heavy load on both harddrives from the same
>> machine

> Local tests consisted of two concurrent tasks, each reading ca. 70 GB
> from a different disk. These tests always worked all right.

> Remote tests consisted of two concurrent clients, each reading ca. 70
> GB from a different disk over NFS. These tests always crashed the
> machine.

> Remote tests where both clients would read the same (and only) disk on a
> controller worked all right, too.

> Cheers, Steve

Likely you've run into a combination of an IDE design flaw and nfsd being
picky. IDE doesn't handle two devices on the same channel well (In fact
it's truly bad when you hit 2 devices on the same channel hard). NFS is
very picky about disk access (read some of the horror stories about what
happens in a lab with a lot of cross-mounted shares when an NFS server
dies; the entire lab will lock up hard).

NFS also has a bad habit of taking out systems when it borks. Note that
FreeBSD's NFS implementation handles high loads better than the Linux
implementation (unfortunately, FreeBSD's ATA driver doesn't seem to be
quite as good as the latest version of Linux's).

The solution is to either put the drives on individual
channels/controllers or go SCSI. SCSI's support for Tagged Command Queuing
and its better overall design allow it to handle high loads to multiple
devices on the same bus where IDE falls down.
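A quick way to see how the drives actually ended up on the channels is /proc
(the hd letters below are just examples):

# one ideN directory per channel; master/slave show up as hda/hdb, hdc/hdd, ...
ls /proc/ide/ide0 /proc/ide/ide1
cat /proc/ide/hda/model /proc/ide/hdc/model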

Adam

 
 
 

2x 120 GB IDE + NFS server = crash

Post by Steve Schmi » Tue, 28 Jan 2003 19:55:54


Quote:> You have to cultivate your BOFH ;-)

"They may hate me, as long as they fear me." Caligula (Roman Emperor).

Cheers, Steve

 
 
 

2x 120 GB IDE + NFS server = crash

Post by Steve Schmi » Tue, 28 Jan 2003 21:07:53


Quote:> Likely you've run into a combination of an IDE design flaw and nfsd being
> picky. IDE doesn't handle two devices on the same channel well (In fact
> it's truly bad when you hit 2 devices on the same channel hard).

We have this combination on several machines, none of which has caused any
trouble yet (besides relatively bad performance, of course). It's only
the 120+120 combination that has been causing us headaches. Why the
heck???

Quote:> NFS is very picky about disk access

I really don't understand how NFS would even know that the devices
are on the same controller. Doesn't NFS use the VFS layer to abstract
access to the different filesystems?

Quote:> NFS also has a bad habit of taking out systems when it borks. Note that
> FreeBSD's NFS implementation handles high loads better than the Linux
> implementation

Well I never considered saturating one 100 Mbit link a high load. I
mean, that's 10 MB/s at best.
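(For the record: 100 Mbit/s divided by 8 is 12.5 MB/s raw, and after
Ethernet/IP/NFS overhead it's realistically more like 10-11 MB/s.)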

Quote:> The solution is to either put the drives on individual
> channels/controllers

Which is what we (the institute) did, as described in the initial post.

Quote:> or go SCSI.

Which is what we can't afford. 480 GB would cost us $3500 as SCSI (not
counting the controller) as opposed to $500 with IDE. That's a factor
of 7 (seven) in favour of IDE, and that's beyond good and evil!

Cheers, Steve

 
 
 

2x 120 GB IDE + NFS server = crash

Post by ERA » Wed, 29 Jan 2003 13:01:47


In comp.os.linux.networking


[...]

Quote:>> or go SCSI.

> Which is what we can't afford. 480 GB would cost us $3500 as SCSI

Hmmm, I calculate a hair over $2900 using three Fujitsu 147GB 10K RPM
Ultra320 drives, which would get you to about 441GB and a whole heck
of a lot o' whoopie in throughput. ;-) Now if you go with the IBM
32P0725 drives, yeah, that's pretty close to $35K ... actually more
like $36K+.

Quote:> (not counting the controller) as opposed to $500 with IDE. That's a
> factor of 7 (seven) in favour of IDE, and that's beyond good and
> evil!

I too was going to suggest SCSI after reading the thread until I saw
this. My answer to that statement is "you get that for which you pay".
Cost should *never* be the only "factor" in determining hardware for
a critical server. IMO, cost shouldn't even be a large part of the
decision if the server is truly crucial. Add in the robustness of SCSI
vs IDE and that cost differential becomes justifiable. However, if
this server in question is not critical then sure, skimp. :-D


SCO Group Authorized Partner - OpenServer, UnixWare & SCO Linux
--
Linux era1.eracc.UUCP 2.4.19-16mdk i686
  9:35pm  up 14 days, 20 min,  7 users,  load average: 0.18, 0.20, 0.24
ERA Computer Consulting http://eracc.hypermart.net/
eCS, OS/2, Linux, OpenServer, UnixWare, Mandrake & SCO Linux resellers

 
 
 

2x 120 GB IDE + NFS server = crash

Post by Steve Schmi » Wed, 29 Jan 2003 22:39:55


This really gets a bit off-topic, especially since I wrote in the
initial post that I already do have a workaround, and this whole
thread is actually about fixing/working around a real and lethal bug
in the Linux kernel, not a philosophical discussion about what disks
to buy with an infinite amount of money.

Quote:> > Which is what we can't afford. 480 GB would cost us $3500 as SCSI

> Hmmm, I calculate a hair over $2900 using three Fujitsu 147GB 10K RPM
> Ultra320 drives.

So it's a factor of 6 instead of 7. Doesn't really affect my point.

Quote:> Which would get you to about 441GB and a whole heck
> of a lot o' whoopie in throughput.

... which is accessed via a 100 Mbit link, as I wrote already. "A lot
o' whoopie in throughput" indeed :-)

Quote:> I too was going to suggest SCSI after reading the thread until I saw
> this. My answer to that statement is "you get that for which you pay".
> Cost should *never* be the only "factor" in determining hardware for
> a critical server. IMO, cost shouldn't even be a large part of the
> decision if the server is truly crucial.
> Add in the robustness of SCSI vs IDE and that cost differential becomes
> justifiable.

Since you bring up the issue of reliability:

The SCSI proposal isn't redundant at all. A disk fails and the data
is gone; it has to be restored or re-generated, a lengthy process with
lots of downtime.

For the price of the SCSI solution, I can mirror each IDE disk five
times.

For the price of the SCSI solution, I can build an 8x160 GB RAID5 array
in IDE, including controller and spare disk, all hot-swappable.
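Just as a sketch of what I mean, with Linux software RAID (the md device and
the hd letters are of course only placeholders, and this assumes the spare is
one of the eight disks):

# 7 active disks plus 1 hot spare in a software RAID5 set
mdadm --create /dev/md0 --level=5 --raid-devices=7 --spare-devices=1 \
      /dev/hde1 /dev/hdg1 /dev/hdi1 /dev/hdk1 /dev/hdm1 /dev/hdo1 \
      /dev/hdq1 /dev/hds1
mke2fs -j /dev/md0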

Anyway, this is all a moot point, because...

Quote:> However, if this server in question is not critical then sure, skimp. :-D

In fact it's just a scratch dump; data safety was a minor issue in the
first place. However, a maximum uptime of 15 minutes *is* a major issue,
even for a scratch dump.
 
 
 
