dual processor machine keeps crashing

Post by Jorg B » Sat, 21 Mar 1998 04:00:00



Hello,

I'm running a dual Pentium II 233 MHz Linux server with Slackware 3.4
(kernel 2.0.33 compiled for SMP).

Here are some specs for the server:
2 x 233 MHz Pentium II processors
Asus P2L97-DS motherboard (USB and power-saving features turned off)
Onboard SCSI adapter controlling 2 Seagate Barracuda Ultra Wide drives
128 MB of DRAM
3Com 905 network card (0.49 drivers)
Generic Trident SVGA card

After a period of time the server just crashes; this can be after one day
or after 14 days. When it crashes the screen turns black and the server
shows no sign of life: the keyboard does not respond and the server cannot
be pinged. There is no indication of why the server crashes (no
entries in the logs!).
The power lights on the machine are still on when it crashes and the drives
are still spinning. The only way to bring the machine back to life is to
turn its power off and back on. I called Asus to find out whether there might
be a problem with the settings, but everything seemed to be fine, so they
told me to replace the board, which I did, but the machine keeps crashing.
And yes, I do have the mem=127 statement in the lilo.conf file
(total size of memory minus 1 MB). I even got the newest SCSI drivers for the
onboard Adaptec 2940. I'm running out of ideas...
Should I try a different kernel, let's say 2.1.90?
Is there much improvement in terms of SMP in the 2.1.xx kernels?
Does anybody think this is caused by the kernel?

Help...

Thanks
Jorg B.

PS: We are running about 11 single-processor Linux servers with the same
hardware, except that the board is different. None of these servers ever
crashes.

 
 
 

Post by Henrik Carlqvis » Sun, 22 Mar 1998 04:00:00



> 3com 905 Network Card (0.49 drivers)

This is a guess, but could you try replacing the network card with
another one? I have had some bad experiences with the 3c900 myself, which
uses the same driver. I have seen machines freeze, but it doesn't happen
that often and only under heavy network load, for example when running rdist
to other machines at the same time. I'm not sure my problem is caused
by the network card, but I think so.

regards Henrik

--
Join LinuxNet RC5! Visit http://www.linuxnet.org for info

 
 
 

Post by Mario Procopi » Mon, 23 Mar 1998 04:00:00


Hello!

>I'm running a dual P2 - 233 MHZ processor Linux server with Slackware 3.4
>(Kernel 2.0.33 compiled for SMP).

>Here are some specs about the server
>2 -  233 MHZ Pentium 2 processors
>Asus P2L97-DS Motherboard (USB and Power saving features turned off)
>Using Scsi adapter on the board controlling 2 Seagate Barracuda ULTRA WIDE
>drives
>128 MB of DRAM
>3com 905 Network Card (0.49 drivers)
>Generic Trident svga card

>After a period of time the server just crashes, this can be after one day
>or after 14 days. When it crashes the screen turns black and the server

I noticed the same problem with kernels 2.0.32 and 2.0.33. It seems to me
that the *real* problem lies in the 3Com 905 card. Have you tried a 3Com
10Mb card (3Com 590)? With that card everything works just fine! Who knows
whether the problem is in the driver or in some other hardware conflict.

Ciao,
Mario.

 
 
 

Post by Eric Crampto » Mon, 23 Mar 1998 04:00:00



> I noticed the same problem, with kernels 2.0.32 & 2.0.33. It seems to me
> that the *real* problem lies in the 3com 905 card. Have you tried a 3com
> 10Mb card (3com 590) ? With this card everything works just fine ! Who knows
> if the problem is in the driver or is in other hardware conflicts.

If it makes any difference, I have three Linux machines, all with
uptimes on the order of months, which all use 3c905TX 100 Mb/s cards
(some are actually running at 10 Mb/s, one is actually at 100 Mb/s). They
all run different kernels: 2.0.13 (?) I think, 2.0.32 and
2.0.33. They are all single-processor machines. My point is that I
doubt it is the driver, unless it's some driver problem which only
shows up when the kernel is compiled for SMP.

I'll bet it's more likely a hardware conflict. Like Mario said, I'd
try swapping out the ethernet card...

Best regards,
--

Black holes are where God divided by zero.

 
 
 

Post by Vid Strp » Wed, 25 Mar 1998 04:00:00


On Sun, 22 Mar 1998 14:41:52 +0100, Mario Procopio wrote in alt.linux (too much anyway):

>Hello!

>>I'm running a dual P2 - 233 MHZ processor Linux server with Slackware 3.4
>>(Kernel 2.0.33 compiled for SMP).

>>Here are some specs about the server
>>2 -  233 MHZ Pentium 2 processors
>>Asus P2L97-DS Motherboard (USB and Power saving features turned off)
>>Using Scsi adapter on the board controlling 2 Seagate Barracuda ULTRA WIDE
>>drives
>>128 MB of DRAM

  ^^^

Maybe this? Did you add append="mem=127M" to your /etc/lilo.conf?
Some machines can use the full 128 MB, some 128 MB minus 384 KB, some
other values. 127 MB is safe, and you don't lose much.
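
To make the arithmetic behind that setting concrete, here is a minimal Python sketch that builds the lilo.conf line from the amount of installed RAM. The function name lilo_mem_append and the 1 MB default reserve are illustrative assumptions, not part of LILO. The M suffix is included because the kernel is generally understood to treat a bare number as a byte count.

```python
def lilo_mem_append(total_mb: int, reserve_mb: int = 1) -> str:
    """Return a lilo.conf append line that caps usable RAM below the installed total.

    total_mb   -- installed RAM in megabytes (128 on the server in this thread)
    reserve_mb -- megabytes to leave unused at the top of memory (assumed: 1)
    """
    if reserve_mb < 0 or total_mb <= reserve_mb:
        raise ValueError("reserve must be non-negative and smaller than total RAM")
    # The explicit "M" suffix marks the value as megabytes.
    return f'append="mem={total_mb - reserve_mb}M"'


print(lilo_mem_append(128))
```

After editing /etc/lilo.conf, /sbin/lilo has to be run again for the change to take effect at the next boot.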

And, 2.1.x kernels have better SMP support. Consider upgrading, or wait
for 2.2.0, it shouldn't be too long.

--

Zagreb, Ivanicgradska 48.  tel. 385 01 227760, job: 6150830.

-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GS d- s: a- C++ UL++++$ P+ L++$ E--- W+++ N++ o K- w--- O M- V-- P S+++
P E++ Y+ PGP+ t !5 X- !R tv--- b++ !DI D++ G++ >e+++ hr- y+ z+
------END GEEK CODE BLOCK------

"Bugs are not going to inherit the earth. They own it now. So we might as
   well make peace with the landlord." -- T. Eisner (1989)

 
 
 

Post by Wayne Hyd » Fri, 27 Mar 1998 04:00:00




> > 3com 905 Network Card (0.49 drivers)
> This is a guess, but could you try to replace the  network card with
> another one? I have some bad experience of 3c900 myself which uses the
> same driver. I have seen machines freeze, but it doesn't happen that
> often and only at heavy netload, for example when doing rdist to to
> other machines at the same time. I'm not sure if my problem is because
> of the network card, but I think so.

I just put Linux back on my SMP box after I got fed up with NT.  I have
a second machine running Win95 (for MSOffice, etc).  The two are
connected via a 4-port NETGEAR FE104 100Mb hub.  The machines have 3C905
NICs.  I'm doing IP Masquerading (the Linux box has the modems -- EQL is
next), but my main problem is with the speed of the 100Mb link.  I
haven't had any crashes on Linux yet, but I have not stressed the system
(aside from having the CPUs cracking rc5).

Put plainly, the network performance sucks.  The hub has four LEDs (1%,
10%, 20%, >30%) for utilization, and I *never* get above 20%.  I've done
testing of transfers over ftp, SMB, etc.  I have also measured the
performance using a command-line Win32 app for reading files on the SMB
shares and done some tests using ftp.  3MB/s is about the max I have
seen.
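
To relate the MB/s figures to the hub's utilization LEDs, the following Python sketch converts a transfer rate into a percentage of the raw link rate. It is a rough estimate under stated assumptions: 1 MB is taken as 10**6 bytes, and Ethernet framing and TCP/IP header overhead are ignored, so real wire utilization is somewhat higher. The function name link_utilization is made up for illustration.

```python
def link_utilization(throughput_mbytes_per_s: float,
                     link_mbits_per_s: float = 100.0) -> float:
    """Return payload throughput as a percentage of the raw link rate."""
    # megabytes/s -> megabits/s (x8), then express as a percentage of the link.
    return throughput_mbytes_per_s * 8.0 * 100.0 / link_mbits_per_s


# 3 MB/s is the best rate reported above; 7 MB/s is the NT-to-NT figure.
for rate in (3.0, 7.0):
    print(f"{rate:.1f} MB/s is about {link_utilization(rate):.0f}% of a 100 Mb/s link")
```

On these assumptions 3 MB/s works out to roughly 24% of the link, which fits a hub that lights its 20% LED but never the >30% one.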

Disk performance is not a problem as I am reading off of a 6-disk RAID0
array that can saturate the FW SCSI bus.  The disk on the Win95 box can
also spit out over 6MB/s sustained.  

One thing I have noticed is that ftp transfers to the Win95 from the
Linux box are *much* faster than the opposite.  (doing a 'put' from the
Win95 machine is painful)  The hub shows a burst of activity, then about
a second pause, then another burst, etc.  The transfer is sustained
while doing a 'get' on the Win95 machine.  

The SMP box is a dual PPro/233; the Win95 is a K6/233.  Now the question
is which box is the dog on net performance?  I'm running RH5 upgraded to
2.0.33.  My NT boxes at work have no problems cranking out over 7MB/s
(limited by the HD speed in the simple test I did).  My Win95 machines
are all on 10Mb, so I don't know whether Win95 can handle the load or
not.  

Thanks for any help.  

-Wayne

 
 
 

Post by James Youngma » Sat, 28 Mar 1998 04:00:00




  >> > 3com 905 Network Card (0.49 drivers)

  >> This is a guess, but could you try to replace the  network card with
  >> another one? I have some bad experience of 3c900 myself which uses the
  >> same driver. I have seen machines freeze, but it doesn't happen that
  >> often and only at heavy netload, for example when doing rdist to to
  >> other machines at the same time. I'm not sure if my problem is because
  >> of the network card, but I think so.

  Wayne> I just put Linux back on my SMP box after I got fed up with NT.  I have
  Wayne> a second machine running Win95 (for MSOffice, etc).  The two are
  Wayne> connected via a 4-port NETGEAR FE104 100Mb hub.  The machines have 3C905
  Wayne> NICs.  I'm doing IP Masquerading (the Linux box has the modems -- EQL is
  Wayne> next), but my main problem is with the speed of the 100Mb link.  I
  Wayne> haven't had any crashes on Linux yet, but I have not stressed the system
  Wayne> (aside from having the CPUs cracking rc5).

  Wayne> Put plainly, the network performance sucks.  The hub has four LEDs (1%,
  Wayne> 10%, 20%, >30%) for utilization, and I *never* get above 20%.  I've done
  Wayne> testing of transfers over ftp, SMB, etc.  I have also measured the
  Wayne> performance using a command-line Win32 app for reading files on the SMB
  Wayne> shares and done some tests using ftp.  3MB/s is about the max I have
  Wayne> seen.

  Wayne> Disk performance is not a problem as I am reading off of a 6-disk RAID0
  Wayne> array that can saturate the FW SCSI bus.  The disk on the Win95 box can
  Wayne> also spit out over 6MB/s sustained.  

  Wayne> One thing I have noticed is that ftp transfers to the Win95 from the
  Wayne> Linux box are *much* faster than the opposite.  (doing a 'put' from the
  Wayne> Win95 machine is painful)  The hub shows a burst of activity, then about
  Wayne> a second pause, then another burst, etc.  The transfer is sustained
  Wayne> while doing a 'get' on the Win95 machine.  

  Wayne> The SMP box is a dual PPro/233; the Win95 is a K6/233.  Now the question
  Wayne> is which box is the dog on net performance?  I'm running RH5 upgraded to
  Wayne> 2.0.33.  My NT boxes at work have no problems cranking out over 7MB/s
  Wayne> (limited by the HD speed in the simple test I did).  My Win95 machines
  Wayne> are all on 10Mb, so I don't know whether Win95 can handle the load or
  Wayne> not.  

  Wayne> Thanks for any help.  

  Wayne> -Wayne

The attributions in the above don't indicate who said

  >> > 3com 905 Network Card (0.49 drivers)

but my 3c905 performance dramatically improved when I upgraded to that
version; previously I had similar problems to those you describe, but
from what I could see initially the problem appeared not to be the
Linux machine's fault since I could see from tcpdump that the Win95
box was pausing a long time between packets.

In fact this was not the case; the Linux box's 3c905 card/driver had
been dropping the packets so tcpdump never saw them...
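
Since packets dropped by the card or driver never reach tcpdump, the interface counters are one place to look for them. The Python sketch below pulls the receive error and drop counters out of text in the /proc/net/dev layout. It assumes the column order used by current kernels (bytes, packets, errs, drop, ...); kernels of the 2.0 era may lay the columns out differently. The sample text is invented for illustration and the function name rx_errors_and_drops is hypothetical.

```python
from typing import Dict, Tuple

# Made-up sample in the layout of /proc/net/dev on current kernels.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1234      10    0    0    0     0          0         0     1234      10    0    0    0     0       0          0
  eth0:  987654    4321    2   17    0     0          0         0   123456    2100    0    0    0     5       0          0
"""


def rx_errors_and_drops(proc_net_dev_text: str) -> Dict[str, Tuple[int, int]]:
    """Map each interface name to its (receive errors, receive drops) counters."""
    stats: Dict[str, Tuple[int, int]] = {}
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, counters = line.split(":", 1)
        fields = counters.split()
        if len(fields) < 4:
            continue  # not enough columns to hold errs and drop
        # Assumed receive columns: bytes, packets, errs, drop, ...
        stats[name.strip()] = (int(fields[2]), int(fields[3]))
    return stats


print(rx_errors_and_drops(SAMPLE))
```

On a live Linux machine the same function could be fed the contents of /proc/net/dev before and after a transfer; counters that climb during the transfer point at the receiving side rather than the sender.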

 
 
 

Post by Wayne Hyd » Sun, 29 Mar 1998 04:00:00



> The attributions in the above don't indicate who said
>   >> > 3com 905 Network Card (0.49 drivers)
> but my 3c905 performance dramatically improved when I upgraded to that
> version; previously I had similar problems to those you describe, but
> from what I could see initially the problem appeared not to be the
> Linux machine's fault since I could see from tcpdump that the Win95
> box was pausing a long time between packets.
> In fact this was not the case; the Linux box's 3c905 card/driver had
> been dropping the packets so tcpdump never saw them...

I changed from two 905's to two 595's and am still having the same
problem.  The max transfer rate I've gotten is about 3.6MB/s between the
two machines (Linux and Win95).  I can easily get twice that rate at
work between two NT machines with 3C905's; even then the speed was
limited by the machine's hard-disk speed.  I also installed Linux RH5 at
work and tested between Linux and NT -- same performance problem.  

Is there anyone here using 3Com NICs that can get 7+ MB/s on fast
ethernet with Linux?  

-Wayne

 
 
 

Post by The Thought Assassi » Mon, 30 Mar 1998 04:00:00



> Hello!
> >I'm running a dual P2 - 233 MHZ processor Linux server with Slackware 3.4
> >(Kernel 2.0.33 compiled for SMP).
> >2 -  233 MHZ Pentium 2 processors
> >3com 905 Network Card (0.49 drivers)
> >After a period of time the server just crashes, this can be after one day
> >or after 14 days. When it crashes the screen turns black and the server
> I noticed the same problem, with kernels 2.0.32 & 2.0.33. It seems to me
> that the *real* problem lies in the 3com 905 card. Have you tried a 3com
> 10Mb card (3com 590) ? With this card everything works just fine ! Who knows
> if the problem is in the driver or is in other hardware conflicts.

The driver for this card is known not to be SMP-safe. I believe there are
newer drivers, though, so you might try those. I would be using the most
stable 2.1 kernel you can find for SMP, though. It is just as stable and a
whole lot faster for multiprocessor operation.
Also, now that I think of it, there was a particular SCSI card that had
similar problems under SMP, so perhaps the original author is having
difficulties with that? Or it might just be a hardware problem.

-Greg Mildenhall

 
 
 

Post by Peter Brule » Mon, 30 Mar 1998 04:00:00




> > The attributions in the above don't indicate who said

> >   >> > 3com 905 Network Card (0.49 drivers)

> > but my 3c905 performance dramatically improved when I upgraded to that
> > version; previously I had similar problems to those you describe, but
> > from what I could see initially the problem appeared not to be the
> > Linux machine's fault since I could see from tcpdump that the Win95
> > box was pausing a long time between packets.

> > In fact this was not the case; the Linux box's 3c905 card/driver had
> > been dropping the packets so tcpdump never saw them...

> I changed from two 905's to two 595's and am still having the same
> problem.  The max transfer rate I've gotten is about 3.6MB/s between the
> two machines (Linux and Win95).  I can easily get twice that rate at
> work between two NT machines with 3C905's; even then the speed was
> limited by the machine's hard-disk speed.  I also installed Linux RH5 at
> work and tested between Linux and NT -- same performance problem.

> Is there anyone here using 3Com NICs that can get 7+ MB/s on fast
> ethernet with Linux?

> -Wayne

  Probably W95 is the bottleneck
 
 
 

Post by Wayne Hyd » Mon, 30 Mar 1998 04:00:00




> > I changed from two 905's to two 595's and am still having the same
> > problem.  The max transfer rate I've gotten is about 3.6MB/s between the
> > two machines (Linux and Win95).  I can easily get twice that rate at
> > work between two NT machines with 3C905's; even then the speed was
> > limited by the machine's hard-disk speed.  I also installed Linux RH5 at
> > work and tested between Linux and NT -- same performance problem.
> > Is there anyone here using 3Com NICs that can get 7+ MB/s on fast
> > ethernet with Linux?
>   Probably W95 is the bottleneck

No, it is not.  I loaded Win95 on the Linux box to do some testing.  I
was able to get over 6MB/s between the two Win95 boxes.  I did see a
bunch of collisions on the hub (when doing win95->win95), so I may just
try the two machines with a crossover cable and see if there is a
cable/hub problem.  But even with the high collisions, Win95->Win95 was
still faster than Linux->Win95.  

Does anyone here get good performance on 100Mb ethernet with Linux?  

(that is, over 6 MB/s)

-Wayne

 
 
 

Post by Wayne Hyd » Mon, 30 Mar 1998 04:00:00




> : Does anyone here get good performance on 100Mb ethernet with Linux?
> can you get hold of the dec tulip card (2 of them) and try that, back-to-back,
> with a xover cable?  I am under the impression that this is the fastest
> 10/100 card out there.  locally, I found these cards for $29 (!), so
> its  easy/cheap to experiment with.  it would give you another datapoint,
> at least.

I only have access to 3Com 10/100 cards. (595 and 905)  I'd have to buy
two tulips, which I really would like to avoid since I might have the
same results and be out $60.

> I have 2 machines with this card installed (a p5/200 and a k6/200).  I put
> linux on both of them and copied a cdrom image (650meg or so) from hard disk
> to hard disk.  I think I got about 5.5MegaBytes/sec (via ncftp's report).

Not bad, but not great.  Is your disk subsystem fast enough to spit out
more than 5.5MB/s?  At work I get over 7MB/s between my workstation and
one of my servers, but the RAID1 array could only push about 7MB/s max.
I was also testing uncached performance.
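
One way to take the disks out of the picture entirely is to push data from memory to memory over a TCP socket and time it. The Python sketch below does that. As written it runs both ends in one process over the loopback interface, so it only demonstrates the method and says nothing about a NIC; for a real test the receiving half would run on the other machine and the sender would connect to its address. The names measure_tcp_throughput and _sink, and the 64 KB chunk size, are arbitrary choices for illustration.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call (arbitrary choice)


def _sink(server: socket.socket) -> None:
    """Accept one connection and discard everything it sends."""
    conn, _ = server.accept()
    with conn:
        while conn.recv(CHUNK):
            pass  # recv returns b"" once the sender closes the connection


def measure_tcp_throughput(total_bytes: int, host: str = "127.0.0.1") -> float:
    """Send total_bytes from memory over TCP and return the rate in MB/s (10**6 bytes)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    receiver = threading.Thread(target=_sink, args=(server,))
    receiver.start()

    payload = b"\0" * CHUNK
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    receiver.join()  # wait until the receiver has drained all the data
    elapsed = time.perf_counter() - start
    server.close()
    return sent / elapsed / 1e6


print(f"{measure_tcp_throughput(10_000_000):.1f} MB/s over loopback")
```

If a memory-to-memory run between the two machines is no faster than the ftp transfers, the disks can be ruled out as the limit.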

> when going between the same machines, using linux and w95, the copy was MUCH
> slower.  so slow, that I terminated the copy after 5 minutes or so - the
> ncftp progress meter showed so little progress, I didn't want to wait around
> ;-)  there must have been some timing/race conditions between the two
> implementations (drivers for the tulip chip).  (the reason I needed to copy
> this data was because I built an iso9660 image on linux, but I like to burn
> cd-r's on my wintel machine (the mastering software on wintel is better
> (sorry) than what I have for linux, and my writer is attached to the wintel
> box).

Were you copying from the Win95 to the Linux box?  I've noticed (since I
have a HUB) that going from win95->Linux is painful.  There is a burst
of data, a long pause, and then another burst of data.  Other times it
just dribbles out data (low utilization -- 8Mb/s).  In my testing, going
from Linux to Win95 (using ftp on the Win95 box) is much faster than the
opposite.  

> anyway, consider the tulip cards.  they're fast, cheap, and the driver is
> pretty stable (under linux, at least).

I'd like to avoid switching out nics if possible.  Someone told me that
using a crossover cable on 100Mb is not a good idea (out of spec).  Is
this true?  I might grab a cable from work and test it out.  I also need
to check if my HUB can do full-duplex (netgear FE104).  Right now I'm at
half-duplex on both cards.  

-Wayne