dual xeon problems

Post by ekk » Sat, 06 Jan 2001 23:26:05



Hello,
I sent a message a little while ago about my dual Xeon 550 hanging
unpredictably.  I have pretty much ruled out the cpu temperature as the
source of the problem.  It must be some hardware issue.  Could you
please review the hardware I have below and let me know if you know of
any potential flaky hardware:

Motherboard: Super S2DGU/DGE running BIOS R1.6 (based on Intel 440GX)

ekkguest1:~ 101%  hinv
Total Processors: 2
vendor_id       : GenuineIntel
model name      : Pentium III (Katmai)
cpu MHz         : 551.258
cache size      : 512 KB
vendor_id       : GenuineIntel
model name      : Pentium III (Katmai)
cpu MHz         : 551.258
cache size      : 512 KB

Main Memory Size: 960 MB

Host: ide1 Channel: hdc
  Vendor:   CD-ROM Model: 50X

scsi0: Adaptec AIC7xxx driver version: 5.1.28/3.2.4
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM      Model: DMVS18V          Rev: 0250

Keyboard Detected: 1
Serial Ports: 2
Parallel Ports: 0
Ethernet Controllers: eth0

IDE interface: Intel 82371AB PIIX4 IDE (rev 1).
USB Controller: Intel 82371AB PIIX4 USB (rev 1).
SCSI storage controller: Adaptec AIC-7890/1 (rev 1).
Ethernet controller: Intel 82557 (rev 8).
VGA compatible controller: ATI Unknown device (rev 0).

ekkguest1:~ 102%  cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 7
model name      : Pentium III (Katmai)
stepping        : 3
cpu MHz         : 551.258
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
sep_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr xmm
bogomips        : 1101.00

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 7
model name      : Pentium III (Katmai)
stepping        : 3
cpu MHz         : 551.258
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
sep_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr xmm
bogomips        : 1101.00

ekkguest1:~ 104%  cat /proc/interrupts
           CPU0       CPU1
  0:     360195          0          XT-PIC  timer
  1:        387          0          XT-PIC  keyboard
  2:          0          0          XT-PIC  cascade
  8:          1          0          XT-PIC  rtc
  9:        829          0          XT-PIC  eth0
 10:      36047          0          XT-PIC  aic7xxx
 12:         78          0          XT-PIC  PS/2 Mouse
 13:          1          0          XT-PIC  fpu
 15:          7          0          XT-PIC  ide1
NMI:          0
ERR:          0

please help me-

 
 
 


Post by jw » Sun, 07 Jan 2001 04:12:17


On Fri, 05 Jan 2001 09:26:05 -0500, ekk wrote:

>Hello,
>I sent a message a little while ago about my dual Xeon 550 hanging
>unpredictably.  I have pretty much ruled out the cpu temperature as the
>source of the problem.  It must be some hardware issue.  Could you
>please review the hardware I have below and let me know if you know of
>any potential flaky hardware:

>Main Memory Size: 960 MB

Is that ECC? Have you tried memtest to test it? (See www.freshmeat.net and
search for memtest.)

Good luck,
Jurriaan

--
Once I lay hands on you, the issue is closed. Prepare to learn the full extent
of the infinite hereafter.
      Jack Vance - Lyonesse II - The Green Pearl
GNU/Linux 2.2.19pre6 SMP 2x999 bogomips load av: 0.00 0.02 0.00

 
 
 


Post by ekk » Tue, 09 Jan 2001 04:51:17


I apologize, but is there a universal way to determine if the memory is ECC?
I know that there are 2 512 MB SDRAM Toshiba PC100-222-620R modules, but I can't
find anything on the web about this memory being ECC.  I am trying memtest right
now.
What am I looking for with memtest?  Will the misc001 routine crash if it's bad
memory?
Ken


 
 
 


Post by jw » Tue, 09 Jan 2001 20:23:39


On Sun, 07 Jan 2001 14:51:17 -0500, ekk wrote:

>I apologize, but is there a universal way to determine if the memory is ECC?
>I know that there are 2 512 MB SDRAM Toshiba PC100-222-620R modules, but I can't
>find anything on the web about this memory being ECC.  I am trying memtest right
>now.
>What am I looking for with memtest?  Will the misc001 routine crash if it's bad
>memory?
>Ken

Well, let's put it this way:

If memtest doesn't crash, you can be sure of nothing.
If it does crash, you can be sure your memory isn't good.
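
To illustrate the asymmetry, here is a minimal sketch of the write-then-verify idea behind memory testers. It is not the memtest program's actual algorithm; the function name `pattern_test`, the buffer size and the four patterns are all assumptions chosen for illustration. A clean run only says that this buffer held these patterns this time, whereas a single mismatch is real evidence of a fault.

```python
def pattern_test(size_bytes=64 * 1024, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Write each pattern across a buffer, read it back, return mismatches.

    Hypothetical helper for illustration only: a user-space script cannot
    choose which physical RAM it touches, so this is far weaker than a
    real memory tester.
    """
    buf = bytearray(size_bytes)
    errors = []
    for pattern in patterns:
        # Fill the whole buffer with the pattern byte.
        buf[:] = bytes([pattern]) * size_bytes
        # Verify every byte and record (offset, expected, actual) on mismatch.
        for offset, value in enumerate(buf):
            if value != pattern:
                errors.append((offset, pattern, value))
    return errors


bad = pattern_test()
if bad:
    print(f"{len(bad)} mismatches, first at {bad[0]}")
else:
    print("no mismatches found (which proves very little)")
```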

As for a way to determine whether it's ECC memory, the only way I know is
to get a program called ctspd that reads the SPD EEPROM on the DIMMs, if
they have one. I have heard that SiSoft Sandra is also supposed to do
this. Both are Windows programs :-(
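
For reference, this is roughly what such a tool looks at once it has the SPD bytes. The sketch below assumes the JEDEC SPD layout for SDR SDRAM as I understand it: byte 2 is the memory type (0x04 for SDRAM), bytes 6 and 7 hold the module data width, and byte 11 is the error-checking type. The function name `describe_spd` and the example dump are hypothetical; the bytes are made up, not read from the Toshiba modules discussed here, and getting the dump off the DIMM is a separate problem that this does not cover.

```python
# Assumed meaning of SPD byte 11 for SDR SDRAM modules.
CONFIG_TYPES = {0x00: "none (non-parity)", 0x01: "parity", 0x02: "ECC"}


def describe_spd(spd: bytes) -> dict:
    """Summarise the ECC-related fields of an SDR SDRAM SPD dump."""
    if len(spd) < 12:
        raise ValueError("need at least the first 12 SPD bytes")
    # Data width is stored as a 16-bit value, low byte first.
    width = spd[6] | (spd[7] << 8)
    return {
        "is_sdram": spd[2] == 0x04,
        "data_width_bits": width,
        "error_checking": CONFIG_TYPES.get(spd[11], f"unknown (0x{spd[11]:02x})"),
    }


# Made-up dump for a 72-bit wide ECC module (illustration only).
example = bytearray(128)
example[2] = 0x04   # memory type: SDRAM
example[6] = 72     # data width, low byte
example[11] = 0x02  # error-checking type: ECC
print(describe_spd(bytes(example)))
```

A 72-bit data width (64 data bits plus 8 check bits) is the other usual hint that a module is ECC.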

Good luck,
Jurriaan
--
What if Boeing were to copy Microsoft's habits?
'What do you think, Ed, is our new 787 ready?' 'I haven't heard any
reports from our beta-testers last month about unexpected crashes,
so go ahead..'
        Seen on Usenet

GNU/Linux 2.2.19pre6 SMP 2x999 bogomips load av: 0.01 0.04 0.00

 
 
 


Post by ekk » Wed, 10 Jan 2001 03:59:11


Thanks for your help-
Memtest doesn't seem to crash, although the machine does hang before some of the
programs finish running - just like it always does when I have intensive processes
running.  But, memtest itself does not crash.
Ken


 
 
 


Post by jw » Wed, 10 Jan 2001 06:24:50


On Mon, 08 Jan 2001 13:59:11 -0500, ekk wrote:


>Thanks for your help-
>Memtest doesn't seem to crash, although the machine does hang before some of the
>programs finish running - just like it always does when I have intensive processes
>running.  But, memtest itself does not crash.
>Ken

>> On Sun, 07 Jan 2001 14:51:17 -0500, ekk

>> >I apologize, but is there a universal way to determine if the memory is ECC?
>> >I know that there are 2 512 MB SDRAM Toshiba PC100-222-620R modules, but I can't
>> >find anything on the web about this memory being ECC.  I am trying memtest right
>> >now.

Will it run with one module? Then you should test them one after the
other. Otherwise, is returning it to the shop an option? To each his own
:-)

Jurriaan

--
"Bother!" said Pooh as Q destroyed the universe.
GNU/Linux 2.2.19pre6 SMP 2x1402 bogomips load av: 0.05 0.04 0.02

 
 
 


Post by ekk » Fri, 12 Jan 2001 03:38:40


Hello,
Thank you again.  Actually, it seems to do okay with only one module.  I haven't tested
the other one yet, I'm still pushing the first one hard to see if it will crash.
A while ago, you had asked if the RAM was ECC or not - I just found out that it is.
What ramifications does that have?
Another odd thing I noticed when I took out one of the memory modules - when both of
the memory modules were in there, the BIOS would check the memory 3 times, but with
only one module it only checks it once.  What does that mean?

Thank you
Ken



 
 
 


Post by ekk » Fri, 12 Jan 2001 03:41:26


Ah, I take it back - it just crashed, almost as soon as my last message was sent.  I
will test the other module now-
Ken


 
 
 


Post by jw » Fri, 12 Jan 2001 05:38:59


On Wed, 10 Jan 2001 13:38:40 -0500, ekk wrote:


>Hello,
>Thank you again.  Actually, it seems to do okay with only one module.  I haven't tested
>the other one yet, I'm still pushing the first one hard to see if it will crash.
>A while ago, you had asked if the RAM was ECC or not - I just found out that it is.
>What ramifications does that have?

Basically, ECC allows you to survive one-bit errors and detect (by
crashing deliberately) two-bit errors. Normal memory may or may not
crash, depending on what memory locations randomly change. So by using
ECC you should be more sure of what happens.
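
As a toy illustration of "correct one bit, detect two", here is a minimal sketch using a Hamming(7,4) code plus one overall parity bit. This is an assumption made for readability: real ECC DIMMs protect 64 data bits with 8 check bits, and the chipset does this in hardware. The names `encode` and `decode` are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits into 8 bits: index 0 is overall parity,
    indices 1..7 are the classic Hamming(7,4) positions."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]  # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]  # parity over positions 4,5,6,7
    code[0] = sum(code[1:]) % 2            # makes the total number of 1s even
    return code


def decode(code):
    """Return (nibble, status); nibble is None when uncorrectable."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos        # XOR of set positions is 0 for a valid word
    odd = sum(code) % 2            # 1 means an odd number of bits flipped
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd:
        code[syndrome] ^= 1        # single-bit error; syndrome 0 is the parity bit
        status = "corrected"
    else:
        return None, "uncorrectable"   # even number of flips: detect only
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


word = encode(0b1011)
one_flip = list(word)
one_flip[5] ^= 1
two_flips = list(one_flip)
two_flips[2] ^= 1
print(decode(word), decode(one_flip), decode(two_flips))
```

The "uncorrectable" branch is the case where a real machine stops deliberately rather than carry on with bad data.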

>Another odd thing I noticed when I took out one of the memory modules - when both of
>the memory modules were in there, the BIOS would check the memory 3 times, but with
>only one module it only checks it once.  What does that mean?

No idea. I'm just an amateur here, remember :-)

Since I read in your follow-up that it also crashed with one module, it's
starting to look more and more (if it crashes with the other one too) like
you have a 'Monday-morning' product with random glitches - I do hope you
have a warranty....

Good luck,
Jurriaan
--
BOFH excuse #437:

crop circles in the corn shell
GNU/Linux 2.2.19pre7 SMP 2x1402 bogomips load av: 0.04 0.02 0.00

 
 
 


Post by Alex Deuche » Thu, 18 Jan 2001 05:17:32


Check the kernel mailing list archives.  There was a bug in some of the
test kernels that caused Xeons to hang or run very slowly.  I can't
remember what it was, maybe something with MTRR...

Alex

