ECC vs parity: Gigabyte 686DX

Post by Dan Ts'o » Fri, 18 Jul 1997 04:00:00



        I recently bought a Gigabyte 686DX system with dual PPro 200s and
2 x 32MB parity memory in hopes of getting ECC memory implemented.
        The system runs "fine" with parity disabled, which was the way
the system was delivered.
        With either parity enabled or ECC enabled, the system traps out with
a "parity error" anywhere from 5 minutes to several hours of run time.

        1) Can I assume that the parity system is "working" and that it is
probably a bad SIMM? If not, how can I tell what the problem is?
        It is scary to think that with parity disabled, I could not tell
if there was any problem and went about installing lots of software. It would
seem that bits could potentially be wrong here and there, everywhere.

        2) Under Windows 95, what is the practical difference between ECC and
parity? How can I tell if the ECC circuitry has successfully corrected a
single-bit error? Does the BIOS/OS trap out on an ECC correction, or only
when it detects an error of more than one bit?
        Obviously if Windows 95/BIOS is going to trap out even with an ECC
single-bit detect->correct, then it is pointless to have ECC over parity.

        3) What is the situation wrt ECC handling in other OSes: NT, FreeBSD,
Linux?

--
                        Cheers,
                        Dan Ts'o                        212-327-7671
                        Dept. of Neurobiology           FAX: 212-327-7671
                        The Rockefeller University


 
 
 

Post by Matt Dill » Fri, 18 Jul 1997 04:00:00




:>   I recently bought a Gigabyte 686DX system with dual PPro 200s and
:>2 x 32MB parity memory in hopes of getting ECC memory implemented.
:>   The system runs "fine" with parity disabled, which was the way
:>the system was delivered.
:>   With either parity enabled or ECC enabled, the system traps out with
:>a "parity error" anywhere from 5 minutes to several hours of run time.
:>
:>   1) Can I assume that the parity system is "working" and that it is
:>...
:>
:>   2) Under Windows 95, what is the practical difference between ECC and
:>parity? How can I tell if the ECC circuitry has successfully corrected a
:>...
:>   Obviously if Windows 95/BIOS is going to trap out even with an ECC
:>single-bit detect->correct, then it is pointless to have ECC over parity.
:>
:>   3) What is the situation wrt ECC handling in other OSes: NT, FreeBSD,
:>Linux?
:>
:>--
:>                   Cheers,
:>                   Dan Ts'o                        212-327-7671

    Both ECC and parity modes work fine on PPro 200s with Natoma or later
    chipsets. We run all of our FreeBSD boxes on PPros with ECC turned on.
    We use single-CPU ASUS motherboards, but I wouldn't expect there to be
    any problems on dual-PPro boards.

    Make sure your memory is actually parity memory and not some of the
    'fake parity memory' that's been floating around.  Fake parity memory
    fakes the parity bit by using a parity generator rather than a DRAM
    for the parity bits.  It's the stupidest thing I've ever seen, but
    apparently there are a lot of these floating around.
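
To illustrate why fake parity defeats the check, here is a minimal Python
sketch. The class and function names are invented for illustration, and
modelling the module as "regenerate the parity bit from whatever data is
read back" is an assumption about how such SIMMs behave, not a description
of any particular part.

```python
def even_parity_bit(byte):
    """Return the bit that makes the total number of 1s (data + parity) even."""
    return bin(byte & 0xFF).count("1") % 2


class RealParityByte:
    """Stores the parity bit at write time, as a genuine ninth DRAM bit would."""

    def __init__(self, value):
        self.data = value & 0xFF
        self.parity = even_parity_bit(self.data)  # kept alongside the data

    def read(self):
        return self.data, self.parity


class FakeParityByte:
    """Assumed model of fake parity: no parity bit is stored at all."""

    def __init__(self, value):
        self.data = value & 0xFF

    def read(self):
        # The generator derives parity from the data as it is read back,
        # so it always agrees with that data, corrupted or not.
        return self.data, even_parity_bit(self.data)


def parity_check_passes(cell):
    """What a memory controller does on a read: recompute and compare."""
    data, parity = cell.read()
    return even_parity_bit(data) == parity


real = RealParityByte(0b10110010)
fake = FakeParityByte(0b10110010)

# Simulate one bit flipping inside the data DRAM after the write.
real.data ^= 0b00000100
fake.data ^= 0b00000100

print(parity_check_passes(real))  # False: stored parity no longer matches
print(parity_check_passes(fake))  # True: the generator hides the error
```

With real parity the flipped bit is caught on the next read; with the
generator the check passes no matter what the data DRAM returns.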

    Also make sure that the memory speed settings in the BIOS match the
    speed of the memory you purchased (usually 60 or 70ns).

    I do not believe correctable bit errors generate an interrupt, but I
    could be wrong.

                                                        -Matt

 
 
 

Post by Donovan Ready » Fri, 18 Jul 1997 04:00:00





>     Both ECC and parity modes work fine on PPro 200s with Natoma or later
>     chipsets. We run all of our FreeBSD boxes on PPros with ECC turned on.
>     We use single-CPU ASUS motherboards, but I wouldn't expect there to be
>     any problems on dual-PPro boards.

>     Make sure your memory is actually parity memory and not some of the
>     'fake parity memory' that's been floating around.  Fake parity memory
>     fakes the parity bit by using a parity generator rather than a DRAM
>     for the parity bits.  It's the stupidest thing I've ever seen, but
>     apparently there are a lot of these floating around.

>     Also make sure that the memory speed settings in the BIOS match the
>     speed of the memory you purchased (usually 60 or 70ns).

>     I do not believe correctable bit errors generate an interrupt, but I
>     could be wrong.

>                                                         -Matt

I think you are correct. A two-bit error should generate an NMI.

We are an Asus dealer, and there was a BIOS beta (since corrected) that
acted similarly to the Gigabyte problem reported above.

As far as I know, the ECC correction is done in hardware, well before the
OS knows anything has happened. It depends on the hardware, not the OS.
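
The split between a correctable single-bit error and a detect-only two-bit
error can be sketched with a small SEC-DED (single error correct, double
error detect) Hamming code. This toy version protects 4 data bits with
4 check bits; ECC on a 64-bit memory bus typically uses 8 check bits per
64 data bits. The function names are hypothetical and nothing here is taken
from actual chipset logic.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SEC-DED codeword (a list of bits).

    Index 0 holds the overall parity bit. Indices 1..7 are Hamming(7,4)
    positions: check bits at 1, 2 and 4, data bits at 3, 5, 6 and 7.
    """
    code = [0] * 8
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the total number of 1s even
    return code


def decode(code):
    """Return (status, nibble); nibble is None when the error is uncorrectable."""
    code = list(code)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position  # ends up as the position of a single flip
    overall_ok = sum(code) % 2 == 0

    if syndrome == 0 and overall_ok:
        status = "ok"
    elif not overall_ok:
        # Odd number of flips: treat it as one and repair it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)

one_flip = list(word)
one_flip[6] ^= 1
print(decode(one_flip))   # ('corrected', 11)

two_flips = list(word)
two_flips[3] ^= 1
two_flips[6] ^= 1
print(decode(two_flips))  # ('uncorrectable', None)
```

In this sketch the "corrected" case returns the right data without raising
anything, which matches the behavior described above; only the
"uncorrectable" case would correspond to an NMI.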
--

Donovan Ready,
Lindsay Computer Systems
http://www.jumpnet.com/~lcs

 
 
 

Post by Louis Epste » Sat, 19 Jul 1997 04:00:00


: >     Both ECC and parity modes work fine on PPro 200s with Natoma or later
: >     chipsets. We run all of our FreeBSD boxes on PPros with ECC turned on.
: >     We use single-CPU ASUS motherboards, but I wouldn't expect there to be
: >     any problems on dual-PPro boards.

Don't forget, since Dan might not know this, that dual-processor
boards will only work with FreeBSD 3.0, not 2.x.


 
 
 

Post by Michael B. Mart » Tue, 22 Jul 1997 04:00:00


: As far as I know, the ECC is done well before the OS knows anything has
: happened. This is hardware, not OS dependent.

Right, but if it's an error that can't be corrected (but is caught),
the subsequent system behavior depends on the OS (someone please
correct me if I'm wrong).  I did some checking before I got my current
system (Pentium with Triton FX chipset) to find out just what the deal
with parity SIMM usage was.  The answer was that when an error occurs
(in parity RAM with checking enabled), an NMI is generated.  The OS can
trap this and act accordingly, but it doesn't have to (in which case
the BIOS handler is invoked).  For example, MS-DOS ignores it and so
the BIOS routine runs, which prints a little message and halts the
system (not too good for a Linux box).  I am told that Linux catches
parity errors and uses the address indicated to kill the application
(or kernel, depending on where in memory the error occurred).

Anyone know how other OSes (like NT) handle parity/ECC error NMIs?

To address the question of the original poster, I'd say that if you
get parity error messages, chances are very good that it's
a bad SIMM.  Is the address of the error always the same (or close)?
Try removing SIMMs and switching them around and see if it changes.
You can isolate a bad SIMM that way.
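
If the error message does report an address, checking whether the addresses
are "the same (or close)" can be done by hand or with a few lines of Python.
This is only a sketch: the addresses below are made up, the 64 KB window is
an arbitrary choice, and mapping an address to a physical SIMM socket is
board-specific and not attempted here.

```python
def group_nearby(addresses, window=0x10000):
    """Group addresses so that sorted neighbours within a group differ by <= window."""
    groups = []
    for addr in sorted(addresses):
        if groups and addr - groups[-1][-1] <= window:
            groups[-1].append(addr)  # close to the previous one: same group
        else:
            groups.append([addr])    # far away: start a new group
    return groups


# Hypothetical addresses noted down from successive parity-error messages.
reported = [0x01A3F010, 0x01A3F7C4, 0x01A40008, 0x01A3F010]

for group in group_nearby(reported):
    print(f"{len(group)} error(s) between {group[0]:#010x} and {group[-1]:#010x}")
```

One tight group that moves when the SIMMs are swapped points at a module;
errors scattered across all of memory suggest looking at timing settings or
the board instead.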

Michael

 
 
 

Post by Dan Ts'o » Wed, 23 Jul 1997 04:00:00



: I think you are correct. A two bit error should generate an NMI.

: We are an Asus dealer, and there was a BIOS beta (since corrected) that
: acted similarly to the Gigabyte problem reported above.

: As far as I know, the ECC is done well before the OS knows anything has
: happened. This is hardware, not OS dependent.

        I think ideally you would want a warning condition, not a fatal error,
to be raised so the OS can log it and bring it to the sysadmin's attention.
        This is how ECC was handled on older PDP-11's and VAXes.
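
The warn-versus-fatal policy described here can be sketched in a few lines
of Python. The event labels, the exception class and the function name are
all hypothetical; a real OS would do this in its NMI or machine-check
handler rather than in user code.

```python
import logging


class UncorrectableMemoryError(Exception):
    """Raised for errors the hardware detected but could not correct."""


def handle_memory_event(kind, address, log=logging.getLogger("ecc")):
    """Apply the policy to one reported memory event.

    kind is "corrected" or "uncorrectable" (labels invented for this sketch).
    """
    if kind == "corrected":
        # Warning condition: record it for the sysadmin and keep running.
        log.warning("corrected single-bit error at %#010x", address)
        return "continue"
    if kind == "uncorrectable":
        # Fatal condition: the data cannot be trusted any more.
        log.critical("uncorrectable memory error at %#010x", address)
        raise UncorrectableMemoryError(hex(address))
    raise ValueError(f"unknown event kind: {kind!r}")


logging.basicConfig(level=logging.WARNING)
print(handle_memory_event("corrected", 0x01A3F010))  # continue
```

A steadily growing count of corrected errors at similar addresses would be
the cue to replace a SIMM before it produces an error ECC cannot fix.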
--
                        Cheers,
                        Dan Ts'o                        212-327-7671
                        Dept. of Neurobiology           FAX: 212-327-7671
                        The Rockefeller University


 
 
 
