Sun Fire 280R - keeps crashing with "Error ECC corrected"

Post by Rob » Sat, 23 Jun 2001 05:42:56



I have two Sun Fire 280Rs.  They both arrived at the same time.  One has 1
CPU and 1 GB RAM, the other has 2 CPUs and 2 GB RAM.  Both have the same PROM
level (4.00 if I remember).  Both have been running for approx. 3 weeks.
The smaller one keeps crashing with "Error ECC Corrected" displayed at the
PROM level and kernel panics leading to crash files (vmcore, etc.) being
created.  When I can get it up to init 3 it lasts for about 2 hours max
before crashing.

I've swapped the memory from the larger server into the smaller.  I've done
the same with the GFx graphics board (identical in both anyway).  I've
flashed the PROM with the latest OBP and POST.

The smaller one runs a BIND 9.1 server; the other one is just Solaris 8.  Both
have the same version of Solaris (01/01 with the 18 June Recommended patch
cluster).

The larger server is fine.

Has anyone had a similar problem (error ECC corrected)?

 
 
 


Post by A4 » Sat, 23 Jun 2001 11:00:46


sounds like memory.


> I have two Sun Fire 280Rs.  They both arrived at the same time.  One has 1
> CPU and 1 GB RAM, the other has 2 CPUs and 2 GB RAM.  Both have the same
> PROM level (4.00 if I remember).  Both have been running for approx. 3 weeks.
> The smaller one keeps crashing with "Error ECC Corrected" displayed at the
> PROM level and kernel panics leading to crash files (vmcore, etc.) being
> created.  When I can get it up to init 3 it lasts for about 2 hours max
> before crashing.

> I've swapped the memory from the larger server into the smaller.  I've done
> the same with the GFx graphics board (identical in both anyway).  I've
> flashed the PROM with the latest OBP and POST.

> The smaller one runs a BIND 9.1 server; the other one is just Solaris 8.
> Both have the same version of Solaris (01/01 with the 18 June Recommended
> patch cluster).

> The larger server is fine.

> Has anyone had a similar problem (error ECC corrected)?


 
 
 


Post by chad schroc » Sat, 23 Jun 2001 11:35:15



> I have two Sun Fire 280Rs.  They both arrived at the same
> time.  One has 1 CPU and 1 GB RAM, the other has 2 CPUs and
> 2 GB RAM.  Both have the same PROM level (4.00 if I remember).
> Both have been running for approx. 3 weeks. The smaller one
> keeps crashing with "Error ECC Corrected" displayed at the
> PROM level and kernel panics leading to crash files (vmcore,
> etc.) being created.  When I can get it up to init 3 it lasts
> for about 2 hours max before crashing.

call Sun and get the CPU replaced.

--
chad at radix dot net

 
 
 


Post by Mathew Kirsc » Sun, 24 Jun 2001 04:24:44



> I have two Sun Fire 280Rs.  They both arrived at the same time.  One has 1
> CPU and 1 GB RAM, the other has 2 CPUs and 2 GB RAM.  Both have the same PROM
> level (4.00 if I remember).  Both have been running for approx. 3 weeks.
> The smaller one keeps crashing with "Error ECC Corrected" displayed at the
> PROM level and kernel panics leading to crash files (vmcore, etc.) being
> created.  When I can get it up to init 3 it lasts for about 2 hours max
> before crashing.

That means the system is having serious memory problems. If changing the
memory doesn't help, then it's something else about the system that's broken.
If they're still under warranty, call Sun. If you have a service contract,
call Sun. If you don't have a service contract, get a service contract and
call Sun.
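
The elimination logic in this thread (swap a part with the known-good machine; if the crashes stay with the box, that part is cleared) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `isolate` helper and the component list (including "motherboard") are hypothetical and are not a Sun diagnostic tool.

```python
def isolate(suspects, swaps):
    """Return the components not yet cleared by swap tests.

    suspects: list of component names in the failing machine (hypothetical).
    swaps: dict mapping a component name to True if the fault followed the
           part into the other machine, False if it stayed behind.
    """
    followed = [part for part, moved in swaps.items() if moved]
    if followed:
        # The fault travelled with a part, so that part is the culprit.
        return followed
    # Any part that was swapped without moving the fault is cleared.
    return [part for part in suspects if part not in swaps]


# Hypothetical suspect list for the single-CPU 280R described above.
suspects = ["memory", "graphics board", "CPU", "motherboard"]
# Memory and graphics board were swapped; the crashes stayed with the box.
swaps = {"memory": False, "graphics board": False}
remaining = isolate(suspects, swaps)
print(remaining)
```

With memory and the graphics board cleared, what remains is the hardware that was not swapped, which is the "something else about the system" to hand over to Sun.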
 
 
 


Post by Rob » Sun, 24 Jun 2001 05:07:12


Sun came out (under warranty, not contract) and replaced the CPU (UltraSPARC
III, 750 MHz).  Everything seems stable now.

A word of caution to Sun Fire 280R owners: all three CPUs (in the two 280Rs)
were fastened with very low torque.  Make sure the torque isn't too low, or
they could work loose.



> > I have two Sun Fire 280Rs.  They both arrived at the same time.  One has
> > 1 CPU and 1 GB RAM, the other has 2 CPUs and 2 GB RAM.  Both have the same
> > PROM level (4.00 if I remember).  Both have been running for approx. 3
> > weeks.  The smaller one keeps crashing with "Error ECC Corrected" displayed
> > at the PROM level and kernel panics leading to crash files (vmcore, etc.)
> > being created.  When I can get it up to init 3 it lasts for about 2 hours
> > max before crashing.

> That means the system is having serious memory problems. If changing the
> memory doesn't help, then it's something else about the system that's
> broken.
> If they're still under warranty, call Sun. If you have a service contract,
> call Sun. If you don't have a service contract, get a service contract and
> call Sun.

 
 
 


Post by Mathew Kirsc » Tue, 26 Jun 2001 23:29:07



> Sun came out (under warranty, not contract) and replaced the CPU (UltraSPARC
> III, 750 MHz).  Everything seems stable now.

Sweet. There are some problems where it pays to admit you don't know it all,
and call in the Big Guns(tm). :)

> A word of caution to Sun Fire 280R owners: all three CPUs (in the two 280Rs)
> were fastened with very low torque.  Make sure the torque isn't too low, or
> they could work loose.

Yep, them little *s are always trying to escape :)

"Hello, is your CPU running?"
"Holy crap! Ed! Grab the net! The 280R got loose again!"