Question about Sun Memory Problem Article in InfoWorld...

Post by NYCeye » Wed, 30 Aug 2000 11:33:42



Hi:

I have two questions...

(1) Does anyone know the error messages related to the
    alleged Sun memory problem discussed in the InfoWorld
    article at the following link:

http://www.infoworld.com/articles/hn/xml/00/08/25/000825hnsunmemory.xml

    I have several UExx00 machines that panic on SRAM ECACHE DATA PARITY
    ERRORS. Replacing, or at least Off-lining the offending CPU (via
    psradm), eliminates the problem. But it happens too frequently.

(2) On a different note (actually, who knows, it might be related),
    does anyone know what this error might be caused by...

    -----------------------------------------------------------
    panic[cpu26]/thread=0x6768cb40: Async data error at tl1:
    AFAR 0x000001bf.ffffffb0 AFSR 0x00000000.80401000
    -----------------------------------------------------------

    A related Sun bug ID (4167924) is either unresolved
    or is resolved but unpublished on SunSolve.

Any info on either would be appreciated.

TIA,
NYCeyes

 
 
 

Post by Brian Scanl » Wed, 30 Aug 2000 04:00:00



>(1) Does anyone know the error messages related to the
>    alleged Sun memory problem discussed in the InfoWorld
>    article at the following link:

Yes.

>http://www.infoworld.com/articles/hn/xml/00/08/25/000825hnsunmemory.xml

Dodgy article.

>    I have several UExx00 machines that panic on SRAM ECACHE DATA PARITY
>    ERRORS. Replacing, or at least Off-lining the offending CPU (via
>    psradm), eliminates the problem. But it happens too frequently.

So, why haven't you rung Sun support?

>(2) On a different note (actually, who knows, it might be related),
>    does anyone know what this error might be caused by...

>    -----------------------------------------------------------
>    panic[cpu26]/thread=0x6768cb40: Async data error at tl1:
>    AFAR 0x000001bf.ffffffb0 AFSR 0x00000000.80401000
>    -----------------------------------------------------------

>    A related Sun bug ID (Sun BugID: 4167924), is either unresolved
>    or is resolved but unpublished on SunSolve.

Looks like it may be Ecache related. Ring Sun. Have iscda output of all your
crashes. You won't get far without the actual panic strings.

--
Columnated ruins domino,
Canvas the town and brush the backdrop.
Are you sleeping?

 
 
 

Post by James A. William » Sat, 02 Sep 2000 00:24:03


Here is the panic string, I've seen enough of them to have them memorized
(some exaggeration):

Aug 12 01:53:44 xxxxxx savecore: reboot after panic: CPU13 Ecache SRAM Data
Parity Error: AFSR 0x00000000.00400004 AFAR 0x00000000.7c03aa30
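
For anyone collecting these for a support call, here is a minimal sketch (not a Sun tool) of pulling the CPU number and the AFSR/AFAR values out of panic strings like the ones in this thread. The function name `parse_panic` and the regular expressions are my own assumptions, based only on the message formats quoted here; it does not try to decode the individual AFSR bits.

```python
import re

# Matches "AFSR 0x00000000.00400004" or "AFAR 0x00000000.7c03aa30", in either order.
REG_RE = re.compile(r"\b(AFSR|AFAR)\s+0x([0-9a-fA-F]{8})\.([0-9a-fA-F]{8})")
# Matches "CPU13" as well as "panic[cpu26]".
CPU_RE = re.compile(r"\bcpu\s*\[?(\d+)\]?", re.IGNORECASE)


def parse_panic(text):
    """Return the CPU number (if found) and AFSR/AFAR as 64-bit integers."""
    # Panic strings are often wrapped across log lines, so collapse whitespace.
    flat = " ".join(text.split())
    result = {"cpu": None, "AFSR": None, "AFAR": None}
    cpu = CPU_RE.search(flat)
    if cpu:
        result["cpu"] = int(cpu.group(1))
    for name, high, low in REG_RE.findall(flat):
        # The two 32-bit halves are printed separated by a dot.
        result[name] = (int(high, 16) << 32) | int(low, 16)
    return result


if __name__ == "__main__":
    msg = ("Aug 12 01:53:44 xxxxxx savecore: reboot after panic: CPU13 "
           "Ecache SRAM Data\nParity Error: AFSR 0x00000000.00400004 "
           "AFAR 0x00000000.7c03aa30")
    info = parse_panic(msg)
    print(info["cpu"], hex(info["AFSR"]), hex(info["AFAR"]))
```

Run over a copy of the messages file, this gives one record per panic, which makes it easier to see whether the same CPU keeps turning up.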

 
 
 

Post by Jim Dav » Sat, 02 Sep 2000 00:57:28




:Here is the panic string, I've seen enough of them to have them memorized
:(some exaggeration):
:
:Aug 12 01:53:44 xxxxxx savecore: reboot after panic: CPU13 Ecache SRAM Data
:Parity Error: AFSR 0x00000000.00400004 AFAR 0x00000000.7c03aa30

A related panic string we've seen on a two-processor E450 is

CPU UE Error: Ecache Copyout on CPU3: AFSR...

Someone who's worked with Sun on this issue (and apparently is still
under NDA) said that both the 'data parity' and 'copyout' ecache
messages were signs of that particular problem.
--

 
 
 

Post by ma11a.. » Sat, 02 Sep 2000 01:20:47





Is it me, or is this starting to get a bit out of control? Mebbe I'm
just paranoid, but I've seen this problem cropping up on newsgroups all
over the place. Is this just going to go on with a few reports here and
there of strange cache errors on 400MHz CPUs, or is there going to be
an official announcement either denouncing the problem or acknowledging
it?

It's strange for a corporation the size of Sun to just leave this as a
grey area - I've talked with some engineers dealing with Sun support and
there has been no knowledge of this at all.

Anyone like to comment???

Sent via Deja.com http://www.deja.com/
Before you buy.

 
 
 

Post by Brian Scanl » Sat, 02 Sep 2000 22:20:01



>Is it me, or is this starting to get a bit out of control??

Not really.

>Is this just going to go on with a few reports here and
>there of strange cache errors on 400MHz CPUs, or is there going to be
>an official announcement either denouncing the problem or acknowledging
>it?

Sun have acknowledged it.

>It's strange for a corporation the size of Sun to just leave this as a
>grey area - I've talked with some engineers dealing with Sun support and
>there has been no knowledge of this at all.

Then they don't pay much attention to their surroundings. :)

Incidentally, check out patch 105181-23 (the Solaris 2.6 kernel patch).

  1256102 improve survivability when encountering UE
  4269582 Kernel (and OBP) handling of Ecache data and Tag parity errors
        should be enhanced
  4269845 OS needs to recover from a processor failure
  4320394 scrubbing the cache may improve the reliability of some systems

The patch includes "Ecache scrubbers", which will limit the Ecache problems.
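
To check whether a box already has that revision or later, something like the following rough sketch could be run over saved `showrev -p` output. It assumes each installed patch is reported on a line starting with `Patch: NNNNNN-RR`; the function name `has_patch` is hypothetical.

```python
import re

# Assumed line format: "Patch: 105181-23 Obsoletes: ... Packages: ..."
PATCH_RE = re.compile(r"^Patch:\s+(\d{6})-(\d{2})\b", re.MULTILINE)


def has_patch(showrev_output, base, min_rev):
    """True if patch `base` is installed at revision `min_rev` or later."""
    return any(
        found_base == base and int(found_rev) >= min_rev
        for found_base, found_rev in PATCH_RE.findall(showrev_output)
    )


# Usage (the file name is only an example):
#   has_patch(open("showrev.out").read(), "105181", 23)
```

Comparing the revision numerically rather than as a string keeps the check simple if Sun ships a later revision of the same patch.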

--
Columnated ruins domino,
Canvas the town and brush the backdrop.
Are you sleeping?

 
 
 

Post by hwk.. » Sun, 03 Sep 2000 02:14:55


It's my understanding that a kernel patch which includes an ecache
'scrubber' is in beta. This is according to a Sun technical management
type as of about a month ago. Indeed, this has been a real technical
AND PR problem for Sun.






 
 
 
