Crashing Ultra10 Boxes

Post by Phillip M » Sat, 26 May 2001 01:58:10



I would take the time to find a console cable, so that at least you
can see what messages show up on the screen.
You should also consider enabling the core file feature through the
script /etc/rc2.d/S20sysetup; just be sure to specify a filesystem
with enough space, since the core files can be large.
One last thing: if you continue to just "reboot" the system, there is
a chance you will corrupt filesystems, so be sure to umount and run
fsck if time permits.
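
As a rough illustration of the "enough space" check, here is a minimal Python sketch. It assumes Python is available on the box. The directory name and the size threshold are hypothetical placeholders, not values from this thread.

```python
import os
import shutil


def has_room(path, needed_bytes):
    """Return True if the filesystem holding `path` has at least `needed_bytes` free."""
    usage = shutil.disk_usage(path)
    return usage.free >= needed_bytes


if __name__ == "__main__":
    # Hypothetical values: adjust the directory and size for your own system.
    dump_dir = "/var/crash"
    needed = 512 * 1024 * 1024  # assumed requirement of 512 MB

    if not os.path.isdir(dump_dir):
        print(f"{dump_dir} does not exist on this machine")
    elif has_room(dump_dir, needed):
        print(f"{dump_dir} has enough free space")
    else:
        print(f"{dump_dir} is too small, pick another filesystem")
```

A sensible threshold is probably tied to the amount of RAM in the machine, but that is a guess to check against your own system.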

good luck


>Hi all,
>Here is what's happening lately in our server room:
>I come in and find out that I cannot access a box (or someone tells me) and
>so I run down to the server room and reboot it and it comes up. Rebooting is
>easier than trying to find a cable to console it in. The box comes up fine
>but I want to know what is the best place to look for signs of its crash.
>What log files will give me an idea as to what went wrong? This is happening
>to ultra10's running Solaris 7.

>Thanks

Phillip Mau
Unix System Admin
MauComm Communications
 
 
 

Crashing Ultra10 Boxes

Post by Rich Andrew » Sat, 26 May 2001 20:08:33


I hate to sound like a smart ass, but if you are unwilling to find a
console cable, then how can you ever hope to find the problem?

rich


> Hi all,
> Here is what's happening lately in our server room:
> I come in and find out that I cannot access a box (or someone tells me) and
> so I run down to the server room and reboot it and it comes up. Rebooting is
> easier than trying to find a cable to console it in. The box comes up fine
> but I want to know what is the best place to look for signs of its crash.
> What log files will give me an idea as to what went wrong? This is happening
> to ultra10's running Solaris 7.

> Thanks


 
 
 

Crashing Ultra10 Boxes

Post by Brad Veneracio » Sun, 27 May 2001 00:49:57



> >Hi all,
> >Here is what's happening lately in our server room:
> >I come in and find out that I cannot access a box (or someone tells me) and
> >so I run down to the server room and reboot it and it comes up. Rebooting is
> >easier than trying to find a cable to console it in. The box comes up fine
> >but I want to know what is the best place to look for signs of its crash.
> >What log files will give me an idea as to what went wrong? This is happening
> >to ultra10's running Solaris 7.

We just had a CPU replaced on an Ultra 10, and it solved everything. The box
would crash, leaving a few messages about fast address MMU misses and the
like. We even got the message while booting off the CD to re-install Solaris 8.
Sometimes the system would crash soon after a reboot; sometimes it would
go for a day or two before crashing. The local Sun rep mentioned that there
have been a lot of problems with the 440 MHz processors.
 
 
 

Crashing Ultra10 Boxes

Post by Ray Hal » Sun, 27 May 2001 01:12:27


You should get a console cable and poke around in /var/adm/messages. See if
there's anything in there that could lead you to an answer. I just had an
Ultra10 with a CPU problem and it was dumping errors to that file. You
might also think about turning on UFS logging (the "logging" mount option)
in your vfstab. If you're going to keep rebooting like you are, this will
help with the reboot time, since a logged filesystem rarely needs a full fsck.
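
For anyone who would rather not read the whole file by eye, here is a minimal Python sketch of that search. It assumes Python is installed on the box, and the keyword list is my own guess at what is worth flagging, not an official or complete set.

```python
import os
import re

# Assumed keywords: a guess at what is worth flagging, not an official list.
SUSPECT = re.compile(r"panic|MMU miss|error|WARNING", re.IGNORECASE)


def suspicious_lines(lines):
    """Yield (line_number, text) for each line that matches a suspect keyword."""
    for number, line in enumerate(lines, start=1):
        if SUSPECT.search(line):
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    path = "/var/adm/messages"
    if os.path.exists(path):
        # errors="replace" keeps odd bytes in the log from stopping the scan.
        with open(path, errors="replace") as handle:
            for number, text in suspicious_lines(handle):
                print(f"{number}: {text}")
    else:
        print(f"{path} not found")
```

The entries logged just before each reboot are the ones to look at first.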

Ray

> I hate to sound like a smart ass, but if you are unwilling to find a
> console cable, then how can you ever hope to find the problem?

> rich


> > Hi all,
> > Here is what's happening lately in our server room:
> > I come in and find out that I cannot access a box (or someone tells me) and
> > so I run down to the server room and reboot it and it comes up. Rebooting is
> > easier than trying to find a cable to console it in. The box comes up fine
> > but I want to know what is the best place to look for signs of its crash.
> > What log files will give me an idea as to what went wrong? This is happening
> > to ultra10's running Solaris 7.

> > Thanks

 
 
 

Crashing Ultra10 Boxes

Post by circus jun » Sun, 27 May 2001 11:01:48


I have had 7 CPU failures in the last month, 3 of them in one week.
Sun admits they got a bad batch of the 440 MHz, 2 MB cache CPUs.
Most of these were delivered between October of last year and February
of this year.
If yours is less than a year old, just call Sun (1-800-USA-4SUN) and
they will replace it.

>I would take the time needed to find a console cable, so at least you
>can see what message might show up on the screen.
>Also you should consider enabling the core file feature through the
>script /etc/rc2.d/S20sysetup; just be sure to specify a filesystem
>with enough space, since the core files can be large.
>One last thing, if you continue to just "reboot" the system, there is
>a chance you can corrupt filesystems, so be sure to umount and run
>fsck if time permits.

>good luck


>>Hi all,
>>Here is what's happening lately in our server room:
>>I come in and find out that I cannot access a box (or someone tells me) and
>>so I run down to the server room and reboot it and it comes up. Rebooting is
>>easier than trying to find a cable to console it in. The box comes up fine
>>but I want to know what is the best place to look for signs of its crash.
>>What log files will give me an idea as to what went wrong? This is happening
>>to ultra10's running Solaris 7.

>>Thanks

>Phillip Mau
>Unix System Admin
>MauComm Communications