sol2.6 E3000 randomly panic

Post by SeokChan LEE » Sun, 19 Jul 1998 04:00:00



I have a E3000 with 4 168MHz CPUs. All latest 2.6 patches are installed.

        SUNW,UltraSPARC (upaid 6 impl 0x10 ver 0x40 clock 168 MHz)
        SunOS sol 5.6 Generic_105181-06 sun4u sparc SUNW,Ultra-Enterprise

It crashes randomly with the following messages:

BAD TRAP: cpu=6 type=0x9 rp=0x3023f9a0 addr=0x0 mmu_fsr=0x0
sched: trap type = 0x9
pid=0, pc=0x0, sp=0x3023fa30, tstate=0x77001e01, context=0x0
g1-g7: 10433400, 102, 625ee818, ffffffffffffffff, 800015fbe0487a
61, 0, 3023fe80
Begin traceback... sp = 3023fa30
Called from 6084743c, fp=3023fa90, args=0 ee910b9 628b4b18 0 218
 62718588
Called from 60846a40, fp=3023faf0, args=61878520 0 60062330 61b7
81e4 627fc690 0
Called from 100b7770, fp=3023fb50, args=60062330 61878520 61b781
e4 0 218 62718588
Called from 100b6a88, fp=3023fbb0, args=61b78240 1 0 0 0 608469a
8
Called from 100b42e4, fp=3023fc10, args=61b78240 2200 f000 1 11
2000
Called from 100b4448, fp=3023fc70, args=61b78218 61b781e4 2200 2
0000 3b 104381a6
Called from 100268dc, fp=3023fce0, args=61b781e4 104381a0 10425d
c8 104173a0 10000 61b78218
Called from 100b4374, fp=0, args=0 0 8d110ea3 413d4385 9e6cae42
9a43ded2
End traceback...
panic[cpu6]/thread=0x3023fe80: trap
syncing file systems... done
 8969 static and sysmap kernel pages
  109 dynamic kernel data pages
  236 kernel-pageable pages
    0 segkmap kernel pages
    0 segvn kernel pages
    0 current user process pages
 9314 total pages (9314 chunks)

dumping to vp 601e65dc, offset

I've searched Dejanews and SunSolve online but can't find anything
helpful. What's trap type 0x9 for sched? How can I fix this
problem?

SeokChan LEE
Inet, Inc.


sol2.6 E3000 randomly panic

Post by Gavin Maltb » Tue, 21 Jul 1998 04:00:00



> I have a E3000 with 4 168MHz CPUs. All latest 2.6 patches are installed.

>    SUNW,UltraSPARC (upaid 6 impl 0x10 ver 0x40 clock 168 MHz)
>    SunOS sol 5.6 Generic_105181-06 sun4u sparc SUNW,Ultra-Enterprise

> It crashes randomly with the following messages:

> [trap message, traceback, and panic dump snipped; see above]

> I've searched Dejanews and SunSolve online but can't find anything
> helpful. What's trap type 0x9 for sched? How can I fix this
> problem?

"sched" just means this was a pure kernel thread that took the bad
trap; it was not a kernel thread supporting a user process.
Beyond that, the mention of sched is irrelevant (in particular,
this was not a scheduling problem).

Trap type 9 in SPARC V9 is an instruction_access_MMU_miss.
It means we tried to fetch an instruction to execute but
the address was invalid.  This can come about as the result of
a software bug, e.g. an indirect jump through a function pointer:

        void (*f)(void);        /* f is a pointer to some function */
        f = ... ;               /* some garbage value */
        (*f)();                 /* jump to the garbage address */

This will give the above fault if executed in the kernel.
Normally the garbage value is not assigned directly like this
but is picked up from some corrupted data structure.

Quite commonly this error is also associated with a broken hardware
component (e.g. a bit flip causes us to try to jump to some
daft address).

In your case, something has tried to jump to address 0x0
(pc=0x0 in the bad trap output above).  I'd be more suspicious
that this was a software bug in some kernel module.  Run adb
on the crash dump and use $c to get the stack (or, if that
fails, use address/ai on each of the "Called from" addresses
above) to see where we were coming from.
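The adb session might look like the following sketch; the dump
location and file names (unix.0, vmcore.0 under /var/crash) are
assumptions based on savecore defaults and may differ on your system:

```shell
# Assuming savecore wrote the dump under /var/crash/`uname -n`:
cd /var/crash/`uname -n`
adb -k unix.0 vmcore.0

# At the adb prompt:
#   $c              -> print the kernel stack traceback
#   6084743c/ai     -> disassemble at a "Called from" address
#   $q              -> quit
```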

Cheers

Gavin


1. NIS processes just die randomly after Sol2.6->8 upgrade

Having a problem.  We had a happy NIS setup on two Solaris 2.6 boxes,
master and slave.  We're upgrading to Solaris 8.  So we moved the NIS
master to another box that had been upgraded to Sol8, keeping the
previous slave 2.6 box in place.

Now, the yp processes on the master just die, a couple of times a
day.  It seems to run fine other than that (all the clients get
their maps, etc).  The problem itself generates no syslog messages;
the processes (ypbind, ypserv -d, ypxfrd, rpc.yppasswdd,
rpc.ypupdated) just go poof.  The slave gets all messed up a couple
of minutes after this happens as well.  One second ypwhich returns
the right thing; the next it says the domain isn't bound.

The slave is also the primary NFS mount for home directories and
whatnot.

Any ideas as to what might be the matter?  We've applied some yp
patches from SunSolve to no effect.  Is there a specific issue with
migrating NIS from 2.6 to 8 (we're using all the same map files and
whatnot...) or with mixing master and slave Solaris versions?

Thanks,
Ernest
