Procfs tools and truss utility are broken after kernel patch 106541-18


Post by Gopal » Tue, 08 Jan 2002 04:22:20



We have a mission-critical E10k domain that had many stability and
performance problems. Sun suggested applying kernel patch 106541-18,
stating that they had observed too many cross-calls (XCALs) on our 10-CPU
domain. After we applied the patch, the open-source lsof slowed down
significantly. Before the patch, lsof ran for 5-10 seconds and reported
38000+ open files (when run simply as lsof -n). Now it takes more than
8 minutes even when the domain has no load. Under load, I could not risk
running lsof.
Procfs utilities:
pfiles: when run against an Oracle process, it hangs at random. The only way
out is to kill the pfiles command, which leaves the target process in a
"STOP" state, and there is no way to return the stopped process to the run
state.
truss: truss frequently is* the process against which it is run (the
process is originally started by Oracle), dumping "Psetrun: Device busy"
lines at a rapid rate.

Overall, we have observed that after this patch system tools (find, ps,
etc.) run longer and take more CPU cycles.

Has anyone encountered a similar problem?

tia
Gopi.

 
 
 


Post by Vic Abe » Tue, 08 Jan 2002 20:59:01



>We have an E10k domain ( mission critical ) that had lots of stability and
>performance problems. Sun has suggested to apply 106541-18 kernel patch (
>stating that they have observed too many XCALs happening on our 10 cpu
>domain ). But after applying this patch, the opensource LSOF got
>significantly slowed down. Prior to applying this patch lsof used to run for
>5-10secs and give 38000+ open file report ( when run simply as,  lsof -n  ).
>Now it is taking more than 8minutes even when domain has no load. Under load
>conditions, I could not  risk running lsof.
>Procfs utilities:
>pfiles: when run against an oracle process, it randomly hangs. The only way
>to get out of this situation is kill the pfiles command. This leaves the
>target Process in a "STOP" state, and there is no means to put the stopped
>process back to run state.
>truss: truss frequently is* the process against which it is run ( the
>process is originally run by oracle ) dumping a "Psetrun: Device busy" lines
>at a rapid rate.
>Overall we have observed that after this patch, system tools "find, ps etc.,
>" are running longer and are taking more CPU cycles.
>Did anyone encounter similar experience.

While I can't offer any insight into why the patch might be slowing
your system, I can suggest that you read section 3.2 of the lsof
FAQ.  It suggests some options you can use to speed lsof response.
Using one of them might give you a clue to what is slowing your
system.

That lsof FAQ section lists the option of disabling lsof's reading
of the kernel name cache with -C as one possible way to make lsof
run faster.  That may have particular relevance to your situation.

Changes in the kernel name cache made available after Solaris 8
FCS slowed lsof's processing of the kernel name cache until an lsof
modification was made available at lsof revision 4.50.  It could
be that the patch you applied contained in its bundle the kernel
name cache change that slowed lsof.

I'd also suggest if you have an lsof revision earlier than 4.50,
you update to the latest lsof revision, 4.60.

Vic Abell, lsof author
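
To check whether the kernel name cache is what slows lsof down, the two invocations discussed above can be timed side by side. The following is only a minimal sketch: the helper names time_command and compare_lsof are made up for illustration, and it assumes lsof is on the PATH. The -n and -C options are the ones mentioned in this thread.

```python
import subprocess
import time


def time_command(argv):
    """Run argv, discard its output, and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # only the elapsed time matters here, not the exit status
    )
    return time.perf_counter() - start


def compare_lsof(lsof="lsof"):
    """Time `lsof -n` with and without -C (kernel name cache reading disabled)."""
    return {
        "lsof -n": time_command([lsof, "-n"]),
        "lsof -n -C": time_command([lsof, "-n", "-C"]),
    }


if __name__ == "__main__":
    for label, seconds in compare_lsof().items():
        print(f"{label}: {seconds:.1f}s")
```

If the -C run comes back in seconds while the plain run takes minutes, that points at the name cache processing rather than at the system as a whole. It would not explain the pfiles and truss symptoms.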

 
 
 

1. DNLC kernel stats broken in Solaris 2.7/106541-18 ?

Hi !

----

Can anyone explain the output of the "total name lookups" line ?

% vmstat -s
        0 swap ins
        0 swap outs
        0 pages swapped in
        0 pages swapped out
363779541 total address trans. faults taken
 10046182 page ins
   979128 page outs
 11549119 pages paged in
  2432391 pages paged out
   901390 total reclaims
   892739 reclaims from free list
        0 micro (hat) faults
363779541 minor (as) faults
  9550880 major faults
 37696063 copy-on-write faults
153804394 zero fill page faults
 71482047 pages examined by the clock daemon
     1527 revolutions of the clock hand
 14493895 pages freed by the clock daemon
  1171437 forks
   730442 vforks
  1860960 execs
727077106 cpu context switches
1432582984 device interrupts
465991560 traps
2671370082 system calls
18446744072071900789 total name lookups (cache hits 0%)
 63609364 user   cpu
 16502397 system cpu
218096162 idle   cpu
  6366341 wait   cpu
% uname -a
SunOS test001 5.7 Generic_106541-18 sun4u sparc SUNW,Ultra-5_10

Is this a known bug? This usually happens after 30-40 days of uptime...

----

Bye,
Roland

--
  __ .  . __


  /O /==\ O\  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
 (;O/ \/ \O;) TEL +49 641 99-41370 FAX +49 641 99-41359
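
One possible reading of the "total name lookups" value, not confirmed anywhere in this thread: the number is just below 2^64, which is what a negative signed value looks like when printed as an unsigned 64-bit integer. The sketch below only does that arithmetic on the figure from the vmstat -s output above. That vmstat actually sums or widens signed 32-bit counters this way is an assumption, and the helper name as_signed is made up for illustration.

```python
RAW = 18446744072071900789  # "total name lookups" as printed by vmstat -s above


def as_signed(value, bits):
    """Reinterpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


signed64 = as_signed(RAW, 64)                      # value seen as signed 64-bit
fits_int32 = -(1 << 31) <= signed64 < (1 << 31)    # would it fit a 32-bit int?
unsigned32 = signed64 & 0xFFFFFFFF                 # same bits as unsigned 32-bit

print(signed64)    # -1637650827
print(fits_int32)  # True
print(unsigned32)  # 2657316469
```

Under that assumption the real lookup total would be about 2.66 billion, which is past 2^31 and would be consistent with the problem only appearing after 30-40 days of uptime. A bogus total would also explain the "cache hits 0%" figure, since the hit rate is computed against it.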
