Kernel profiling with kgmon on 2.6

Post by Clayton O'Nei » Sat, 03 Apr 1999 04:00:00



I'm running a news server that usually has between 1300-2500 processes
running on it at any given point.  The user cpu time is usually around 5-6%
and the system cpu time is usually about 10-20%.  Obviously to reduce cpu
load, I'll have to figure out what I'm asking the kernel to do that's
causing so much load and try to stop doing that so much :)

Anyway, I've run kgmon on the box and this is a fairly representative sample
of what I'm getting from it.  

   %  cumulative    self              self    total          
 time   seconds   seconds    calls  ms/call  ms/call name    
 44.6       8.35     8.35                            disp_getwork [1]
 23.1      12.68     4.33                            splx [2]
 10.2      14.59     1.91                            idle [3]
  8.9      16.26     1.67                            splhi [4]
  1.5      16.55     0.29                            hat_unload [5]
  0.7      16.68     0.13                            sfmmu_tteload_find_hmeblk [6]
  0.6      16.80     0.12                            as_findseg [7]
  0.6      16.92     0.12                            ip_ocsum_copyin [8]
  0.5      17.01     0.09                            page_lookup [9]
  0.4      17.09     0.08                            _sys_trap [12]
[list truncated for brevity's sake]

Now I'm trying to identify what the functions are for and what that means
for my code.  I've been going through "Magic Garden Explained" trying to figure
out what they do.  Here are my guesses, and I was hoping someone could
comment on whether or not I'm close.

disp_getwork: It appears that this is part of the scheduler, and I'm
assuming that most of this is because I have so many processes running.
Would going to a threaded model help this?  maybe?

splx: The book says that this is called when an interrupt routine is exited,
so it seems like this is caused by the scsi and network cards.  Assuming
that I could find better ones that generate fewer interrupts (the Alteon
ACEnics?), I'm assuming that my cpu utilization would go down?

idle: This thread represents idle time?  Something else?

splhi: I'm assuming this is also related to interrupt load?

Thanks for any help you can provide.

 
 
 

Post by James Maur » Sat, 03 Apr 1999 04:00:00


Yes,  disp_getwork is part of the scheduler (dispatcher!). If memory serves,
it's called to search the dispatch  queues for a runnable kernel thread.
First the RT priority (kernel preempt) queue, then the per-processor queues.

What size system is this? Number and speed of processors? ~2,000 processes
resulting in < 30% CPU may not be bad at all, depending on what you're
running on.

/jim

>   %  cumulative    self              self    total
> time   seconds   seconds    calls  ms/call  ms/call name
> 44.6       8.35     8.35                            disp_getwork [1]
> 23.1      12.68     4.33                            splx [2]
> 10.2      14.59     1.91                            idle [3]
>  8.9      16.26     1.67                            splhi [4]
>  1.5      16.55     0.29                            hat_unload [5]
>  0.7      16.68     0.13                            sfmmu_tteload_find_hmeblk [6]
>  0.6      16.80     0.12                            as_findseg [7]
>  0.6      16.92     0.12                            ip_ocsum_copyin [8]
>  0.5      17.01     0.09                            page_lookup [9]
>  0.4      17.09     0.08                            _sys_trap [12]
>[list truncated for brevity's sake]


 
 
 

Post by Casper H.S. Dik - Network Security Engine » Sun, 04 Apr 1999 04:00:00


[[ PLEASE DON'T SEND ME EMAIL COPIES OF POSTINGS ]]


>I'm running a news server that usually has between 1300-2500 processes
>running on it at any given point.  The user cpu time is usually around 5-6%
>and the system cpu time is usually about 10-20%.  Obviously to reduce cpu
>load, I'll have to figure out what I'm asking the kernel to do that's
>causing so much load and try to stop doing that so much :)

Your CPUs are basically idle?  (6+20 = 26 << 100%)

>Anyway, I've run kgmon on the box and this is a fairly representative sample
>of what I'm getting from it.  
>disp_getwork: It appears that this is part of the scheduler, and I'm
>assuming that most of this is because I have so many processes running.
>Would going to a threaded model help this?  maybe?
>splx: The book says that this is called when an interrupt routine is exited,
>so it seems like this is caused by the scsi and network cards.  Assuming
>that I could find better ones that generate fewer interrupts (the Alteon
>ACEnics?), I'm assuming that my cpu utilization would go down?
>idle: This thread represents idle time?  Something else?
>splhi: I'm assuming this is also related to interrupt load?

If my understanding of things and how they work is correct, I think that
what you're seeing (disp_getwork, splx/splhi/idle) are together the main
components of the idle loop.  Together they account for 75% or so
of CPU time, which is in keeping with the 26% of CPU used you see.

Casper
--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
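
A quick way to check the arithmetic in the reply above is to add up the profile percentages for the routines it names as the idle loop and compare that with the CPU figures from the original post. This is only a sketch: the numbers are copied from the posted sample, the grouping of the four routines follows the reply, and the function names are made up for illustration. On this particular sample the four routines add up to about 87% of the profile samples, somewhat above the "75% or so" estimate but in the same ballpark as the 74-85% idle implied by the reported user and system ranges.

```python
# Percent-of-samples figures copied from the flat profile posted in the thread.
profile = {
    "disp_getwork": 44.6,
    "splx": 23.1,
    "idle": 10.2,
    "splhi": 8.9,
    "hat_unload": 1.5,
}

# Assumption taken from the reply: these four routines make up the idle loop.
IDLE_LOOP = ("disp_getwork", "splx", "idle", "splhi")


def idle_share(samples, idle_names=IDLE_LOOP):
    """Return the percentage of profile samples spent in the idle-loop routines."""
    return sum(samples[name] for name in idle_names)


def busy_range(user=(5, 6), system=(10, 20)):
    """Return (low, high) busy CPU percent from the reported user/system ranges."""
    return user[0] + system[0], user[1] + system[1]


if __name__ == "__main__":
    low, high = busy_range()
    print(f"idle-loop share of samples: {idle_share(profile):.1f}%")
    print(f"reported busy CPU: {low}-{high}%, idle CPU: {100 - high}-{100 - low}%")
```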

 
 
 

Post by Clayton O'Nei » Sun, 04 Apr 1999 04:00:00


|
|Yes,  disp_getwork is part of the scheduler (dispatcher!). If memory serves,
|it's called to search the dispatch  queues for a runnable kernel thread.
|First the RT priority (kernel preempt) queue, then the per-processor queues.
|
|What size system is this? Number and speed of processors? ~2,000 processes
|resulting in < 30% CPU may not be bad at all, depending on what you're
|running on.

It's an E4000 with eight 336 MHz CPUs.  I'm trying to determine how much of
the time is being spent just traversing the process table so that I'll know
if it'll make sense for me to rewrite the code to use fewer processes.

 
 
 

Post by Clayton O'Nei » Sun, 04 Apr 1999 04:00:00



|Your CPUs are basically idle?  (6+20 = 26 << 100%)

Basically, however, based on the amount of time I'm currently spending in the
kernel, it looks as if I'm going to run out of cpu before I run out of
anything else.  Because of this I'm trying to decide how much cpu I'm
wasting because of the one process per connection model and if it's
worthwhile to rewrite the code to use multiple connections per process.  I'm
thinking of either doing an event loop based on select/poll or else
threading the code and using a thread per connection instead of a process.

FWIW, this is a USENET news server running a very modified copy of INN.
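
For readers weighing the same choice, here is a minimal sketch of the "multiple connections per process" option mentioned above, using a select/poll style readiness loop. It is not INN code: the EventLoop class, its method names, and the upper-casing "protocol" are placeholders invented for illustration, and it uses Python's standard selectors module, which picks poll, epoll, or select depending on the platform.

```python
import selectors
import socket


class EventLoop:
    """One process servicing many connections through a readiness loop."""

    def __init__(self):
        self.sel = selectors.DefaultSelector()

    def add_listener(self, listener):
        # A listening socket becomes readable when a client is waiting to be accepted.
        listener.setblocking(False)
        self.sel.register(listener, selectors.EVENT_READ, self._accept)

    def add_connection(self, conn):
        conn.setblocking(False)
        self.sel.register(conn, selectors.EVENT_READ, self._serve)

    def _accept(self, listener):
        conn, _addr = listener.accept()
        self.add_connection(conn)

    def _serve(self, conn):
        data = conn.recv(4096)
        if data:
            # Placeholder for real protocol handling. A production loop would
            # buffer output and wait for EVENT_WRITE instead of calling sendall.
            conn.sendall(data.upper())
        else:
            # Empty read means the peer closed the connection.
            self.sel.unregister(conn)
            conn.close()

    def poll_once(self, timeout=1.0):
        """Service every ready socket once; return how many were handled."""
        events = self.sel.select(timeout)
        for key, _mask in events:
            key.data(key.fileobj)
        return len(events)
```

To serve real clients, one would create a TCP socket, bind and listen on it, pass it to add_listener, and then call poll_once in a loop. The point of the design is that idle connections cost a registered descriptor rather than a whole process for the dispatcher to track.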

 
 
 

Post by Casper H.S. Dik - Network Security Engine » Sun, 04 Apr 1999 04:00:00


[[ PLEASE DON'T SEND ME EMAIL COPIES OF POSTINGS ]]



>|Your CPUs are basically idle?  (6+20 = 26 << 100%)
>Basically, however, based on the amount of time I'm currently spending in the
>kernel, it looks as if I'm going to run out of cpu before I run out of
>anything else.  Because of this I'm trying to decide how much cpu I'm
>wasting because of the one process per connection model and if it's
>worthwhile to rewrite the code to use multiple connections per process.  I'm
>thinking of either doing an event loop based on select/poll or else
>threading the code and using a thread per connection instead of a process.

No, if your %sys CPU != 75% it's all idle CPU.

The CPU will always be doing things, but it's doing
idle things (like stealing jobs from other CPUs or seeing if there's
anything to steal)

Casper
--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
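
The behaviour described in the replies (look at the kernel preempt queue first, then the per-processor queues, and steal from other CPUs when idle) can be pictured with a toy model. This is emphatically not Solaris source: get_work, idle_loop, the queue layout, and the choice to steal from the tail of another queue are all assumptions made up for illustration. It only shows why an idle CPU that keeps searching for work would accumulate profile samples in a routine like disp_getwork.

```python
from collections import deque


def get_work(cpu_id, kp_queue, cpu_queues):
    """Toy search order: kernel-preempt queue, own queue, then other CPUs' queues."""
    if kp_queue:
        return kp_queue.popleft()
    if cpu_queues[cpu_id]:
        return cpu_queues[cpu_id].popleft()
    for other, queue in enumerate(cpu_queues):
        if other != cpu_id and queue:
            # Hypothetical policy: steal from the tail of another CPU's queue.
            return queue.pop()
    return None  # nothing runnable anywhere


def idle_loop(cpu_id, kp_queue, cpu_queues, max_spins):
    """Search for work repeatedly; return (thread, number_of_empty_searches)."""
    spins = 0
    while spins < max_spins:
        thread = get_work(cpu_id, kp_queue, cpu_queues)
        if thread is not None:
            return thread, spins
        spins += 1  # each empty search is time a profiler would charge to the search
    return None, spins
```

With every queue empty, idle_loop does nothing but call get_work until it gives up, which is the toy equivalent of an idle CPU showing up as busy inside the dispatcher in a kernel profile.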

 
 
 

1. Solaris kernel profiling (kgmon)

hi,

i wonder if anyone can help with kernel profiling in Solaris 2.4
/usr/bin/kgmon exists but there is no man page nor anything in the
answerbooks. "kgmon -i" crashes the machine.

do we need to build a profiling version of the kernel? is any special
configuration required?

thanks in advance,

-stef

2. F77 Optimizer problem

3. 2.6 Jumpstart Profile - to mount /usr/dt

4. LinuxPPC UltraSCSI card support?

5. su and .profile under SunOS 4.1.4 and Solaris 2.6

6. XFree86 3.1: cursors _far_ from hotspots

7. [Fwd: LOCAL: Central Florida ELUG February Installfest!]

8. IPv6 module of Kernel 2.4.x & Kernel 2.6.x

9. 2.6 FCS -> 2.6 5/98 upgrade fails because /usr moved to /usr:2.6

10. kgmon

11. Monitoring TCP kernel control block entries for Solaris 2.6

12. iptables masquerading/snat stop working upon moving to kernel 2.6