lockstat -kgIW and spl()

Post by Ed » Sat, 14 Jul 2001 08:12:37

        I've got a busy, single-application (massively multithreaded)
server that often sees spikes in system CPU utilization when my
application doesn't seem any busier.  Say my app is running at 35%
user CPU and system time around 45%; sometimes system time will jump
to 55%-60% without any additional users or network traffic.

        So I'm trying to figure out just exactly what the kernel is
doing during these spikes.  This is Solaris 8 Intel edition (x86).  I
found that lockstat -kgIW and lockstat -kIW produce interesting output
about what the kernel is doing, but I see no "smoking gun."

# lockstat -kIW -D 30 iostat -c 30 2
     cpu
 us sy wt id
  3  6  4 88
 33 56  6  5

Profiling interrupt: 22425 events in 30.111 seconds (745 events/sec)

Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
-------------------------------------------------------------------------------
 7982  36%  36% 1.00  4896876 cpu[1]                 (usermode)
 1795   8%  44% 1.00  5019835 cpu[4]                 kcopy
 1483   7%  50% 1.00  5055141 cpu[5]                 spl
 1349   6%  56% 1.00  5092220 cpu[6]+11              disp_getwork
  970   4%  61% 1.00  5191712 cpu[6]                 mutex_enter
  823   4%  64% 1.00  4993304 cpu[6]                 mutex_vector_enter
  641   3%  67% 1.00  5233819 cpu[7]+6               bcopy
  254   1%  68% 1.00  4892571 cpu[0]                 bzero
[...deletia...]

        I've tried to find out what spl() does, and it appears to have
something to do with masking interrupts -- and that it isn't even
supposed to exist/be supported in Solaris 8(!!)  (Hey, this *is*
Solaris 8!)  Do I have some goofy device driver?  Or were the rumors
of the death of spl() just an exaggeration? :)

        I assume mutex_enter and mutex_vector_enter have to do
with kernel locks.  So I try this:

# lockstat -D 10 iostat -c 30 2
     cpu
 us sy wt id
  3  6  4 87
 34 54  7  5

Adaptive mutex spin: 674705 events in 30.102 seconds (22414 events/sec)

Count indv cuml rcnt     spin Lock                   Caller                  
-------------------------------------------------------------------------------
438286  65%  65% 1.00       91 0xe54c4cac             lwpchan_lock+0x4c      
52085   8%  73% 1.00       58 freebs_lock            cv_wait+0x4f            
48272   7%  80% 1.00       57 freebs_lock            freebs+0x92            
25208   4%  84% 1.00       50 freebs_lock            freebs_enqueue+0x10    
[...]
-------------------------------------------------------------------------------

Adaptive mutex block: 30614 events in 30.102 seconds (1017 events/sec)

Count indv cuml rcnt     nsec Lock                   Caller                  
-------------------------------------------------------------------------------
23700  77%  77% 1.00   688808 0xe54c4cac             lwpchan_lock+0x4c      
 1619   5%  83% 1.00   122756 freebs_lock            cv_wait+0x4f            
  988   3%  86% 1.00   125525 freebs_lock            freebs_enqueue+0x10    
  463   2%  87% 1.00   124618 freebs_lock            freebs+0x92            
[...]
-------------------------------------------------------------------------------

Spin lock spin: 312855 events in 30.102 seconds (10393 events/sec)

Count indv cuml rcnt     spin Lock                   Caller                  
-------------------------------------------------------------------------------
35552  11%  11% 1.00     1466 xc_mbox_lock+0x11      lock_set_spl+0x3d      
32096  10%  22% 1.00      420 cpu[1]+0x40            disp_lock_enter+0x39    
32073  10%  32% 1.00      438 cpu[0]+0x40            disp_lock_enter+0x39    
30673  10%  42% 1.00      445 cpu[3]+0x40            disp_lock_enter+0x39    
29453   9%  51% 1.00      423 cpu[5]+0x40            disp_lock_enter+0x39    
29410   9%  60% 1.00      405 cpu[6]+0x40            disp_lock_enter+0x39    
28451   9%  70% 1.00      448 cpu[2]+0x40            disp_lock_enter+0x39    
27312   9%  78% 1.00      432 cpu[4]+0x40            disp_lock_enter+0x39    
13595   4%  83% 1.00     2036 turnstile_table+0x68c  disp_lock_enter+0x39    
 7341   2%  85% 1.00      570 cpu[7]+0x40            disp_lock_enter+0x39    
-------------------------------------------------------------------------------

Thread lock spin: 1079 events in 30.102 seconds (36 events/sec)

Count indv cuml rcnt     spin Lock                   Caller                  
-------------------------------------------------------------------------------
  159  15%  15% 1.00     1144 turnstile_table+0x68c  ts_tick+0x15            
  154  14%  29% 1.00      166 lwpsleepq+0x53c        ts_tick+0x15            
   69   6%  35% 1.00      137 transition_lock        ts_tick+0x15            
   35   3%  39% 1.00       62 lwpsleepq+0x53c        turnstile_block+0x1e5  
   35   3%  42% 1.00      151 cpu[4]+0x64            lwp_block+0x21          
   34   3%  45% 1.00       74 cpu[3]+0x64            ts_tick+0x15            
   28   3%  48% 1.00      238 cpu[2]+0x64            lwp_block+0x21          
   28   3%  50% 1.00      263 lwpsleepq+0x53c        ts_update_list+0x46    
   26   2%  53% 1.00      206 cpu[0]+0x64            ts_tick+0x15            
   26   2%  55% 1.00       89 cpu[2]+0x64            turnstile_block+0x1e5  
-------------------------------------------------------------------------------
[....]

        And that's where I get lost.  I don't know how to interpret the
above information.  Where would I go / what would I read to understand
what causes contention on those locks, and to get ideas for speeding
up my server?

        Thanks,

        Ed