I've got a busy, single-application (massively multithreaded)
server that often sees spikes in system cpu util when my application
doesn't seem any busier. Say my app is running at 35% cpu and system
around 45%, sometimes system will jump to 55%-60% without any
additional users or network traffic.
So I'm trying to figure out just exactly what the kernel is
doing during these spikes. This is Solaris 8 Intel edition (x86). I
found that lockstat -kgIW and lockstat -kIW produce interesting output
about what the kernel is doing, but I see no "smoking gun."
# lockstat -kIW -D 30 iostat -c 30 2
cpu
us sy wt id
3 6 4 88
33 56 6 5
Profiling interrupt: 22425 events in 30.111 seconds (745 events/sec)
Count indv cuml rcnt nsec Hottest CPU+PIL Caller I've tried to find out what spl() does, and it appears to have I assume mutex_enter and mutex_vector_enter have to do # lockstat -D 10 iostat -c 30 2 Adaptive mutex spin: 674705 events in 30.102 seconds (22414 events/sec) Count indv cuml rcnt spin Lock Caller Adaptive mutex block: 30614 events in 30.102 seconds (1017 events/sec) Count indv cuml rcnt nsec Lock Caller Spin lock spin: 312855 events in 30.102 seconds (10393 events/sec) Count indv cuml rcnt spin Lock Caller Thread lock spin: 1079 events in 30.102 seconds (36 events/sec) Count indv cuml rcnt spin Lock Caller And that's where I get lost. I don't know how to interpret the Thanks, Ed
---------------------------------------------------------------------------
7982 36% 36% 1.00 4896876 cpu[1] (usermode)
1795 8% 44% 1.00 5019835 cpu[4] kcopy
1483 7% 50% 1.00 5055141 cpu[5] spl
1349 6% 56% 1.00 5092220 cpu[6]+11 disp_getwork
970 4% 61% 1.00 5191712 cpu[6] mutex_enter
823 4% 64% 1.00 4993304 cpu[6] mutex_vector_enter
641 3% 67% 1.00 5233819 cpu[7]+6 bcopy
254 1% 68% 1.00 4892571 cpu[0] bzero
[...deletia...]
something to do with masking interrupts -- and that it isn't even
supposed to exist/be supported in Solaris 8(!!) (Hey, this *is*
Solaris 8!) Do have have some goofy device driver? Or were the rumors
of the death of spl() just an exaggeration :)
with kernel locks. So I try this:
cpu
us sy wt id
3 6 4 87
34 54 7 5
---------------------------------------------------------------------------
438286 65% 65% 1.00 91 0xe54c4cac lwpchan_lock+0x4c
52085 8% 73% 1.00 58 freebs_lock cv_wait+0x4f
48272 7% 80% 1.00 57 freebs_lock freebs+0x92
25208 4% 84% 1.00 50 freebs_lock freebs_enqueue+0x10
[...]
---------------------------------------------------------------------------
---------------------------------------------------------------------------
23700 77% 77% 1.00 688808 0xe54c4cac lwpchan_lock+0x4c
1619 5% 83% 1.00 122756 freebs_lock cv_wait+0x4f
988 3% 86% 1.00 125525 freebs_lock freebs_enqueue+0x10
463 2% 87% 1.00 124618 freebs_lock freebs+0x92
[...]
---------------------------------------------------------------------------
---------------------------------------------------------------------------
35552 11% 11% 1.00 1466 xc_mbox_lock+0x11 lock_set_spl+0x3d
32096 10% 22% 1.00 420 cpu[1]+0x40 disp_lock_enter+0x39
32073 10% 32% 1.00 438 cpu[0]+0x40 disp_lock_enter+0x39
30673 10% 42% 1.00 445 cpu[3]+0x40 disp_lock_enter+0x39
29453 9% 51% 1.00 423 cpu[5]+0x40 disp_lock_enter+0x39
29410 9% 60% 1.00 405 cpu[6]+0x40 disp_lock_enter+0x39
28451 9% 70% 1.00 448 cpu[2]+0x40 disp_lock_enter+0x39
27312 9% 78% 1.00 432 cpu[4]+0x40 disp_lock_enter+0x39
13595 4% 83% 1.00 2036 turnstile_table+0x68c disp_lock_enter+0x39
7341 2% 85% 1.00 570 cpu[7]+0x40 disp_lock_enter+0x39
---------------------------------------------------------------------------
---------------------------------------------------------------------------
159 15% 15% 1.00 1144 turnstile_table+0x68c ts_tick+0x15
154 14% 29% 1.00 166 lwpsleepq+0x53c ts_tick+0x15
69 6% 35% 1.00 137 transition_lock ts_tick+0x15
35 3% 39% 1.00 62 lwpsleepq+0x53c turnstile_block+0x1e5
35 3% 42% 1.00 151 cpu[4]+0x64 lwp_block+0x21
34 3% 45% 1.00 74 cpu[3]+0x64 ts_tick+0x15
28 3% 48% 1.00 238 cpu[2]+0x64 lwp_block+0x21
28 3% 50% 1.00 263 lwpsleepq+0x53c ts_update_list+0x46
26 2% 53% 1.00 206 cpu[0]+0x64 ts_tick+0x15
26 2% 55% 1.00 89 cpu[2]+0x64 turnstile_block+0x1e5
---------------------------------------------------------------------------
[....]
above information. Where would I go / what would I read to be able to
understand what causes contention on those locks, and give me ideas to
speed up my server?