CICS hogging processors

Post by Malcolm Steven » Wed, 25 Aug 1999 04:00:00



Hi

We have a rather weird problem here which IBM seems to be a bit puzzled by.
We are running:
CICS 4.2.0.2
AIX 4.3.2
Sybase 11.0.3
All of this is on an RS/6000 with 4 processors and 2 GB of RAM.

Sybase is running like a dog - basically it looks like CICS is using up all
of the processor resources! And the best part is that no one seems to know
why. We have outsourced this project, the contractors are all confused, and
we need to move forward, so if anyone out there has any suggestions, please
let me know.

Thanx and regards
Malcolm Stevens

 
 
 

CICS hogging processors

Post by Richard D. Lath » Wed, 25 Aug 1999 04:00:00



> Hi

> We have a rather weird problem here which IBM seems to be a bit puzzled by.
> We are running:
> CICS 4.2.0.2
> AIX 4.3.2
> Sybase 11.0.3
> All of this is on an RS/6000 with 4 processors and 2 GB of RAM.

> Sybase is running like a dog - basically it looks like CICS is using up all
> of the processor resources! And the best part is that no one seems to know
> why. We have outsourced this project, the contractors are all confused, and
> we need to move forward, so if anyone out there has any suggestions, please
> let me know.

Make yourself a command file that looks something like the following,
and call it mytrace.cmd (sorry about the line breaks ... piece this
all back together):

===
trace -a -B -T 10485760 -L 20971520 -j \
"001,002,106,10C,134,139,119,11a,16a,18,253,46,30D,30E" ; \
sleep 3 ; trcstop
===

Running this command (you'll most likely need root access) takes a
system trace for 3 seconds or so. The "-j" says to collect info for
these hook values only ... these may not be the right ones for your
instance, but they are the ones I keep lying around as a starting
point. Look in /usr/include/sys/trchkid.h for other likely victims,
noting that a hook ID of 18 actually implies hooks 180 thru 18F.
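
To make the two-digit shorthand concrete, here is a minimal sketch of the expansion described above. The function name expand_hook is made up for illustration and is not part of AIX; it only assumes what the paragraph says, namely that a two-digit ID covers the 16 three-digit hooks sharing that prefix.

```shell
# Hypothetical helper (not an AIX command): expand a hook ID as given to
# "trace -j". A 2-digit ID stands for the 16 hooks with that prefix;
# a 3-digit ID is printed unchanged.
expand_hook() {
    id=$1
    if [ "${#id}" -eq 2 ]; then
        for d in 0 1 2 3 4 5 6 7 8 9 A B C D E F; do
            echo "${id}${d}"
        done
    else
        echo "$id"
    fi
}

expand_hook 18     # prints 180 through 18F, one per line
expand_hook 10C    # prints 10C
```

This can be handy when you want to narrow a hook group down to a few specific hooks on a re-run of the trace.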

Make another command file that looks something like the following and
call it myrpt.cmd:

====
trcrpt -x -O exec=on,pid=on,cpuid=on -o foo.output
====

This command file will format a human-readable (for sufficiently fuzzy
values of human and readable, I suppose) version of the trace and dump
it out as foo.output.

Brace yourself ... the output file will likely be huge :-) Depending
on what hooks you've chosen to collect, the output contains "the
truth, the whole truth, and nothing but the truth" about what's going
on in your system. Unfortunately, the nugget of information you're
looking for is sometimes buried in 2.6 bazillion lines of output <smile>,
which is why I keep the command files around ... they make it easy to
re-run the trace and add or subtract hook IDs ... and since I only get
desperate enough to perform this procedure once every six months or so,
I always forget all the magic flags ...

Look for any obvious silliness in the report.

Last time I went thru this particular pushup, I found that the
processes that were "in a loop" had actually fallen thru a
teensy-weenie race condition having to do with multi-threaded signal
handling on an SMP box on 4.3.2.

We're still working on the E-fix for this one, BTW. So look for your
looping processes, and see if they're ignoring "billions and
billions (TM)" of SIGILLs a second.
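
One rough way to spot the looping processes in a huge report is to tally how often each value shows up in a given column. The sketch below is only that, a sketch: tally_column is a made-up helper, and which column of foo.output holds the process name or PID is an assumption you need to check against the header of your own report.

```shell
# Hypothetical helper: count how often each distinct value appears in one
# whitespace-separated column of a report, most frequent first.
#   $1 = report file (e.g. foo.output)
#   $2 = column number (ASSUMPTION: look at your report header to find
#        the column that holds the process name or PID)
tally_column() {
    awk -v col="$2" 'NF >= col { count[$col]++ }
        END { for (k in count) print count[k], k }' "$1" | sort -rn
}
```

Something like "tally_column foo.output 2 | head" would then list the busiest entries for column 2. A plain "grep -c" for a string of interest is an even cruder first pass, though how a signal is spelled in the report is something to confirm by eye first.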

I hope this all helps more than it hurts :-)

--
#include  <disclaimer.std>    /* I don't speak for IBM ...           */
                              /* Heck, I don't even speak for myself */
                              /* Don't believe me ? Ask my wife :-)  */


 
 
 

CICS hogging processors

Post by Malcolm Steven » Thu, 26 Aug 1999 04:00:00


Hi Richard

Thanks for the detailed reply. We'll give this a bash - perhaps you could
pass some of this on to IBM S.A? Or better yet, come over and work in a
third world country...

Thanx and regards
Malcolm Stevens


