> We have a rather weird problem here which IBM seem to be a bit puzzled with;
> we are running:
> CICS 220.127.116.11
> AIX 4.3.2
> sybase 11.0.3.
> All of this on an RS6000 with 4 processors and 2 Gigs of RAM
> Sybase is running like a dog - basically it looks like CICS is using up all
> of the processor resources! And the best part is no one seems to know why. We
> have outsourced this project and they are all confused and we need to move
> forward, so please, if anyone out there has any suggestions please, please
> let me know.
Make yourself a command file that looks something like the following,
and call it mytrace.cmd (sorry about the line breaks ... piece it
all back together):
trace -a -B -T 10485670 -L 20971520 -j \
"001,002,106,10C,134,139,119,11a,16a,18,253,46,30D,30E" ; \
sleep 3 ; trcstop
Running this command (you'll need root access, most likely) will take
a system trace for 3 seconds or so. The "-j" says collect info for
these hook values only ... these may not be the right ones for your
instance, but they are the ones I keep lying around as a starting
point. Look in /usr/include/sys/trchkid.h for other likely victims,
noting that a hook ID of 18 actually implies hooks 180 thru 18F.
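To make the two-digit shorthand concrete, here is a minimal sketch that spells out what a two-digit hook ID covers. The function name expand_hook is my own invention, not an AIX command; it only prints the IDs and does not touch the trace facility.

```shell
# expand_hook PREFIX
# Print the 16 three-digit hook IDs implied by a two-digit hook ID,
# e.g. 18 -> 180 through 18F. Hypothetical helper, not part of AIX.
expand_hook() {
    for d in 0 1 2 3 4 5 6 7 8 9 A B C D E F; do
        printf '%s%s\n' "$1" "$d"
    done
}

# List everything that "18" stands for, one ID per line.
expand_hook 18
```

Handy mostly as a sanity check when you are trying to match a line in the report back to the short ID you put on the "-j" list.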
Make another command file that looks something like the following and
call it myrpt.cmd
trcrpt -x -O exec=on,pid=on,cpuid=on -o foo.output
This command file will format a human readable (for sufficiently fuzzy
values of human and readable, I suppose) version of the trace and dump
it out as foo.output.
Brace yourself ... the output file will likely be huge :-) Depending
on what hooks you've chosen to collect, the output contains "the
truth, the whole truth, and nothing but the truth" about what's going
on in your system. Unfortunately, the nugget of information you're
looking for is sometimes buried in 2.6 bazillion lines of output <smile>,
which is why I keep the command files around ... makes it easy to
re-run the trace and add or subtract HookIDs ... and since I only get
desperate enough to perform this procedure once every six months or
so, I always forget all the magic flags ...
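One way to make adding and subtracting HookIDs less error-prone is to keep them in a space-separated list and build the "-j" argument from it. The sketch below is only that, a sketch: join_hooks and the HOOKS variable are names I made up, the flags are copied from mytrace.cmd above, and the script prints the command instead of running it so you can eyeball it first.

```shell
# join_hooks ID...
# Join hook IDs with commas, the form the trace "-j" flag expects.
# Hypothetical helper, not part of AIX.
join_hooks() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Edit this list to add or drop hooks between runs.
HOOKS="001 002 106 10C 134 139 119 11a 16a 18 253 46 30D 30E"

# Print (do not run) the resulting command line for inspection.
echo "trace -a -B -T 10485670 -L 20971520 -j \"$(join_hooks $HOOKS)\" ; sleep 3 ; trcstop"
```

Once the printed line looks right, paste it into mytrace.cmd or run it by hand as root.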
Look for any obvious silliness in the report.
Last time I went thru this particular pushup, I found that the
processes that were "in a loop" had actually fallen thru a
teeny-weenie race condition having to do with multi-threaded signal
handling on an SMP box on 4.3.2.
We're still working on the E-fix for this one, BTW. So look for your
looping processes, and see if they're ignoring "billions and
billions (TM)" of SIGILLs a second.
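Rather than eyeballing the whole report for that, a rough tally can help. The sketch below counts report lines matching a pattern, grouped by one column. Everything about the report layout here is an assumption on my part: I have not checked which column holds the process name with exec=on,pid=on,cpuid=on, nor whether the signal shows up by name or by number, so look at a few lines of foo.output first and adjust the pattern and column to suit. The function name count_matches is hypothetical.

```shell
# count_matches FILE PATTERN FIELD
# Count the lines of FILE that match the regular expression PATTERN,
# grouped by whitespace-separated column FIELD, busiest group first.
# Hypothetical helper; the column layout of the report is NOT assumed
# here, so the caller has to supply the right FIELD.
count_matches() {
    awk -v pat="$2" -v col="$3" '
        $0 ~ pat { n[$col]++ }
        END { for (k in n) print n[k], k }
    ' "$1" | sort -rn
}

# Example call (column 2 is a guess, check your own report):
#   count_matches foo.output SIGILL 2 | head
```

If one process name sits at the top with a count wildly out of proportion to a 3 second trace, that is your likely looper.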
I hope this all helps more than it hurts :-)
#include <disclaimer.std> /* I don't speak for IBM ... */
/* Heck, I don't even speak for myself */
/* Don't believe me ? Ask my wife :-) */