I walked into the middle of a particularly * mess of socket code while
maintaining a client's application.
Occasionally, the production applications (Apple's WebObjects
middleware apps served via the web) "spin": their CPU usage goes
to 100% and they stop responding to all requests.
Unfortunately, core files are useless in the current environment
[on my todo list, trust me], and gdb is not installed on the production
machines. "pstack" indicates that the app is typically spinning in ioctl().
Some relevant details:
- On almost every request, the app opens yet another TCP/IP stream (or two,
or three, or four, or five) to grab more information. All streams SHOULD be
closed by the time the response is generated.
- Some of the read-data-from-the-stream code is really inefficient. For
example, some of it looks like:

    do {
        read(socket, ... args to read ONE BYTE AT A TIME ...);
    } while (*bufPtr != '\n');

i.e. some of the read code actually reads a single byte at a time from the
receive buffer. I'm not totally sure of the internals of how read()
works, but this strikes me as astoundingly inefficient.
- The ioctl() call that caused the most recent spin was:

    if (ioctl(curQCore->socket, I_NREAD, &avail) < 0) return NULL;
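One thing worth checking: on Solaris, I_NREAD returns the number of messages on the stream head read queue, so an empty queue is a successful call that returns 0 with avail set to 0, not an error. The `< 0` test above therefore only catches real failures. If that call sits in a loop that keeps retrying until avail is nonzero (an assumption; only the single line above is known), the loop is a busy-wait and would pin the CPU at 100% whenever the other end goes quiet. A common alternative is to sleep in poll() until the descriptor is readable. A minimal sketch, assuming a POSIX poll(); wait_readable and read_with_timeout are hypothetical names, not from the app:

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: sleep until fd is readable or timeout_ms elapses
 * (-1 means wait forever). Returns 1 if a read() will not block (data is
 * waiting, or the peer hung up and read() will report EOF), 0 on timeout,
 * -1 on error. The process uses no CPU while poll() waits, unlike a loop
 * that calls ioctl(fd, I_NREAD, &avail) over and over. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        rc = poll(&pfd, 1, timeout_ms);   /* note: the full timeout restarts after EINTR */
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;                        /* poll() itself failed */
    if (rc == 0)
        return 0;                         /* timed out, nothing to read */
    if (pfd.revents & (POLLERR | POLLNVAL))
        return -1;                        /* socket error or invalid descriptor */
    return 1;                             /* POLLIN and/or POLLHUP */
}

/* Hypothetical wrapper: wait, then do a single read().
 * Returns bytes read, 0 at EOF, -1 on error, -2 on timeout. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    int ready = wait_readable(fd, timeout_ms);

    if (ready < 0)
        return -1;
    if (ready == 0)
        return -2;
    return read(fd, buf, len);
}
```

The timeout also gives the app a way out: a request whose backend connection never answers fails after a bounded wait instead of spinning forever.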
So what is causing the spin? I don't know; that's why I'm asking the community at large... :-)
To give an idea of the scope of an answer that I could believe, here are
some possibilities I've been considering:
- Could a patch be required? Were there any patches released in
the last year that fix problems with the socket stuff on Solaris?
- Could it simply be that all the opening and closing of data streams is
enough to make the socket substrate explode under high load?
- Could an additional stress factor be the read-a-byte-at-a-time style
of grabbing data out of the receive buffer?
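On the open/close question, a cheap first check is whether the "all streams SHOULD be closed" assumption actually holds: if descriptors leak, the number of open descriptors will creep upward from request to request. The sketch below (count_open_fds is a hypothetical name) probes each descriptor number with fcntl(fd, F_GETFD), which fails for numbers that are not open; both fcntl() and sysconf(_SC_OPEN_MAX) are POSIX, so this should also work on Solaris. It is a debugging aid, assumed to be called before and after a request; it makes one system call per possible descriptor, so it is too slow to run on every request in production.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical debugging helper: count this process's open descriptors.
 * fcntl(fd, F_GETFD) returns -1 (EBADF) for descriptor numbers not in use. */
long count_open_fds(void)
{
    long max = sysconf(_SC_OPEN_MAX);
    long fd, count = 0;

    /* Assumed cap: some systems report a huge limit, and probing millions
     * of numbers would be slow. Descriptors above the cap are not counted. */
    if (max < 0 || max > 65536)
        max = 65536;

    for (fd = 0; fd < max; fd++)
        if (fcntl((int)fd, F_GETFD) != -1)
            count++;
    return count;
}
```

Logging count_open_fds() at the start and end of each request (or every N requests) would show whether the count returns to a steady baseline or keeps growing.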
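On the byte-at-a-time question: each read() is a separate system call, so reading a line one byte at a time costs one trip into the kernel per byte. The usual remedy is to read whatever is available into a per-connection buffer and look for the '\n' in user space. A minimal sketch, assuming newline-terminated lines and a blocking descriptor; struct linebuf, linebuf_init and linebuf_readline are hypothetical names, not from the app:

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical per-connection receive buffer (not from the original app). */
struct linebuf {
    char   data[4096];  /* bytes read from the socket, not yet consumed */
    size_t start;       /* index of the first unconsumed byte */
    size_t end;         /* one past the last valid byte */
};

void linebuf_init(struct linebuf *lb)
{
    lb->start = 0;
    lb->end = 0;
}

/* Copy the next line into out with the '\n' stripped and a NUL appended.
 * Returns the line length, or -1 on EOF with no pending data, on a read()
 * error, or if the line does not fit in out (or in the buffer).
 * A final line without a trailing '\n' is returned when EOF is reached. */
ssize_t linebuf_readline(int fd, struct linebuf *lb, char *out, size_t outsz)
{
    for (;;) {
        char  *begin = lb->data + lb->start;
        size_t have  = lb->end - lb->start;
        char  *nl    = memchr(begin, '\n', have);

        if (nl != NULL) {                       /* a complete line is buffered */
            size_t len = (size_t)(nl - begin);
            if (len + 1 > outsz)
                return -1;
            memcpy(out, begin, len);
            out[len] = '\0';
            lb->start += len + 1;               /* consume the line and its '\n' */
            return (ssize_t)len;
        }

        if (lb->start > 0) {                    /* slide leftover bytes to the front */
            memmove(lb->data, begin, have);
            lb->start = 0;
            lb->end = have;
        }
        if (lb->end == sizeof lb->data)
            return -1;                          /* line longer than the buffer */

        /* One read() takes everything currently available (up to the free
         * space), instead of one system call per byte. */
        ssize_t n = read(fd, lb->data + lb->end, sizeof lb->data - lb->end);
        if (n < 0 && errno == EINTR)
            continue;                           /* interrupted by a signal: retry */
        if (n < 0)
            return -1;
        if (n == 0) {                           /* EOF: flush any unterminated line */
            size_t len = lb->end - lb->start;
            if (len == 0 || len + 1 > outsz)
                return -1;
            memcpy(out, lb->data + lb->start, len);
            out[len] = '\0';
            lb->start = lb->end = 0;
            return (ssize_t)len;
        }
        lb->end += (size_t)n;
    }
}
```

Call linebuf_init() once per connection and keep the struct alive for as long as the connection is open: a single read() may pull in bytes that belong to the next line, and those must not be thrown away between calls.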