Hi. I am developing on various *NIXes, using sockets for IPC.
I have run into a problem on all of them (specifically IRIX 5.2,
HP-UX A.09.01, Solaris 2.3 and SunOS 4.1.3) except FreeBSD 2.0
(hence the cross-posts).
Summary of Project:
The spawner sits in a loop, connecting (as a client) to a database server
(which we also wrote) over TCP sockets. It gets info from the server about
jobs that need to run (kinda like "at") and forks a copy of itself to run
each one (using "exec"). It traps SIGCHLD (using "signal" or "sigset", as
appropriate) and uses "waitpid" to get the exit status of the child.
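For concreteness, here is a minimal sketch of the SIGCHLD side of a spawner like the one described. It is not the poster's code: `on_sigchld`, `install_sigchld_handler`, `children_reaped` and `last_exit_status` are hypothetical names. It assumes a POSIX system and uses `sigaction()` instead of `signal()`/`sigset()`, setting `SA_RESTART` so that slow system calls (such as a blocked socket `read()`) interrupted by the handler are restarted where the system supports it. The handler loops over `waitpid(..., WNOHANG)` because several child exits can collapse into a single pending SIGCHLD, and it saves and restores `errno` so the interrupted code does not see a value clobbered by `waitpid()`.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical bookkeeping. Only volatile sig_atomic_t is safe to write
   from a signal handler and read from normal code. */
static volatile sig_atomic_t children_reaped = 0;
static volatile sig_atomic_t last_exit_status = -1;

/* Hypothetical SIGCHLD handler: reap every child that has exited. */
static void on_sigchld(int signo)
{
    int saved_errno = errno;   /* waitpid() may overwrite errno */
    int status;
    pid_t pid;
    (void)signo;

    /* One pending SIGCHLD can stand for several exits, so drain them all. */
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        children_reaped++;
        if (WIFEXITED(status))
            last_exit_status = WEXITSTATUS(status);
    }
    errno = saved_errno;       /* restore for the interrupted code */
}

/* Hypothetical installer: sigaction() with SA_RESTART, plus SA_NOCLDSTOP
   so that stopped (not exited) children do not raise SIGCHLD.
   Returns 0 on success, -1 on failure. */
static int install_sigchld_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

A caller would install the handler once at startup, before the first `fork()`, and then read the counters from the main loop.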
The problem:
When a SIGCHLD arrives in the middle of a socket read or write, the
program loses data being sent over the socket on every system except
FreeBSD. The result is garbage information (except on IRIX, where the
program seg-faults and I cannot trace where). When I ignore the signal
instead of trapping it (using "sigignore"), everything works fine.
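One plausible reading of the symptom (an assumption, not a diagnosis): when a signal handler runs while a process is blocked in `read()` or `write()` on a socket, the call may fail with `errno` set to `EINTR` if nothing had been transferred yet, or return a count smaller than requested if some data had already moved. Whether the call is restarted automatically depends on the system and on whether `signal()`, `sigset()` or `sigaction()` with `SA_RESTART` installed the handler. Code that treats `EINTR` as fatal, or assumes one `read()` returns a whole message, can then drop or misframe bytes. The sketch below uses hypothetical helpers `read_all` and `write_all` that retry on `EINTR` and loop over short counts; it is a common defensive pattern, not a confirmed fix for this program.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly len bytes unless EOF or a real error
   occurs. Returns bytes read (fewer than len only on EOF) or -1 on error. */
static ssize_t read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted before any data: retry */
            return -1;         /* genuine error */
        }
        if (n == 0)
            break;             /* peer closed the connection */
        done += (size_t)n;     /* short read: go back for the rest */
    }
    return (ssize_t)done;
}

/* Hypothetical helper: write all len bytes, retrying on EINTR and
   continuing after short writes. Returns len on success, -1 on error. */
static ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted before any data: retry */
            return -1;
        }
        done += (size_t)n;     /* short write: send the remainder */
    }
    return (ssize_t)done;
}
```

If every socket read and write in both the spawner and the server went through helpers like these, a SIGCHLD arriving mid-transfer would cost at most a retry rather than a lost or truncated message.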
Additional Info:
I wrote a test suite which is a simplified version of the above.
It shows the same garbage results on Solaris 2.3, but works fine
on IRIX and HP-UX. Needless to say, it works fine on FreeBSD.
I've tried running things in the debugger (gdb), but it gets confused
when the program switches context to handle the signal, after which
I can't get any useful information.
Does anyone out there have ANY ideas about this?
Reply by email or follow-up to comp.protocols.tcp-ip
Thanx in advance.
-coranth
----------------------------------------------------------------------
USMail: MSG, Inc., 10 Corporate Place, Burlington, MA 01803-5168
Phone: (617) 273-2820   FAX: (617) 272-1068
Disclaimer: They would never acknowledge I said it.
... need-to-know basis. You do not currently need to know.
----------------------------------------------------------------------