There has been some confusion about the differences between Linux and
Solaris in SIGCHLD handling for stopped children. Here is an overview
of what the differences are and how code should be written to be
portable.
|> > Actually when the child gets a signal stopping it you get SIGCHLD
|> > (assuming you are not ignoring that signal of course).
|>
|> Solaris claims that it is POSIX compliant, but on Solaris
|> I couldn't observe the same behavior (i.e. the parent does
|> not get SIGCHLD).
|>
|> I don't have a POSIX spec handy. Do you know what POSIX
|> says ? Is Linux Posix-compliant in this regard ?
|>
|> > You can tell
|> > the difference by testing the result of wait with WIFSTOPPED (check
|> > the header file but I think that's the right name).
|>
|> I tried. Here is what I got:
|>
|> 1) wait() returns after a short time
|> 2) the wait status tells me WIFSTOPPED(status) == 0
|> in both cases (child stopped or child killed).
|>
|> And I don't understand why wait() does return even though
|> the child does not exit.
|>
|> The man page for wait() seems to indicate that:
|>
|> The wait function suspends execution of the current pro-
|> cess until a child has exited, or until a signal is deliv-
|> ered whose action is to terminate the current process or
|> to call a signal handling function.
|>
|> In this case the child process is just sleeping, so I
|> do not understand why wait() does not block.
|>
|> In any case, the status returned by wait(&status) does not seem
|> to give me any indication, since WIFSTOPPED is zero
|> in both cases (child stopped or child killed with signal KILL).
|>
|> The only difference that I could find in the status
|> between those two cases (child stopped or child killed)
|> is the value of WSTOPSIG(), but the man page
|> says that this value should be evaluated only when WIFSTOPPED
|> returns non-zero, which is not the case.
The POSIX.1 function to set up signal handlers is `sigaction()'. This
function takes a pointer to a structure which contains a member
`sa_flags'. When the signal handler for SIGCHLD is established, the
flag SA_NOCLDSTOP controls whether stopped child processes generate
the signal. If this flag is not set, a stopped child causes a SIGCHLD
to be sent to the parent; if this flag is set, the parent does not get
the signal. So far, this is entirely portable across all POSIX.1
systems.
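A minimal sketch of the above, assuming a handler that only records
that the signal arrived. The names `on_sigchld()' and
`install_sigchld_handler()' are made up for this illustration; only
`sigaction()', `sa_flags' and SA_NOCLDSTOP come from POSIX.1.

```c
#define _XOPEN_SOURCE 700   /* assumption: request POSIX declarations explicitly */
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t got_sigchld = 0;

/* Hypothetical handler: only notes that SIGCHLD was delivered. */
static void on_sigchld(int signo)
{
    (void)signo;
    got_sigchld = 1;
}

/*
 * Install on_sigchld() for SIGCHLD.
 *   notify_on_stop != 0: SA_NOCLDSTOP stays clear, stopped children
 *                        generate SIGCHLD.
 *   notify_on_stop == 0: SA_NOCLDSTOP is set, stopped children do not.
 * Returns 0 on success, -1 on error (errno set by sigaction()).
 */
int install_sigchld_handler(int notify_on_stop)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (!notify_on_stop)
        sa.sa_flags |= SA_NOCLDSTOP;

    return sigaction(SIGCHLD, &sa, NULL);
}
```

Calling `install_sigchld_handler(0)' asks for the behaviour observed on
Solaris, `install_sigchld_handler(1)' for the behaviour observed on
Linux, independent of what `signal()' does on either system.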
On the other hand, the C standard function to set up a signal handler
is called `signal()'. This function is, at least in its historical
implementation, quite limited; BSD and SYSV extended it in incompatible
ways, so POSIX replaced it with `sigaction()' and placed very few
restrictions on the implementation of `signal()'. At least in Linux,
`signal()' is implemented as a wrapper around `sigaction()', setting
`sa_flags' to SA_INTERRUPT|SA_NOMASK|SA_ONESHOT. Note that
SA_NOCLDSTOP is missing! This is why you get a SIGCHLD when you set up
the signal handler with `signal()' and a child process stops.
Apparently, Solaris sets SA_NOCLDSTOP in the flags, so you do not get
signals when child processes stop.
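To make the point concrete, here is a rough sketch of such a wrapper.
It is not the actual libc source. `signal_sketch()' and `handler_fn'
are invented names, and the sketch assumes that SA_RESETHAND and
SA_NODEFER are the POSIX spellings of SA_ONESHOT and SA_NOMASK;
SA_INTERRUPT is left out because it has no portable equivalent.

```c
#define _XOPEN_SOURCE 700   /* assumption: request POSIX declarations explicitly */
#include <signal.h>
#include <string.h>

typedef void (*handler_fn)(int);

/*
 * Hypothetical stand-in for a signal() built on top of sigaction().
 * Returns the previous handler, or SIG_ERR on failure, like signal().
 */
handler_fn signal_sketch(int signo, handler_fn handler)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    /* One-shot handler, signal not masked while the handler runs.
     * SA_NOCLDSTOP is deliberately absent, so a handler installed
     * for SIGCHLD is also invoked when a child stops. */
    sa.sa_flags = SA_RESETHAND | SA_NODEFER;

    if (sigaction(signo, &sa, &old) == -1)
        return SIG_ERR;
    return old.sa_handler;
}
```

A Solaris-like variant would differ only in adding SA_NOCLDSTOP to
`sa_flags'.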
From the implementation's point of view, the behaviour of Solaris seems
more intuitive, so it would be reasonable to add the SA_NOCLDSTOP flag
to the `signal()' implementation of the Linux C library, though the
current behaviour is perfectly POSIX compliant.
From the application programmer's point of view: use `sigaction()' to
get portable behaviour.
Now with respect to the result of `wait()': `wait()' does not return
a status for stopped processes (which is one reason `signal()' should
be changed). Instead, `wait()' blocks until a child terminates or a
signal is caught. In the case described, `wait()' was probably
interrupted by the SIGCHLD handler and returned -1 without storing a
status, so the report that WIFSTOPPED is not set is just the result of
evaluating an invalid status code. Always check the return value of
`wait()' before looking at the status.
Use `waitpid(-1, &stat_val, WNOHANG|WUNTRACED)' in a loop inside the
SIGCHLD handler to get the status of stopped children too.
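The following sketch shows one way such a handler could look, together
with a small test that repeats the experiment from the quoted message
(child stopped, then child killed). `reap_children()',
`demo_stopped_child()' and the two counters are invented for this
example; the sketch assumes the child stops itself with SIGSTOP and
that no other children exist.

```c
#define _XOPEN_SOURCE 700   /* assumption: request POSIX declarations explicitly */
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t stopped_seen = 0;
static volatile sig_atomic_t terminated_seen = 0;

/* SIGCHLD handler: collect every pending status without blocking. */
static void reap_children(int signo)
{
    int saved_errno = errno;    /* waitpid() may change errno */
    int stat_val;

    (void)signo;
    while (waitpid(-1, &stat_val, WNOHANG | WUNTRACED) > 0) {
        if (WIFSTOPPED(stat_val))
            stopped_seen++;
        else if (WIFEXITED(stat_val) || WIFSIGNALED(stat_val))
            terminated_seen++;
    }
    errno = saved_errno;
}

/* Returns 0 if exactly one stop and one termination were reported. */
int demo_stopped_child(void)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;
    pid_t child;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* SA_NOCLDSTOP clear: report stops too */
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    /* Block SIGCHLD so it is only delivered inside sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);

    child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        raise(SIGSTOP);         /* child stops itself */
        _exit(0);
    }

    while (!stopped_seen)
        sigsuspend(&wait_mask); /* wait for the stop notification */
    kill(child, SIGKILL);       /* now terminate the stopped child */
    while (!terminated_seen)
        sigsuspend(&wait_mask); /* wait for the termination */

    sigprocmask(SIG_SETMASK, &old, NULL);
    return (stopped_seen == 1 && terminated_seen == 1) ? 0 : -1;
}
```

The status is examined only after `waitpid()' has returned a positive
pid, which avoids the invalid-status problem described above. Blocking
SIGCHLD outside of `sigsuspend()' is a design choice of this sketch to
avoid a race between testing the counters and going to sleep.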