Problem with status value returned by waitpid

Post by Christi » Sun, 30 Mar 2003 01:52:43



Hello guys,

I'm having a problem with the value of myStat returned by waitpid(
pid, &myStat, 0). The child exits with a nonzero value, but I get
myStat = 0.

Here is the C code (tests for errors removed):

-------------------------------------------------
pipe( pfd );
pid = fork();
if( pid == 0 ) {
  dup2( pfd[1], STDOUT_FILENO );
  dup2( pfd[1], STDERR_FILENO );
  close( pfd[0] );
  close( pfd[1] );
  exitcode = system( command );
  exitcode = exitcode >> 8;
  exit( exitcode );
}
else {
  close( pfd[1] );
  read( pfd[0], ... ); /* reading data returned by exec of command */
}

close( pfd[0] );
signal( SIGCHLD, SIG_DFL );
waitpid( pid, &status, 0 );
-------------------------------------------------

command is a Bourne shell script:
#!/bin/sh
exit 5

When I log the value of exitcode in the child, I do get 5, but status
in the parent is 0.

But if I use the following piece of code, it works fine:

-------------------------------------------------
pipe( pfd );
pid = fork();
if( pid == 0 ) {
  for( fd = 0; fd < 31; fd++ )
    if( fd != pfd[1] ) close( fd );

  dup( pfd[1] );
  dup( pfd[1] );
  dup( pfd[1] );
  close( pfd[1] );

  exitcode = system( command );
  exitcode = exitcode >> 8;
  exit( exitcode );
}
else {
  read( pfd[0], ... ); /* reading data returned by exec of command */
}

close( pfd[0] );
close( pfd[1] );
signal( SIGCHLD, SIG_DFL );
waitpid( pid, &status, 0 );
-------------------------------------------------

I guess it has to do with the use of dup2(), but I can't figure out why.
If anyone has any idea what's wrong with my code, please let me know.

Thanks.

Christian

 
 
 

Post by Jens.Toerr.. » Sun, 30 Mar 2003 03:02:29



Why do you call signal() for SIGCHLD with SIG_DFL? The default
action is to ignore SIGCHLD. Did you have it set to something else
before? And what's the return value of waitpid()? Is it really
identical to the pid of the child? Otherwise it's no big surprise
that 'status' doesn't change.
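
As a minimal sketch of that check (assuming a POSIX system; child_exit_code() is a hypothetical helper, not part of the posted program), the parent can compare the return value of waitpid() with the child's pid and decode the status with WIFEXITED/WEXITSTATUS instead of reading the raw integer:

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, wait for it,
 * and return the decoded exit code, or -1 if anything went wrong. */
int child_exit_code(int code)
{
    int status = 0;
    pid_t pid = fork();

    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0)
        _exit(code);              /* child: terminate immediately */

    /* Parent: any return value other than pid means status was not filled in. */
    if (waitpid(pid, &status, 0) != pid) {
        perror("waitpid");
        return -1;
    }
    if (!WIFEXITED(status))       /* e.g. killed by a signal, not a normal exit */
        return -1;

    /* The raw status is an encoded value; WEXITSTATUS extracts the exit code. */
    return WEXITSTATUS(status);
}
```

For a child that exits with 5, child_exit_code(5) should return 5, whereas the raw status on common implementations is not 5, because the exit code is stored in the upper byte. The macros avoid shifting by hand.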

I also can't see what you are trying to read() in the parent. Could
you please post a compilable program? There are too many strange
things in your program (and too many things you don't show) to
be able to give you a satisfactory answer. All I can tell is
that IMHO it's rather unlikely that this has anything to do
with using dup2() instead of dup(); it's probably a timing
problem.
                                     Regards, Jens
--
      _  _____  _____

  _  | |  | |    | |
 | |_| |  | |    | |          http://www.physik.fu-berlin.de/~toerring
  \___/ens|_|homs|_|oerring

 
 
 

Post by Christi » Tue, 01 Apr 2003 17:58:01




Thanks for the reply, Jens. Sorry about the first posting; I wanted to
post it before the weekend on Friday, but I went a bit too fast
writing it.

Here is the C code and the shell script executed by the child:

----------------- code.c ------------------------

#include <stdlib.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>   /* declares waitpid() */

int main() {
  int pfd[2], nbc, pid, status = 0, exitcode = 0;
  char buf[4098];

  pipe( pfd );

  signal( SIGCHLD, SIG_IGN );

  switch( (pid = fork()) ) {

    case 0 : /* We are in the child */

      dup2( pfd[1], STDOUT_FILENO );
      dup2( pfd[1], STDERR_FILENO );
      close( pfd[0] );
      close( pfd[1] );

      exitcode = system( "/tmp/myshell.sh" );
      exitcode = (exitcode&0xff00) >> 8;
      exit( exitcode );
      break;

    default :
      close( pfd[1] );
      for(;;)
      {
        nbc = read( pfd[0], buf, 4096 );
        if( nbc <= 0 ) break;
      }
  }

  close( pfd[0] );

  signal( SIGCHLD, SIG_DFL );

  waitpid( pid, &status, 0 );

  printf( "In the parent, status = %d\n", status );

  return 0;
}

----------------- myshell.sh ------------------------

#!/bin/sh
echo " "
exit 5

The problem comes from the signal() calls. The man pages say that
'setting the action for SIGCHLD to SIG_IGN is not allowed by POSIX'
and that 'it is unspecified what happens when SIGCHLD is set to
SIG_IGN'.
If I remove both calls to signal(), it works fine. So case closed.

However, I came across another issue: if close( pfd[1] ) is not
called before reading, read() hangs after successfully reading the
data echoed by myshell.sh, as if it can't detect the end of file.

Any idea?

Christian.

 
 
 

Post by Kurtis D. Rade » Wed, 02 Apr 2003 14:24:34



> The problem comes from the signal() calls. The man pages say that
> 'setting the action for SIGCHLD to SIG_IGN is not allowed by POSIX'
> and that 'it is unspecified what happens when SIGCHLD is set to
> SIG_IGN'.  If I remove both calls to signal(), it works fine. So case
> closed.

Of course. When you set SIGCHLD to SIG_IGN you're telling the operating
system you aren't interested in the exit status of your child processes. So
under those conditions waitpid() is not only not required to return the
exit status of the process, it can even return -1 with errno set to ECHILD.
The reason for the wording of the behavior you quote above is to allow for
historical behavior.

> However, I came across another issue: if close( pfd[1] ) is not called
> before reading, read() hangs after successfully reading the data echoed
> by myshell.sh, as if it can't detect the end of file.

That's because EOF hasn't occurred. Note that without the close(pfd[1]) the
write side of the pipe is still open in the parent.  While the child is
still running there are two processes able to write to the pipe so the
reference count on the write side of the pipe is two.  When the child exits
it drops to one. The OS won't signal EOF to a read() until the reference
count goes to zero.
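
A minimal sketch of the pattern, assuming a POSIX system (read_child_output() is a hypothetical helper, and the child here writes a fixed message instead of running a script):

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that writes `len` bytes of `msg` into
 * a pipe, read in the parent until EOF, and return the number of bytes
 * read, or -1 on error. */
long read_child_output(const char *msg, size_t len)
{
    int pfd[2];
    char buf[4096];
    long total = 0;
    ssize_t n;
    pid_t pid;

    if (pipe(pfd) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (pid == 0) {               /* child: only writes */
        close(pfd[0]);
        if (write(pfd[1], msg, len) != (ssize_t)len)
            _exit(1);
        close(pfd[1]);
        _exit(0);
    }

    /* Parent: drop its own copy of the write end first. If it stayed
     * open, read() below would block forever instead of returning 0. */
    close(pfd[1]);

    while ((n = read(pfd[0], buf, sizeof buf)) > 0)
        total += n;

    close(pfd[0]);
    waitpid(pid, NULL, 0);        /* reap the child */
    return n == 0 ? total : -1;   /* n == 0 means EOF was reached */
}
```

Removing the close(pfd[1]) in the parent reproduces the hang described above: the child's data is read, but the final read() never returns 0.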

N.B.: You should never use the signal() system call. Its historical
behavior leads to race conditions that can result in random failures of
your program. Always use sigaction() or, if that isn't available, sigset().

 
 
 

Post by Christia » Wed, 02 Apr 2003 17:05:42





Thanks a lot for the reply, Kurtis.

Christian.