I am running batch jobs under NQS on a KSR1 by launching a shell and
running my program:
myscript:
#! /usr/bin/csh
cd ~myhomedirectory
myprog
The job runs okay, with 'ps' reporting the following:
F        STAT  UID  PID PPID PRI NI RSS WCHAN  .. COMMAND
80208001 S    8275 6633 6629 -13  0  0K sigsus .. csh myscript
80008001 R    8275 6635 6633  -7  0  0K -      .. myprog
Sometimes, however, the shell hangs as my program exits, and the batch
queue then cannot run any further jobs. In that case 'ps' reports:
F        STAT  UID  PID PPID PRI NI RSS WCHAN  .. COMMAND
80208001 I    8275 6633 6629 -13  0  0K sigsus .. csh myscript
80008101 <    8275 6635 6633 -25 -1  0K -      .. <defunct>
Since I am not very experienced with Unix programming, I need some help to
understand what is going on. As I understand it at the moment, my program
has exited and is left as a <defunct> process because the shell is failing
to catch the SIGCHLD signal that is generated when my program exits. Is
this correct? If so, is there anything I can do to ensure that the shell
exits and allows the next batch job to run?
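To make my understanding concrete, here is a minimal sketch of what I think
a parent process is supposed to do when its child exits. It is written in
Python rather than csh purely for illustration; the function name
spawn_and_reap and the exit code are my own inventions, and I am not claiming
this is how csh or NQS is implemented internally.

```python
import os
import time


def spawn_and_reap(exit_code=3):
    """Fork a child that exits at once, then reap it with waitpid().

    Hypothetical helper for illustration only; the child stands in for myprog.
    """
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately, without running any parent cleanup code.
        os._exit(exit_code)

    # Parent: give the child time to exit. Until the parent calls waitpid(),
    # the exited child stays in the process table, and ps would list it as
    # <defunct>.
    time.sleep(0.1)

    # Collect the child's exit status; this removes the defunct entry.
    reaped_pid, status = os.waitpid(pid, 0)
    return reaped_pid == pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(spawn_and_reap())
```

If this picture is right, the <defunct> entry in my second 'ps' listing would
mean that csh has not yet made the equivalent of the waitpid() call for
myprog.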
Any help much appreciated.
Simon
--
------------------------------------------------
Simon Gibson
tel +161 44 275 6141
fax +161 44 275 6236
http://www.cs.man.ac.uk/aig/students/gibson.html
------------------------------------------------