> Sometimes when I am running several server applications we are developing, the
> process will "hang" and will not respond to anything. I try the kill command,
> passing every signal there is (including 9), but it doesn't seem to have any
> effect on the process. I usually rename the file (using mv) and ignore the
> problem as the process is eventually killed when the machine is
> restarted each week.
> How could a process act this way?
The process is most likely sleeping in
the kernel at an uninterruptible priority: for example, when waiting
for a disk buffer to become free. When a signal is sent to such a
process, it can't be delivered until the resource the process was
waiting on becomes available and the process gets scheduled to run.
While a process is in this state, no signal, not even SIGKILL (9),
is going to be delivered and acted on.
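To spot processes stuck in this state, something along these lines might
help. It is only a sketch and assumes a Linux procps-style ps, where
uninterruptible sleep shows up as state "D" and the wait channel can be
requested with "-o wchan"; other Unix flavours use different state letters
and ps options. The function names are made up for illustration.

```python
import subprocess


def find_uninterruptible(ps_output):
    """Return (pid, wchan, command) for every process whose state starts with 'D'.

    Assumes four columns in the order PID, STAT, WCHAN, COMMAND, preceded by
    one header row (the layout requested in list_blocked_processes below).
    """
    blocked = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.strip().split(None, 3)
        if len(fields) < 4:
            continue  # blank or truncated line
        pid, stat, wchan, command = fields
        if stat.startswith("D"):  # 'D' = uninterruptible sleep (Linux assumption)
            blocked.append((int(pid), wchan, command))
    return blocked


def list_blocked_processes():
    """Run ps and return the processes currently in uninterruptible sleep."""
    result = subprocess.run(
        ["ps", "-eo", "pid,stat,wchan:32,comm"],
        capture_output=True, text=True, check=True,
    )
    return find_uninterruptible(result.stdout)
```

A process that shows up here on every run, always with the same wait
channel, is a good candidate for the kind of hang described above.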
So, you need to find out what event your daemon was waiting on (the
WCHAN field of the ps output would help) and take this up with the
vendor. One possibility to rule out is NFS. If your daemon talks
to a hard-mounted remote file system and the server is not responding,
the daemon may block until the server comes back. Another thing to
eliminate is a new device driver. If a driver is not written properly,
it might not release kernel resources such as buffers, which can
cause resource starvation and may well make processes sleep (block)
forever at uninterruptible priority.
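To rule NFS in or out quickly, one could list the hard-mounted NFS file
systems. The sketch below assumes a Linux-style mount table (the format of
/proc/mounts, with the options in the fourth column); on other systems the
same information would have to come from the mount command or the vendor's
equivalent. The function name is hypothetical.

```python
def hard_nfs_mounts(mounts_text):
    """Return the mount points of NFS file systems mounted with the 'hard' option.

    Expects text in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    mount_points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or malformed line
        _device, mount_point, fstype, options = fields[:4]
        if fstype in ("nfs", "nfs4") and "hard" in options.split(","):
            mount_points.append(mount_point)
    return mount_points


def current_hard_nfs_mounts(path="/proc/mounts"):
    """Read the live mount table (Linux assumption) and filter it."""
    with open(path) as mounts_file:
        return hard_nfs_mounts(mounts_file.read())
```

If the daemon touches any path under one of the returned mount points and
that server is down, a permanent block is the expected behaviour.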
> We are using many shared and static
> libraries, could this have something to do with it?
This shouldn't have any bearing on the problem at all. It's what's
going on in the kernel that's causing the process to block.
> We are also using a
> messaging middleware (DEC MessageQ) that polls on a communications port -
> another possibility?
Indeed. Maybe the process is waiting for some change of state on the
port - carrier coming up perhaps - and this either doesn't happen or
else the hardware and/or device driver fail to detect this properly.