Zombie die die die

Post by Enki » Fri, 25 Sep 1998 04:00:00



How in the world do I kill all these zombies? I tried  kill -9 pid#here
and they just won't die until I
reboot, shut down, or at least log off.

Tks
Meta

 
 
 

Zombie die die die

Post by Svein Olav Bjerkese » Fri, 25 Sep 1998 04:00:00



> How in the world do Kill all these Zombies I tried  kill -9 pid#here
> and they just wont die until I
> reboot shutdown or at least log off

> Tks
> Meta

 Zombie processes are already dead. A zombie does not go away
until its parent calls wait(2) for it. Are your zombies by any chance children
of some application you wrote yourself (without a wait() call
in your code)? If so, you should install a signal handler for the CHLD
signal and put a wait(2) call inside the handler.

If the zombies originate from some application whose source code you
don't have access to, you can try the following:

    kill -CHLD <PPID>

where <PPID> is the process ID of the zombies' parent process.
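
To find <PPID> for a given zombie on Linux, one option is to read
/proc/<pid>/stat, whose fields begin "pid (comm) state ppid". What follows
is a minimal sketch and not part of the original post: the function names
parse_stat_line and zombie_parent are made up for illustration, and it
assumes a Linux-style /proc filesystem is mounted.

```c
#include <stdio.h>
#include <string.h>

/* Parse one line in the format of Linux's /proc/<pid>/stat:
 *     pid (comm) state ppid ...
 * comm may itself contain spaces or parentheses, so the state field is
 * located by searching for the LAST ')' in the line.
 * Returns 0 on success, -1 if the line does not match. */
int parse_stat_line(const char *line, int *pid, char *state, int *ppid)
{
    const char *close_paren = strrchr(line, ')');
    if (close_paren == NULL)
        return -1;
    if (sscanf(line, "%d", pid) != 1)
        return -1;
    if (sscanf(close_paren + 1, " %c %d", state, ppid) != 2)
        return -1;
    return 0;
}

/* Hypothetical helper: look up the parent of `pid` if, and only if,
 * `pid` is a zombie (state 'Z').
 * Returns the parent's PID, 0 if the process is not a zombie, or -1 if
 * /proc/<pid>/stat could not be read or parsed. */
int zombie_parent(int pid)
{
    char path[64], line[1024];
    int found_pid, ppid;
    char state;

    snprintf(path, sizeof path, "/proc/%d/stat", pid);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    char *ok = fgets(line, sizeof line, fp);
    fclose(fp);
    if (ok == NULL || parse_stat_line(line, &found_pid, &state, &ppid) != 0)
        return -1;
    return state == 'Z' ? ppid : 0;
}
```

If zombie_parent() returns a positive number, that is the value to plug in
for <PPID>. From the shell, "ps -o pid,ppid,stat,comm" shows the same
fields.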

Regards
Svein Olav Bjerkeset

 
 
 

Zombie die die die

Post by GD » Fri, 25 Sep 1998 04:00:00



> How in the world do Kill all these Zombies I tried  kill -9 pid#here
> and they just wont die until I
> reboot shutdown or at least log off

> Tks
> Meta

a reboot is the only way to get rid of them.

GD

 
 
 

Zombie die die die

Post by brian moo » Fri, 25 Sep 1998 04:00:00


On Thu, 24 Sep 1998 17:51:05 +0000,


> How in the world do Kill all these Zombies I tried  kill -9 pid#here
> and they just wont die until I
> reboot shutdown or at least log off

You can't kill or kill() them, since they are already dead.  You
just have to wait() [or perhaps waitpid()] until they will go away.
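
A small experiment makes this concrete. The sketch below is an illustration
added here, not code from the thread: zombie_demo is a made-up name, and the
"kill() returns 0 for a zombie" step describes Linux behaviour. It creates a
zombie on purpose, sends it SIGKILL, and then reaps it.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if the demonstration behaved as described, -1 otherwise. */
int zombie_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(7);                 /* child: die at once with status 7 */

    /* Block until the child has exited, but leave it unreaped (WNOWAIT),
     * so from here on it is a zombie. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;

    /* SIGKILL is accepted (the PID still exists) but changes nothing. */
    int kill_rc = kill(pid, SIGKILL);

    /* Reaping is what actually removes the process-table entry. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    /* The status is still the child's own exit code, not "killed by 9". */
    int exited_normally = WIFEXITED(status) && WEXITSTATUS(status) == 7;

    /* After the wait the PID is gone, so kill() now fails with ESRCH. */
    int gone = (kill(pid, 0) == -1 && errno == ESRCH);

    printf("kill rc=%d, exited normally=%d, gone after wait=%d\n",
           kill_rc, exited_normally, gone);
    return (kill_rc == 0 && exited_normally && gone) ? 0 : -1;
}
```

The point of the exercise: the only call that makes the entry disappear is
the waitpid() made by the parent.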

--
Brian Moore                         | "The Zen nature of a spammer resembles
      Sysadmin, C/Perl Hacker       |  a*roach, except that the*roach
      Usenet Vandal                 |  is higher up on the evolutionary chain."
      Netscum, Bane of Elves.                   Peter Olson, Delphi Postmaster

 
 
 

Zombie die die die

Post by Jeremy Mathe » Sat, 26 Sep 1998 04:00:00




>On Thu, 24 Sep 1998 17:51:05 +0000,

>> How in the world do Kill all these Zombies I tried  kill -9 pid#here
>> and they just wont die until I
>> reboot shutdown or at least log off

>You can't kill or kill() them, since they are already dead.  You
>just have to wait() [or perhaps waitpid()] until they will go away.

The real answer to the Zombie question is: They happen when the child
exits before the parent does.  That is, when the child process dies
and the still alive parent doesn't notice that it has died.

If you think about it, this is kind of analogous to human life.  We
expect children to outlive their parents - and if this doesn't happen,
we'd expect the parents to notice and make the necessary arrangements
for the dead child.  A child not so cared for, becomes an un-dead and
spends the rest of its time haunting the lives of its unfeeling parent(s).

Back to Unix, the ways to eliminate a Zombie are:

        1) Get the parent to wait() for the child - A well-behaved
           program, such as kerneld, will do this upon receipt of a
           SIGCHLD signal, so the command (e.g.), killall -v -CHLD kerneld
           may do the trick.

        2) Kill the parent - which causes the child to be inherited by
           init, which, being a kind and considerate parent, always does
           the right thing.
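
Option 2 can be watched in action with a three-generation experiment. This
is a sketch added for illustration (orphan_demo is a made-up name): a middle
process forks a grandchild and exits straight away, and the grandchild
reports who its parent is once the kernel has re-parented it. On a
traditional system the adopter is init (PID 1); on newer Linux systems it
may instead be a "subreaper" process, so the code only assumes that the
adopter differs from the old parent.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns the PID that adopted the orphan (or -1 on error) and stores the
 * PID of the parent that went away in *old_parent. */
pid_t orphan_demo(pid_t *old_parent)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t middle = fork();
    if (middle < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (middle == 0) {
        pid_t me = getpid();
        pid_t grandchild = fork();
        if (grandchild != 0)
            _exit(grandchild < 0 ? 1 : 0);   /* middle process exits at once */

        /* Grandchild: poll (for up to about 5 s) until re-parented. */
        close(fds[0]);
        struct timespec pause = { 0, 1000000 };      /* 1 ms */
        for (int i = 0; i < 5000 && getppid() == me; i++)
            nanosleep(&pause, NULL);
        pid_t adopter = getppid();
        if (write(fds[1], &adopter, sizeof adopter) != (ssize_t)sizeof adopter)
            _exit(1);
        _exit(0);
    }

    /* Original process: reap the middle process, then read the report. */
    close(fds[1]);
    int status;
    pid_t adopter = -1;
    if (waitpid(middle, &status, 0) != middle ||
        read(fds[0], &adopter, sizeof adopter) != (ssize_t)sizeof adopter)
        adopter = -1;
    close(fds[0]);
    *old_parent = middle;
    return adopter;
}
```

When the grandchild later exits, it is the adopter's job to wait() for it,
which is why killing a negligent parent clears its zombies.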

 
 
 

Zombie die die die

Post by Core » Sat, 26 Sep 1998 04:00:00



> If you think about it, this is kind of analogous to human life.  We
> expect children to outlive their parents - and if this doesn't happen,
> we'd expect the parents to notice and make the necessary arrangements
> for the dead child.  A child not so cared for, becomes an un-dead and
> spends the rest of its time haunting the lives of its unfeeling parent(s).

        I just *hate* it when that happens ...
 
 
 

Zombie die die die

Post by brian moo » Sat, 26 Sep 1998 04:00:00


On Fri, 25 Sep 1998 00:25:02 GMT,



> >On Thu, 24 Sep 1998 17:51:05 +0000,

> >> How in the world do Kill all these Zombies I tried  kill -9 pid#here
> >> and they just wont die until I
> >> reboot shutdown or at least log off

> >You can't kill or kill() them, since they are already dead.  You
> >just have to wait() [or perhaps waitpid()] until they will go away.

> The real answer to the Zombie question is: They happen when the child
> exits before the parent does.  That is, when the child process dies
> and the still alive parent doesn't notice that it has died.

No, that doesn't answer the question, which is how to kill them.  That
answers an unasked question "where did they come from".

> Back to Unix, the ways to eliminate a Zombie are:

>    1) Get the parent to wait() for the child - A well-behaved
>       program, such as kerneld, will do this upon receipt of a
>       SIGCHLD signal, so the command (e.g.), killall -v -CHLD kerneld
>       may do the trick.


until they go away."

Now, perhaps it was a bit obtuse, but it is quite correct.  (And you
generally don't use wait() since its functionality is quite limited.
In most applications blocking would be a bad thing, but that's what
wait() does.)
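
The difference is easy to see with the WNOHANG flag of waitpid(). The
following is a minimal sketch added for illustration (the function name
nonblocking_wait_demo is made up): while the child is still running, a
WNOHANG call returns 0 immediately, at a point where plain wait() would
have blocked.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if waitpid() behaved as described, -1 otherwise. */
int nonblocking_wait_demo(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: stay alive until the parent closes the write end. */
        char byte;
        close(fds[1]);
        if (read(fds[0], &byte, 1) < 0)
            _exit(1);
        _exit(0);
    }
    close(fds[0]);

    /* Child is still running: WNOHANG returns 0 instead of blocking. */
    int status = 0;
    pid_t first = waitpid(pid, &status, WNOHANG);

    /* Let the child exit, then collect it with an ordinary blocking wait. */
    close(fds[1]);
    pid_t second = waitpid(pid, &status, 0);

    return (first == 0 && second == pid && WIFEXITED(status)) ? 0 : -1;
}
```

A return value of 0 from waitpid() means "children exist, none has exited
yet"; a positive value is the PID that was just reaped.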

A well-behaved program does NOT leave dead children about at all, so I
doubt SIGCHLD will do anything.  Any program that leaves debris about is
broken.

>    2) Kill the parent - which causes the child to be inherited by
>       init, which, being a kind and considerate parent, always does
>       the right thing.

Don't run software that leaves its dead children lying about.  The
corpses stack up in the process table, and eventually you will be
buried in 'em.

And then you're fork()d.

--
Brian Moore                         | "The Zen nature of a spammer resembles
      Sysadmin, C/Perl Hacker       |  a*roach, except that the*roach
      Usenet Vandal                 |  is higher up on the evolutionary chain."
      Netscum, Bane of Elves.                   Peter Olson, Delphi Postmaster

 
 
 

Zombie die die die

Post by Jeremy Mathe » Sat, 26 Sep 1998 04:00:00



...

>> The real answer to the Zombie question is: They happen when the child
>> exits before the parent does.  That is, when the child process dies
>> and the still alive parent doesn't notice that it has died.

>No, that doesn't answer the question, which is how to kill them.  That
>answers an unasked question "where did they come from".

Well, I certainly don't want to get into a pissing match with you over
this, but I think that when the typical newbie asks "How do I kill a Zombie?"
and the answer comes back "You can't kill them", then it makes sense
for them to ask "Where do they come from?"

>> Back to Unix, the ways to eliminate a Zombie are:

>>        1) Get the parent to wait() for the child - A well-behaved
>>           program, such as kerneld, will do this upon receipt of a
>>           SIGCHLD signal, so the command (e.g.), killall -v -CHLD kerneld
>>           may do the trick.


>until they go away."

You are answering this from a "C programmer" point of view, rather
than from a "system administrator" point of view.  The point being
that wait()'ing only works if you are the parent process (or the coder
of same).  The relevant question is "how can I fix it from outside?"

>Now, perhaps it was a bit obtuse, but it is quite correct.  (And you
>generally don't use wait() since its functionality is quite limited.
>In most applications blocking would be a bad thing, but that's what
>wait() does.)

>A well-behaved program does NOT leave dead children about at all, so I
>doubt SIGCHLD will do anything.  Any program that leaves debris about is
>broken.

Well, in fact, on my system, kerneld is such a program.  There is some
bug in either pppd (and friends) or in kerneld that causes it to happen,
so I periodically have to run the ZombieReap script to keep things tidy.

>>        2) Kill the parent - which causes the child to be inherited by
>>           init, which, being a kind and considerate parent, always does
>>           the right thing.

>Don't run software that leaves its dead children lying about.  The
>corpses stack up in the process table, and eventually you will be
>buried in 'em.

Good advice - but kinda like telling a Windows user not to run Office 97
when they are staring at a Blue Screen of Death.  Not a lot of help at
that point in time...
 
 
 

Zombie die die die

Post by Enki » Sat, 26 Sep 1998 04:00:00



> On Thu, 24 Sep 1998 17:51:05 +0000,

> > How in the world do Kill all these Zombies I tried  kill -9 pid#here
> > and they just wont die until I
> > reboot shutdown or at least log off

> You can't kill or kill() them, since they are already dead.  You
> just have to wait() [or perhaps waitpid()] until they will go away.

> --
> Brian Moore                         | "The Zen nature of a spammer resembles
>       Sysadmin, C/Perl Hacker       |  a*roach, except that the*roach
>       Usenet Vandal                 |  is higher up on the evolutionary chain."
>       Netscum, Bane of Elves.                   Peter Olson, Delphi Postmaster

OK, here's more for you:

Red Hat 5.0 Official

Using the modemconfig control panel, getting stuff done, and quitting leaves
behind a python zombie.

Using most of the control panels leaves a python zombie process,
and on occasion when I use glint, rpm goes zombie after I am done.
I know NOTHING about Python. Perl a little, yes; C some, yes; but I'm no help
with Python.
Tks
Metta

 
 
 

Zombie die die die

Post by Jeremy Mathe » Sat, 26 Sep 1998 04:00:00



...

>Ok heres more for you

>Red Hat  5.0 Official

>using modemconfig control panel and getting stuff done and quitting
>leaving behind a python zombie; using most of the control panels leave
>a zombie process python and on occasion when I use glint rpm goes to
>zombie after I am done and I know NOTHING about python, Perl a little
>yes, C some yes but no help with python

As I said earlier, the only reliable way to eliminate zombies from
your system is to kill the parent of the zombie.  Have you tried this
in this case?  Does it work?
 
 
 

Zombie die die die

Post by brian moo » Sat, 26 Sep 1998 04:00:00


On Fri, 25 Sep 1998 16:05:27 +0000,


> Red Hat  5.0 Official

> using modemconfig control panel  and getting stuff done  and quitting  leaving
> behind a python zombie

You should report that as a bug to Red Hat.

> using most of the control panels leave a zombie process python
> and on occasion when I use glint rpm goes to zombie after I am done
> and I know NOTHING about python, Perl a little yes, C some yes but no help with
> python

The dead child isn't the one at fault: it's the parent.

The code for a parent to reap its children at death is trivial and well
documented in most Unix programming books (certainly in the standard
books by Richard Stevens: he covers it in Advanced Programming in the
Unix Environment, as well as Unix Network Programming, providing examples
in both; the Perl Camel book also has a discussion on how to code it
correctly.)

If you have the source and the skill to fix it: fix it and submit the
patch (so you don't have to keep re-fixing it and save others the
hassles).  If not, submit a bug report and it will be fixed.

It is so trivial for an author to write code properly that doesn't leave
zombies, that I would be concerned about other 'features' of the
software.

--
Brian Moore                         | "The Zen nature of a spammer resembles
      Sysadmin, C/Perl Hacker       |  a*roach, except that the*roach
      Usenet Vandal                 |  is higher up on the evolutionary chain."
      Netscum, Bane of Elves.                   Peter Olson, Delphi Postmaster

 
 
 

Zombie die die die

Post by brian moo » Sat, 26 Sep 1998 04:00:00


On Fri, 25 Sep 1998 15:04:05 GMT,

> Well, I certainly don't want to get into a pissing match with you over
> this, but I think that when the typical newbie asks "How do I kill a Zombie?"
> and the answer comes back "You can't kill them", then it makes sense
> for them to ask "Where do they come from?"

But is that the 'real answer'?

The 'real answer' would be to fix the bugs, not allow them to persist or
play the "oh, just reboot!" game of Windows.  You should be able to
expect a Linux box to stay up until the CPU melts with little or no
actual work other than fixing bugs as they are discovered: covering
them up just guarantees you work, not stability.


> >until they go away."

> You are answering this from a "C programmer" point of view, rather
> than from a "system administrator" point of view.  The point being
> that wait()'ing only works if you are the parent process (or the coder
> of same).  The relevant question is "how can I fix it from outside?"

System administrators should know enough about their system to be able to
handle the concept of wait().

Since virtually everything under Linux includes source (and everything
that I use), fixing the source is quite possible.

> Well, in fact, on my system, kerneld is such a program.  There is some
> bug in either pppd (and friends) or in kerneld that causes it to happen,
> so I periodically have to run the ZombieReap script to keep things tidy.

The problem is -always- in the parent.   There is no way for a child's
death to be written 'wrong' that would allow a zombie.  I'd look at the
source to kerneld and how it handles SIGCHLD.

(Hint: if 20 children die at once, you only get one signal.  Unix doesn't
queue signals.)

> >Don't run software that leaves its dead children lying about.  The
> >corpses stack up in the process table, and eventually you will be
> >buried in 'em.

> Good advice - but kinda like telling a Windows user not to run Office 97
> when they are staring at a Blue Screen of Death.  Not a lot of help at
> that point in time...

But it is the answer.  If Office 97 is a buggy piece of crap, you
shouldn't run it.  It's more like responding to "it hurts when I do this"
with "then don't do that".

With Linux you have the choice: don't use buggy software or fix it.
[And, no, you don't need to be a programmer or have any programming skills
to fix bugs: report the bugs to the current maintainer.]

The bug in kerneld is because of this sloppy code:

void
handle_child(int sig)
{
        struct job *job;
        int pid;
        int status;

        if ((pid = waitpid(-1, &status, WNOHANG)) <= 0)
                return;

        for (job = job_head; job; job = job->next) {
                if (job->pid == pid) {
                        job->pid = JOB_DONE;
                        job->status = WEXITSTATUS(status);
                        /* don't break, more jobs might be waiting... */
                        DPRINT(("SIGCHLD: job (%08lx), pid=%d, status=%d\n",
                                (long)job, pid, job->status));
                }
        }

}

That will be called whenever SIGCHLD is generated.

But what happens when 2 processes exit at once?  It gets called once
because only one SIGCHLD is generated, and only one of the two is
reaped.  The other will be a zombie.

The fix?  Easy:

void
handle_child(int sig)
{
        struct job *job;
        int pid;
        int status;

        while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
            for (job = job_head; job; job = job->next) {
                    if (job->pid == pid) {
                            job->pid = JOB_DONE;
                            job->status = WEXITSTATUS(status);
                            /* don't break, more jobs might be waiting... */
                            DPRINT(("SIGCHLD: job (%08lx), pid=%d, status=%d\n",
                                    (long)job, pid, job->status));
                    }
            }
        }

}

Now it will reap ALL dead children, not just the first on a given
signal.
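
For anyone who wants to try the looping pattern outside kerneld, here is a
self-contained sketch. It is an illustration under stated assumptions, not
kerneld code: reap_children and spawn_and_reap are made-up names, and the
job list is replaced by a simple counter. SIGCHLD is held blocked while the
children are forked, so several exits typically collapse into one pending
signal, which is exactly the case the while loop is there for.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped = 0;   /* children collected so far */

/* SIGCHLD handler: one signal may stand for several exits, so keep
 * calling waitpid() until no more dead children are pending. */
static void reap_children(int sig)
{
    (void)sig;
    int saved_errno = errno;               /* waitpid() may change errno */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Fork `n` short-lived children; return how many the handler reaped,
 * or -1 if the handler could not be installed. */
int spawn_and_reap(int n)
{
    struct sigaction sa, old_sa;
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, &old_sa) != 0)
        return -1;

    /* Block SIGCHLD while forking so the wait loop cannot miss it. */
    sigset_t block, old_mask, wait_mask;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    reaped = 0;
    int started = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);                      /* child exits immediately */
        if (pid > 0)
            started++;
    }

    /* sigsuspend() unblocks SIGCHLD and sleeps until the handler has run. */
    while (reaped < started)
        sigsuspend(&wait_mask);

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);     /* restore the old disposition */
    return reaped;
}
```

Calling spawn_and_reap(20) should come back with 20. Swapping the while in
the handler for a single waitpid() call would bring back the one-reap-per-
signal bug, and with it the risk of leftover zombies.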

There is no excuse for running buggy software when you have all the tools
to fix things.

(I don't run kerneld, or I'd submit the above patch.  Feel free to
submit it yourself.)

--
Brian Moore                         | "The Zen nature of a spammer resembles
      Sysadmin, C/Perl Hacker       |  a*roach, except that the*roach
      Usenet Vandal                 |  is higher up on the evolutionary chain."
      Netscum, Bane of Elves.                   Peter Olson, Delphi Postmaster

 
 
 

Zombie die die die

Post by Christian Stiebe » Sat, 26 Sep 1998 04:00:00



> System administrators should know enough about their system to be able to
> handle the concept of wait().

Linux users are not your typical system administrator.

> Since virtually everything under Linux includes source (and everything
> that I use), fixing the source is quite possible.

Indeed. But a person with "hello world" knowledge, or less, won't be
able to fix anything.

> With Linux you have the choice: don't use buggy software or fix it.

"Don't use it" may not be what Joe User wants. Maybe he really wants
to use that software, and is just looking for a way to work around the
bug until it is fixed.

> [And, no, you don't need to be a programmer or have any programming skills
> to fix bugs: report the bugs to the current maintainer.]

And wait. And until the next version is out, or a fixed binary is
sent to Joe User, what is Joe User supposed to do?

> There is no excuse for running buggy software when you have all the tools
> to fix things.

"A fool with a tool is still a fool". Now, Joe User is not a "fool",
but it fits perfectly --- even if Joe User installs all the tools one
needs to fix bugs, he still won't be able to do it. Of course he can
learn things; a few good books, a couple of years experience, and he
can fix the kerneld bug in no time --- but in general, Joe User
prefers to spend his time doing other things.

Christian

--
Christian Stieber        http://www.informatik.tu-muenchen.de/~stieber

 
 
 

Zombie die die die

Post by brian moo » Sat, 26 Sep 1998 04:00:00


On 25 Sep 1998 18:46:35 GMT,


> > System administrators should know enough about their system to be able to
> > handle the concept of wait().

> Linux users are not your typical system administrator.

Some are not.  Many are.  There are far more programmers using Linux
than actual sysadmins with users to support.

> > Since virtually everything under Linux includes source (and everything
> > that I use), fixing the source is quite possible.

> Indeed. But a person with "hello world" knowledge, or less, won't be
> able to fix anything.

They can send in a bug report.

> > With Linux you have the choice: don't use buggy software or fix it.

> "Don't use it" may not be what Joe User wants. Maybe he really wants
> to use that software, and is just looking for a way to work around the
> bug until it is fixed.

But that isn't the answer presented here.  He's been told by others to
cover up the problem by rebooting or killing the parent.

> > [And, no, you don't need to be a programmer or have any programming skills
> > to fix bugs: report the bugs to the current maintainer.]

> And wait. And until the next version is out, or a fixed binary is
> sent to Joe User, what is Joe User supposed to do?

Wait?  Hell, it took me more time to go to sunsite to download kerneld
source than it did to grep for the signal handler and fix it.  And, as I
said, I don't even USE kerneld.

I should hope Redhat, which makes money selling and supporting their
software, would be able to fix such silly bugs.  Report bugs in their
software (which is what this user was complaining about) to them and
they will fix them.

> > There is no excuse for running buggy software when you have all the tools
> > to fix things.

> "A fool with a tool is still a fool". Now, Joe User is not a "fool",
> but it fits perfectly --- even if Joe User installs all the tools one
> needs to fix bugs, he still won't be able to do it. Of course he can
> learn things; a few good books, a couple of years experience, and he
> can fix the kerneld bug in no time --- but in general, Joe User
> prefers to spend his time doing other things.

Then he should report the bugs.

Unreported bugs don't get fixed.

--
Brian Moore                         | "The Zen nature of a spammer resembles
      Sysadmin, C/Perl Hacker       |  a*roach, except that the*roach
      Usenet Vandal                 |  is higher up on the evolutionary chain."
      Netscum, Bane of Elves.                   Peter Olson, Delphi Postmaster

 
 
 

Zombie die die die

Post by Kurt Wa » Sun, 27 Sep 1998 04:00:00



%On 25 Sep 1998 18:46:35 GMT,


%>

[snip pissing match]

A nozombie program is circulating on some lists I follow that you can
run in the same manner as nohup. It sets up the signal handling to
ignore SIGCHLD, preventing the accumulation of zombie children of
badly written programs.  It's not a solution, but it does prevent
the zombies from cluttering your process table and, I suppose, in
extreme cases, from taking up all the available slots in it.
I could post it here if anyone gives a whit.
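
The program itself was not posted in this thread, so the following is only
a guess at how such a wrapper could work, with made-up names
(ignore_sigchld, nozombie_exec, nozombie_check). It relies on two
assumptions: that setting SIGCHLD to SIG_IGN makes the kernel discard
exited children instead of keeping zombies (POSIX.1-2001 and Linux behave
this way), and that the ignored disposition survives exec (Linux keeps it;
POSIX allows an implementation to reset it). A wrapped program that
installs its own SIGCHLD handler would undo the effect.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set SIGCHLD to "ignored".  Returns 0 on success, -1 on error. */
int ignore_sigchld(void)
{
    struct sigaction sa;
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* nohup-style wrapper: ignore SIGCHLD, then replace this process with
 * the command in argv (argv[0] is the program; the list ends with NULL).
 * Returns (with -1) only if something failed. */
int nozombie_exec(char *const argv[])
{
    if (ignore_sigchld() != 0)
        return -1;
    execvp(argv[0], argv);
    return -1;                  /* execvp() returns only on failure */
}

/* Self-check: with SIGCHLD ignored, an exited child leaves nothing behind
 * to wait for.  Returns 0 if that is what happened, -1 otherwise. */
int nozombie_check(void)
{
    if (ignore_sigchld() != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);

    /* waitpid() blocks until the child is gone, then fails with ECHILD
     * because no zombie was kept for us to collect. */
    int status;
    pid_t rc = waitpid(pid, &status, 0);
    return (rc == -1 && errno == ECHILD) ? 0 : -1;
}
```

A main() that calls nozombie_exec(argv + 1) would give the nohup-like usage
"nozombie some-program args". As the post says, this hides the symptom and
does not fix the parent.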

[snip pissing match, part II]

Kurt
--
Linux: The little OS that could, does and will.

 
 
 
