A process that cannot be killed by a kill -9 is in
kernel mode and will not receive the signal sent by
the kill command until it returns. This may be a bug
in your level of kernel. It may not be. Either way,
a command should return to user mode at some point.
Usually, system calls don't take very long, unless the
system is overloaded.
This is the process that returns on its own
42250 1 0 Dec 31 - 0:00 
46156 42250 0 0:00 <defunct>
The only way to get rid of it is to IPL the system
I know. Sick isn't it? :)
I've seen this problem in two different incarnations.
The first case was with a multithreaded application which
would hang. When the application administrator would attempt
to kill the process, it would appear in the process table as '',
as you are seeing. (The reason for the defunct process in your
case is that the child process (46156) has exited and is waiting
for the parent process (42250) to issue a wait() on it. Since
the parent is too busy thinking that something in kernel mode is
really interesting, it's not able to wait() on the child process
and reap its entry from the process table.) This case was fixed
by applying bos.[um]p.220.127.116.11 (iirc). I can find the exact
APAR for you, if you're interested.
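
To make the defunct business concrete, here is a minimal Python sketch of a
child that has exited but has not yet been reaped. It is not AIX-specific and
is only an illustration: the function names are made up for this example, and
it assumes a POSIX system where kill() with signal 0 still succeeds on an
unreaped child (true on Linux, as far as I know).

```python
import os


def pid_in_process_table(pid):
    """Return True if the kernel still has an entry for pid."""
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the entry exists but belongs to another user
    return True


def demo_defunct_child():
    """Fork a child, let it exit, and look at it before and after wait()."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately

    # Block until the child has exited, but leave it unreaped (WNOWAIT).
    # This is the state that ps reports as <defunct>.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    before = pid_in_process_table(pid)

    # The parent's wait() is what removes the entry from the process table.
    _, status = os.waitpid(pid, 0)
    after = pid_in_process_table(pid)

    exited_cleanly = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
    return before, after, exited_cleanly
```

The first value says whether the entry existed before the parent waited, the
second whether it still exists afterwards. A parent that is stuck in kernel
mode never gets as far as the waitpid() call, so the entry stays put.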
In the second case, a process would exit and upon doing so
all of its open sockets would have to be closed. It would
immediately appear in the process table with a name of ''.
The system response time became very slow. ps alxw | sort +5 -n
would show this process named '' as being the culprit (look
in the C column, which is the penalty value assigned to the
process for recent cpu usage). vmstat showed that the system
was spending up to 95% of the cpu time in sys (kernel) mode.
This turned out to be a bug in the kernel where the flawed
logic of a while loop in the kernel socket code resulted
in the kernel looping infinitely. There is an efix available
for this problem at 4.3.2, but the official APAR is only going
to be released at 4.3.3, as the code drop deadline for 4.3.2
APARs had already passed by the time we tracked down the bug.
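
If you'd rather not eyeball the sort output, the same ordering by the C
column can be sketched in Python. This is only a rough illustration: the
function name is made up, the sample listing is invented (those are not
real processes and the column set is cut down), and it assumes that every
column up to and including C is non-blank, which is the same assumption
sort +5 -n makes.

```python
def sort_by_cpu_penalty(ps_text, column="C"):
    """Parse ps-style text with a header line and sort rows by one column.

    Rows come back in ascending order, like sort -n, so the process with
    the highest value is last.
    """
    lines = [ln.strip() for ln in ps_text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        # Split into at most len(header) fields so that a command with
        # arguments stays together in the last column.
        fields = ln.split(None, len(header) - 1)
        rows.append(dict(zip(header, fields)))
    return sorted(rows, key=lambda row: int(row[column]))


# Invented sample data, for illustration only.
SAMPLE = """\
F S UID PID PPID C PRI NI CMD
240001 A 0 1001 1 0 60 20 syncd 60
240001 A 200 1002 1 118 126 20 appserver
200001 A 200 1003 1002 3 61 20 ksh
"""

if __name__ == "__main__":
    for row in sort_by_cpu_penalty(SAMPLE):
        print(row["PID"], row["C"], row["CMD"])
```

The last row printed is the one to look at.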
If you talk to support about the process named '', they may
give you some hooha from the ps(1) man page about this being
a process that is *about* to go defunct. Don't believe
it. It's plain wrong so far as I can tell (at least
in a case such as yours). It probably applies to a case
where an entry in the process table appears something
like '[command]' -- as I've seen that a few times when
a normally functioning process exits (e.g., something
like '[telnetd]'), but not to a case where there's nothing
between the brackets.
If you're not using AIX 4.3.2, the problem may be something
a bit different, or it could be fixed by the same APAR(s)
which may be released for previous (still supported) versions of
AIX if the faulty code exists in them as well. What level
of AIX is this?
I'd be interested to know which process this is. Whether
you know or not isn't of much consequence; my money's on
the problem being a kernel bug anyway.
The difficulty in working with support on a case like
this is that there's no quick fix. You can't force a dump, as
they'll tell you that the system wasn't hung and they (usually)
won't look at a forced dump on a non-hung system. There are
good reasons for that -- they wouldn't *necessarily* be able to
find out what that one process is doing in the kernel as someone
else may be using the cpu(s) at the time you force the dump.
You'll probably have to talk to the kernel group about it.
If the person you talk to can't come up with a better plan
of action than rebooting, request that they requeue the
PMR to the next level up as rebooting -- at least in my book --
is no plan of action at all.
I am using Linux at work, and I am trying to install it at home.
My setup is: 386sx16, 8 MB, 120 MB HD (IDE), VGA. The problem is that
when the boot-disk has ended its operation and I have switched to the
root-disk, all the system says is: Disk Change Detected. And there it stops.
I have used rawrite to place color.gz on my rootdisk. And I have tried
the same disk at work (on a P90) where it works fine.
This is a mystery to me, but maybe some of you have encountered and solved
this problem. I really hope so.
I am using Slackware 3.0
Pål Martin Bakken.