over a hundred smbd processes running.

Michael H. Warfield mhw at wittsend.com
Sun Aug 30 16:52:28 GMT 1998


Luke Kenneth Casson Leighton enscribed thusly:
> ok, i don't have gdb or anything else.  however, smbstatus shows only
> those processes that i would expect: the six or so computers / users
> logged in.  it doesn't show up the other sixty smbd processes in a "Z"
> state that ps -aux | grep mb lists.

	Huh?!?!  "Z" state?  Oh sh*t!  Those are Zombies...

> clues, anyone?

	Yeah.  The parent process is not doing a "wait" and reaping the
exit status from children that have exited.  That leaves the remnant
of the child behind in the process table with a zombie status until
either the parent process dies (at which point the init process
inherits and reaps all of the children) or the parent process gets around
to waiting on the child and collecting that exit status.

	They are not real processes any more.  The only thing left
behind is the process id (which can not be reused until the child is
reaped) and its exit status.  All memory, file descriptors, and other
resources have been released back to the OS.  They still show up in
the ps list because they still occupy process slots in the process table
(and you really DO want to know what is filling up your process table).
You can't kill them or do anything with them, they're just occupying a slot.
Occupy enough process slots and the OS can start getting into trouble
trying to allocate new processes.
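
	To make that concrete, here is a minimal sketch (not anything from
the smbd source) that creates a zombie on purpose and then reaps it.  The
helper names make_zombie and reap_zombie are made up for illustration.
It assumes a POSIX system where waitid() supports WNOWAIT, which lets the
parent wait for the child to exit without removing it from the process
table.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that exits at once with exit_code,
 * then wait until it has exited WITHOUT reaping it.  On return the child
 * is a zombie: no memory, no descriptors, just a pid and an exit status.
 * Returns the child's pid, or -1 on error.
 */
static pid_t make_zombie(int exit_code)
{
    siginfo_t info;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(exit_code);           /* child: terminate immediately */

    /* WNOWAIT: block until the child exits, but leave it in the table. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) < 0)
        return -1;
    return pid;
}

/*
 * Hypothetical helper: reap the zombie and hand back its exit code.
 * Returns -1 if the wait fails or the child did not exit normally.
 */
static int reap_zombie(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

	Between the two calls, kill(pid, 0) still succeeds because the pid
is still allocated, and ps shows the child in the "Z" state.  Once
reap_zombie() has collected the status, the slot is free again.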

	I would guess that smbstatus is only reporting on the "live" smb
processes, which these most definitely are not.  That would explain your
symptoms with smbstatus and ps.

	What you need to do is find out why the smb parent process is not
doing a wait (or waitpid) on the terminated child.  That's often done
in a signal handler based on a SIGCHLD (death of a child) signal.  There
are some timing issues with that and, depending on whether you use "signal"
or "sigaction", you may have conditions under which one child exits while
you are processing the signal from a previous child exit.  When that happens,
you may "lose" a signal.  I always do my "waits" in non-blocking mode in
a while loop in the signal handler to ensure that I have picked up ALL
of the exiting children, whether I got a SIGCHLD from them or not.  When
I get an error back indicating that no more children are waiting to be
reaped, I rearm the signal and exit the handler.
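
	A rough sketch of that pattern follows.  This is not the smbd
code, just an illustration, and it assumes POSIX sigaction() rather than
signal(), so the handler stays installed and no explicit rearm is needed.
The names reaped_children, install_sigchld_reaper and spawn_and_reap are
made up for the example.  The demo helper blocks SIGCHLD while forking, so
several exits can collapse into a single pending signal, which is exactly
the case the while loop is there for.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical counter, for illustration: children reaped so far. */
static volatile sig_atomic_t reaped_children = 0;

static void sigchld_handler(int signo)
{
    int saved_errno = errno;        /* waitpid() may clobber errno */
    int status;

    (void)signo;
    /* One SIGCHLD may stand for several exits: loop until none are left. */
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped_children++;
    errno = saved_errno;
}

/* Install the handler with sigaction(); returns 0 on success, -1 on error. */
static int install_sigchld_reaper(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/*
 * Hypothetical demo helper: fork n children that exit at once, then sleep
 * until the handler has reaped all of them.  Returns the number reaped,
 * or -1 if a fork fails.
 */
static int spawn_and_reap(int n)
{
    sigset_t block, old, waitmask;
    int start = reaped_children;
    int i;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);   /* hold SIGCHLD back for now */
    waitmask = old;
    sigdelset(&waitmask, SIGCHLD);

    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            sigprocmask(SIG_SETMASK, &old, NULL);
            return -1;
        }
        if (pid == 0)
            _exit(0);
    }

    while (reaped_children - start < n)
        sigsuspend(&waitmask);      /* atomically unblock SIGCHLD and sleep */

    sigprocmask(SIG_SETMASK, &old, NULL);
    return reaped_children - start;
}
```

	With the old System V style signal() semantics, the handler would
also have to reinstall itself before returning, which is the "rearm" step
described above.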

	You can get into real trouble, too, if there is any path where the
signal handler might fail to get rearmed...  Everything runs along just
great until that magic moment and then zombies start piling up like cordwood.

> On Sat, 29 Aug 1998, Luke Kenneth Casson Leighton wrote:
> 
> > ok, it's happened again: this time i'll try a little investigating.
> > 
> > On Thu, 27 Aug 1998, Luke Kenneth Casson Leighton wrote:
> > 
> > > latest cvs version.  killed them immediately as the performance of the
> > > freebsd machine was getting unstable.  anyone else see this?

	My guess is that once you killed the parent smb process, all of the
zombies were reaped by init and the process table was cleaned up...

	Mike
-- 
 Michael H. Warfield    |  (770) 985-6132   |  mhw at WittsEnd.com
  (The Mad Wizard)      |  (770) 925-8248   |  http://www.wittsend.com/mhw/
  NIC whois:  MHW9      |  An optimist believes we live in the best of all
 PGP Key: 0xDF1DD471    |  possible worlds.  A pessimist is sure of it!

