Samba eating processes

Sat May 9 23:56:41 GMT 1998

vince at rti.com (Vince Chen) wrote:

> With the upgrade to 18p4, samba is regularly causing our server to
> run out of processes.
> 
> What is causing this?  It never happened w/ 17p2.
>
> OS: IRIX 6.2

[log deleted see digest 1681]

The problem with a large number of processes may or may not be
related to the "recursion" error messages in your post.  Dealing
first with the recursion issue:

There is a unix signal called SIGCHLD which is sent to a process
whenever one of that process's children dies.  In the case of
samba it will be sent to the master smbd process each time one of
the slave smbd processes dies, i.e. each time a connection to the
samba server is closed.  The master smbd deals with the signal
by installing a signal handler for it, i.e. a function that is
called whenever the signal is received.

Another general feature of unix systems is the zombie process.
When a process dies the kernel frees all the resources used by the
process (memory, open files, etc.) except for the process table entry
(which is what appears in the 'ps' listing).  At this stage the process
has become a zombie and it will remain in this state until the parent
process collects the child process's exit status by calling one of the
wait functions (wait(2), waitpid(2), wait3(2)), at which point the
process table entry is also freed and the process disappears.
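
As a minimal sketch of the zombie life cycle described above (the
function name spawn_and_reap is made up for illustration and is not
part of samba), a parent can fork a child that exits at once and
then collect its exit status with waitpid(2):

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits immediately with
 * `code`, then reap it.  Returns the child's exit status, or -1 on
 * error. */
int spawn_and_reap(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);    /* child: dies at once and becomes a zombie */

    /* Until waitpid() succeeds the dead child still occupies a
     * process table entry.  Retry if a signal interrupts the call. */
    int status;
    pid_t got;
    do {
        got = waitpid(pid, &status, 0);
    } while (got < 0 && errno == EINTR);

    if (got != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_and_reap(7) should return 7.  If the waitpid() call
were left out, the child would stay in the 'ps' listing as a zombie
for as long as the parent kept running.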

Now unix systems don't normally queue signals and this is also
true of the SIGCHLD signal.  On SYSV systems this signal is generated
for a process whenever:

    1.  A child process dies when there is a signal handler installed
	for SIGCHLD.

    2.  A handler for SIGCHLD is installed when there is already a
	dead child process whose exit status could be collected by
	calling a wait function (a zombie process).

Under the old SYSV signal semantics, signal handlers have to be
re-installed once they have been used, so the handler for SIGCHLD
has to be re-installed within the handler for SIGCHLD.  If this is
done before one of the wait functions is called then the SIGCHLD
signal will be regenerated for the _same_ child process time and
time again, each time calling the signal handler again.  This is the
recursion that samba has detected in your configuration.  The
solution is to call one of the wait functions before the signal
handler is re-installed, which is one of the effects of defining the
USE_WAITPID conditional compilation #define.
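
The ordering can be sketched roughly as follows.  This is not the
samba source, only an illustration of "wait first, re-install
second"; the names sigchld_handler, reaped and reap_children are
made up for the example.  On systems where a handler stays installed
after use, the re-install is simply harmless.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t reaped;    /* children collected so far */

static void sigchld_handler(int sig)
{
    int saved_errno = errno;
    (void)sig;

    /* Collect every dead child first.  Signals are not queued, so
     * one SIGCHLD may stand for several deaths. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;

    /* Only now re-install the handler: the children just collected
     * can no longer cause SIGCHLD to be regenerated. */
    signal(SIGCHLD, sigchld_handler);
    errno = saved_errno;
}

/* Hypothetical driver: fork n short-lived children one at a time and
 * return how many the handler reaped, or -1 on error. */
int reap_children(int n)
{
    struct timespec tick = { 0, 1000000 };    /* 1 ms */

    reaped = 0;
    if (signal(SIGCHLD, sigchld_handler) == SIG_ERR)
        return -1;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(0);
        /* Poll (for at most about 5 s) until the handler has run. */
        for (int tries = 0; reaped <= i && tries < 5000; tries++)
            nanosleep(&tick, NULL);
    }
    signal(SIGCHLD, SIG_DFL);
    return (int)reaped;
}
```

Swapping the waitpid() loop and the signal() call inside the handler
gives the problem ordering: on a system with the SYSV behaviour
described above, the re-install would find the uncollected child and
raise SIGCHLD again straight away.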

The way in which this could cause a large number of child processes
is that the code in samba's signal handler that detects the recursion
aborts the signal handler at that point.  Samba then no longer has a
signal handler installed for SIGCHLD, so it never calls one of the
wait functions for any of its children that die from that point on,
and they all remain as zombies.

The other way a large number of processes can be created appears
to be related to NT and we have experienced this with NTrigue
(based on NT 3.51) and samba 18p3 and 18p4.

What appears to be happening is that NT gives up on a particular
operation on the samba server and retries the request by opening
a new connection.

A total guess as to why the problem appears in release 18 but not
in 17 would be that 18 has implemented some NT specific functionality
but not quite in the way that NT expects, hence NT retrying.

So my recommendation would be to re-compile using the USE_WAITPID
option and see if the problem goes away.

-- 
Steve Fosdick                  Internet: fosdicsj at aom.bt.co.uk
Voice: +44 1473 642987         MSMAIL:   BTEA/BTLIP23/FOSDICSJ
Fax:   +44 1473 646656         BOAT:     FOSDICSJ
Snail: B29/G34, BT Labs, Martlesham Heath, Ipswich, IP5 7RE, England.