Runaway number of smbd children
Jeremy Allison
jra at samba.org
Mon Nov 5 18:19:14 GMT 2007
On Mon, Nov 05, 2007 at 10:22:55AM -0800, Dave Daugherty wrote:
> Kevin:
>
> A quick update...
>
> With 32 smbtortures running on 8 clients, we have been able to reproduce
> behavior similar to this and bug 3204 on Samba 3.0.26a. We are still
> investigating it.
>
> Here are the current, gory details from our engineer, Yen Liew:
>
> I was able to reproduce the thousands of smbd processes using 3.0.26a. I
> think this issue has existed since 3.0.25, but it does not occur on 3.0.23b.
>
> That is because the tdb lock-with-timeout code in 3.0.23b differs from the
> implementation in 3.0.25 and 3.0.26a.
>
> In 3.0.26a, we think there is a bug in how tdb_chainlock_with_timeout()
> handles the SIGALRM signal, which causes the smbd process to keep calling
> fcntl() even after SIGALRM is received.
>
> tdb_chainlock_with_timeout_internal() (in lib/util_tdb.c) sets up the
> gotalarm_sig() signal handler, which just sets the static variable
> gotalarm = 1, as follows:
>
> tdb_chainlock_with_timeout_internal()
> {
>     .....
>     if (timeout) {
>         CatchSignal(SIGALRM, SIGNAL_CAST gotalarm_sig);
>         alarm(timeout);
>     }
>     if (rw_type == F_RDLCK)
>         ret = tdb_chainlock_read(tdb, key);
>     else
>         ret = tdb_chainlock(tdb, key);
>     if (timeout) {
>         alarm(0);
>         CatchSignal(SIGALRM, SIGNAL_CAST SIG_IGN);
>         .....
> }
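The timeout pattern quoted above can be sketched with plain POSIX calls. This is a minimal, hypothetical sketch, not Samba code: sigaction() stands in for Samba's CatchSignal(), the lock covers the whole file rather than a tdb hash chain, and lock_once_with_timeout() and hold_lock_in_child() are invented names. SA_RESTART is deliberately left unset, so when SIGALRM is delivered a single blocked fcntl(F_SETLKW) returns -1 with errno set to EINTR, which is the behaviour the analysis below relies on.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Like gotalarm_sig(), the handler only records that the alarm fired. */
static volatile sig_atomic_t alarm_fired = 0;

static void on_alarm(int sig)
{
    (void)sig;
    alarm_fired = 1;
}

/*
 * Hypothetical helper: one blocking attempt at a whole-file write lock,
 * bounded by `timeout` seconds. SA_RESTART is left out of sa_flags, so a
 * delivered SIGALRM makes fcntl(F_SETLKW) fail with EINTR.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int lock_once_with_timeout(int fd, unsigned int timeout)
{
    struct sigaction sa, old;
    struct flock fl;
    int ret, saved_errno;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);       /* sa_flags stays 0: no SA_RESTART */
    sigaction(SIGALRM, &sa, &old);

    alarm_fired = 0;
    alarm(timeout);

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;         /* l_start = 0, l_len = 0: whole file */
    ret = fcntl(fd, F_SETLKW, &fl); /* single attempt, no EINTR retry */
    saved_errno = errno;

    alarm(0);                       /* disarm before restoring the old handler */
    sigaction(SIGALRM, &old, NULL);
    errno = saved_errno;
    return ret;
}

/* Hypothetical demo helper: fork a child that holds the lock until killed. */
static pid_t hold_lock_in_child(int fd)
{
    int p[2];
    char c = 0;
    pid_t pid;

    if (pipe(p) != 0)
        return -1;
    pid = fork();
    if (pid == 0) {
        struct flock fl;
        memset(&fl, 0, sizeof(fl));
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        if (fcntl(fd, F_SETLK, &fl) != 0 || write(p[1], &c, 1) != 1)
            _exit(1);
        for (;;)
            pause();                /* keep the lock until SIGKILL */
    }
    close(p[1]);
    if (pid < 0 || read(p[0], &c, 1) != 1)
        pid = -1;                   /* fork failed or child could not lock */
    close(p[0]);
    return pid;
}
```

With another process holding the lock, lock_once_with_timeout(fd, 1) returns -1 with errno == EINTR after about a second; hold_lock_in_child() exists only to create that contention for a demo.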
>
> tdb_brlock() (in tdb/common/lock.c), which is eventually called by
> tdb_chainlock(), calls fcntl() with lck_type = F_SETLKW in a do/while loop,
> as shown:
>
> tdb_brlock()
> {
>     ....
>     do {
>         ret = fcntl(tdb->fd, lck_type, &fl);
>     } while (ret == -1 && errno == EINTR);
>     .....
> }
>
> According to the fcntl man page, fcntl() fails with EINTR when a signal
> that is caught (i.e. SIGALRM) is received while it is waiting. So when
> SIGALRM arrives, the signal handler in util_tdb.c is called and sets
> gotalarm = 1; after the handler returns, fcntl() returns ret = -1 with
> errno = EINTR, and the loop continues. Since alarm() fires only once, the
> retried fcntl() then blocks with no timeout at all, and the process hangs.
> tdb_brlock(), which is waiting for the signal, should either check the
> gotalarm value or use sigsetjmp()/siglongjmp() to jump to the desired
> location.
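The first option suggested above, making the retry loop look at the alarm flag, can be sketched as follows. This is a hypothetical illustration rather than Samba's code: brlock_checking_alarm(), lock_with_timeout() and hold_lock_in_child() are invented names, the flag is passed in by pointer, and the lock covers the whole file. EINTR caused by some unrelated signal is still retried, but once the alarm flag is set the loop returns -1 instead of blocking again with no timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t gotalarm = 0;  /* mirrors util_tdb.c's flag */

static void gotalarm_handler(int sig)
{
    (void)sig;
    gotalarm = 1;
}

/*
 * Hypothetical tdb_brlock()-style loop with an optional alarm flag:
 * EINTR is retried only while the flag is unset (or no flag was given).
 */
static int brlock_checking_alarm(int fd, short type,
                                 volatile sig_atomic_t *alarm_flag)
{
    struct flock fl;
    int ret;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;         /* whole file */
    do {
        ret = fcntl(fd, F_SETLKW, &fl);
    } while (ret == -1 && errno == EINTR &&
             (alarm_flag == NULL || *alarm_flag == 0));
    return ret;
}

/* Arm a one-shot SIGALRM (no SA_RESTART), lock, then disarm and restore. */
static int lock_with_timeout(int fd, unsigned int timeout)
{
    struct sigaction sa, old;
    int ret, saved_errno;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = gotalarm_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, &old);

    gotalarm = 0;
    alarm(timeout);
    ret = brlock_checking_alarm(fd, F_WRLCK, &gotalarm);
    saved_errno = errno;
    alarm(0);
    sigaction(SIGALRM, &old, NULL);
    errno = saved_errno;
    return ret;
}

/* Hypothetical demo helper: fork a child that holds the lock until killed. */
static pid_t hold_lock_in_child(int fd)
{
    int p[2];
    char c = 0;
    pid_t pid;

    if (pipe(p) != 0)
        return -1;
    pid = fork();
    if (pid == 0) {
        struct flock fl;
        memset(&fl, 0, sizeof(fl));
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        if (fcntl(fd, F_SETLK, &fl) != 0 || write(p[1], &c, 1) != 1)
            _exit(1);
        for (;;)
            pause();                /* keep the lock until SIGKILL */
    }
    close(p[1]);
    if (pid < 0 || read(p[0], &c, 1) != 1)
        pid = -1;                   /* fork failed or child could not lock */
    close(p[0]);
    return pid;
}
```

With the alarm_flag check removed from the loop condition, the same lock_with_timeout() call would block for as long as the other process holds the lock, which is the hang described above. Passing NULL as the flag keeps the original retry-forever behaviour for callers that set no timeout.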
>
> I tried using sigsetjmp()/siglongjmp() in
> tdb_chainlock_with_timeout_internal(), and that does resolve the smbd
> process hang.
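The sigsetjmp()/siglongjmp() approach described above can be sketched like this. It is a minimal, hypothetical version, not the patch that was tested: lock_with_timeout_jmp(), jump_on_alarm() and hold_lock_in_child() are invented names, the lock is on the whole file, and the do/while loop is kept exactly as in tdb_brlock(). Instead of returning, the SIGALRM handler jumps back to the sigsetjmp() point, so the retry loop never gets a chance to call fcntl() again.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static sigjmp_buf lock_timeout_env;

/* On SIGALRM, leave the blocked fcntl() by jumping back to sigsetjmp(). */
static void jump_on_alarm(int sig)
{
    (void)sig;
    siglongjmp(lock_timeout_env, 1);
}

/* Hypothetical: returns 0 if the lock was taken, -1 on timeout or error. */
static int lock_with_timeout_jmp(int fd, unsigned int timeout)
{
    struct sigaction sa, old;
    struct flock fl;
    int ret;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;         /* whole file */

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = jump_on_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, &old);

    /* savemask = 1: the jump also restores the mask, unblocking SIGALRM. */
    if (sigsetjmp(lock_timeout_env, 1) != 0) {
        sigaction(SIGALRM, &old, NULL);     /* reached via timeout */
        return -1;
    }

    alarm(timeout);
    do {                            /* unchanged tdb_brlock()-style loop */
        ret = fcntl(fd, F_SETLKW, &fl);
    } while (ret == -1 && errno == EINTR);
    alarm(0);
    sigaction(SIGALRM, &old, NULL);
    return ret;
}

/* Hypothetical demo helper: fork a child that holds the lock until killed. */
static pid_t hold_lock_in_child(int fd)
{
    int p[2];
    char c = 0;
    pid_t pid;

    if (pipe(p) != 0)
        return -1;
    pid = fork();
    if (pid == 0) {
        struct flock fl;
        memset(&fl, 0, sizeof(fl));
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        if (fcntl(fd, F_SETLK, &fl) != 0 || write(p[1], &c, 1) != 1)
            _exit(1);
        for (;;)
            pause();                /* keep the lock until SIGKILL */
    }
    close(p[1]);
    if (pid < 0 || read(p[0], &c, 1) != 1)
        pid = -1;                   /* fork failed or child could not lock */
    close(p[0]);
    return pid;
}
```

One caveat with this design: jumping out of a signal handler is only well defined when the interrupted code is async-signal-safe, which holds for fcntl() in this sketch but has to be checked for everything else that runs while the alarm is armed. Also, if the alarm fires after fcntl() has already taken the lock but before alarm(0), the function reports a timeout while still holding the lock. Checking a flag inside the retry loop, the other option mentioned above, avoids both issues.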
Great analysis, thanks! I'll fix this ASAP.
Jeremy.