[Samba] hanging smbd(s) revisited

Jeremy Allison jra at samba.org
Tue Feb 28 20:25:07 GMT 2006


On Tue, Feb 28, 2006 at 01:30:40PM -0500, William Jojo wrote:
> 
> So we've gone back to 3.0.20 and we're stable again. I should indicate that
> it's 3.0.20 with patches 9484, 9481 and 9456 to fix Win98 dir loop, excel
> shared workbook and ACLs (not necessarily in that order).
> 
> Since the problem manifests in the filesystem where our Samba install is,
> and it appears to be a tdb (namely locking.tdb for fd=15, but can't identify
> the fd=3 that spins unmercifully), I'm wondering if *maybe* it could be the
> "Fix for tdb clear-if-first race condition." or some other tdb change after
> 3.0.20 that traded one bug for another? I'm guessing... :-)

Identifying that fd would be really useful.
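
One way to identify it, sketched here under the assumption that the
server runs on Linux with /proc mounted (other platforms would need
lsof or a similar tool instead). The helper name describe_fd is
hypothetical and not part of Samba; it only reads the symlink the
kernel exposes for each open descriptor.

```python
import os


def describe_fd(pid: int, fd: int) -> str:
    """Return what descriptor `fd` of process `pid` points at.

    Assumes a Linux-style /proc filesystem. Reading another user's
    process usually requires root.
    """
    # /proc/<pid>/fd/<fd> is a symlink to the open file, socket or pipe.
    return os.readlink(f"/proc/{pid}/fd/{fd}")
```

For example, describe_fd(smbd_pid, 3), where smbd_pid is the pid of
the spinning smbd, should give a path for a regular file such as a
tdb, or a socket/pipe label otherwise.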

> We upgraded from 3.0.20 to 3.0.21a for production. It never showed up in
> development for any version after 3.0.20 since we can't generate that kind
> of random load, so of course we thought everything was cool.
> 
> Again, this only happens under heavy load, daily and clears up with a bounce
> of smbd. It seems to be related to a few hundred students logging off and a
> few hundred more logging on (classes are switching). Also we noticed that
> there are several hundred and in some cases a couple thousand cookie files
> being transferred around in roaming profiles per student (they were not
> redirected).
> 
> We are going to start moving to 20a, then 20b, then to 21 then back to 21a
> where we started (21b did it too, haven't tried 21c yet) after another day
> or two of 3.0.20 to make sure we're not losing our mind.

I've looked over the logic for acquiring and releasing the lock
on locking.tdb in the 3.0.21c release code - I can't see any
possible paths, error or otherwise, where the lock can be left live
on a record. I'll keep looking though. When it's spinning, what is
the errno that the fcntl call returns?
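
A system call tracer attached to the spinning smbd is the direct way
to see that errno. As a rough complement, the sketch below probes a
byte range of a file from a separate process with a non-blocking
fcntl lock and reports the errno name if the request is refused. The
function name try_lock_errno and the start/length parameters are
hypothetical; the offsets smbd actually locks in locking.tdb are not
assumed here, and this shows only what an outside process sees, not
what smbd itself gets back.

```python
import errno
import fcntl
import os


def try_lock_errno(fd: int, start: int = 0, length: int = 1):
    """Try a non-blocking exclusive fcntl lock on a byte range.

    Returns None if the lock was granted (it is released again
    immediately), otherwise the symbolic errno name, e.g. "EAGAIN".
    The descriptor must be open for writing.
    """
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB,
                    length, start, os.SEEK_SET)
    except OSError as e:
        # Map the numeric errno to its name; fall back to the number.
        return errno.errorcode.get(e.errno, str(e.errno))
    # Granted: drop the lock so the probe leaves no lock behind.
    fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)
    return None
```

A result of None means nothing else held a conflicting lock on that
range at that moment; an errno name means another process did.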

Jeremy.

