[Samba] hanging smbd(s) revisited
Jeremy Allison
jra at samba.org
Tue Feb 28 20:25:07 GMT 2006
On Tue, Feb 28, 2006 at 01:30:40PM -0500, William Jojo wrote:
>
> So we've gone back to 3.0.20 and we're stable again. I should indicate that
> it's 3.0.20 with patches 9484, 9481 and 9456 to fix the Win98 dir loop, Excel
> shared workbook and ACLs (not necessarily in that order).
>
> Since the problem manifests in the filesystem where our Samba install is,
> and it appears to be a tdb (namely locking.tdb for fd=15, but can't identify
> the fd=3 that spins unmercifully), I'm wondering if *maybe* it could be the
> "Fix for tdb clear-if-first race condition." or some other tdb change after
> 3.0.20 that traded one bug for another? I'm guessing... :-)
Identifying that fd would be really useful.
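One way to put a name to a bare descriptor number is to compare inodes. You fstat() the descriptor, stat() each candidate file (locking.tdb, connections.tdb and so on), and a matching st_dev/st_ino pair identifies it. A minimal sketch in C follows. identify_fd() and fd_is_path() are hypothetical helper names, not Samba functions, and the check only works from inside the process that owns the descriptor. From outside the process, a tool such as lsof reports the same mapping.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Return 1 if the open descriptor fd refers to the same file as path
 * (same device and inode number), 0 if it does not, and -1 if either
 * fstat() or stat() fails. */
static int fd_is_path(int fd, const char *path)
{
    struct stat fd_st, path_st;

    if (fstat(fd, &fd_st) == -1 || stat(path, &path_st) == -1)
        return -1;
    return fd_st.st_dev == path_st.st_dev && fd_st.st_ino == path_st.st_ino;
}

/* Hypothetical helper: return the first candidate path that fd refers
 * to, or NULL if none of them match. */
static const char *identify_fd(int fd, const char *const candidates[], size_t n)
{
    assert(candidates != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (fd_is_path(fd, candidates[i]) == 1)
            return candidates[i];
    }
    return NULL;
}
```

For example, with a file opened through fopen(), identify_fd(fileno(f), candidates, n) returns whichever candidate path names that file. Comparing device and inode numbers rather than path strings means hard links and relative paths still match.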
> We upgraded from 3.0.20 to 3.0.21a for production. It never showed up in
> development for any version after 3.0.20 since we can't generate that kind
> of random load, so of course we thought everything was cool.
>
> Again, this only happens under heavy load, daily, and clears up with a bounce
> of smbd. It seems to be related to a few hundred students logging off and a
> few hundred more logging on (classes are switching). Also, we noticed that
> there are several hundred and in some cases a couple thousand cookie files
> being transferred around in roaming profiles per student (they were not
> redirected).
>
> We are going to start moving to 20a, then 20b, then to 21 then back to 21a
> where we started (21b did it too, haven't tried 21c yet) after another day
> or two of 3.0.20 to make sure we're not losing our mind.
I've looked over the logic for acquiring and releasing the lock
on locking.tdb in the 3.0.21c release code, and I can't see any
path, error or otherwise, where the lock can be left live on a
record. I'll keep looking though. When it's spinning, what errno
does the fcntl call return?
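For reference, a byte-range lock on a tdb record comes down to a one-byte fcntl() lock at some offset in the file, retried while the call fails with EINTR. The sketch below is an assumption modelled loosely on that pattern, not the tdb source. brlock_errno() is a hypothetical helper that hands back the errno value so it can be logged. If the retry loop is where a process spins, the errno seen there (EINTR versus, say, EDEADLK or ENOLCK) would help narrow down the cause.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not tdb code): set or clear a one-byte fcntl()
 * lock at `offset`. rw_type is F_RDLCK, F_WRLCK or F_UNLCK; cmd is
 * F_SETLK (fail at once if the byte is locked) or F_SETLKW (wait for it).
 * Returns 0 on success, otherwise the errno value fcntl() failed with. */
static int brlock_errno(int fd, off_t offset, short rw_type, int cmd)
{
    struct flock fl;
    int ret;

    assert(rw_type == F_RDLCK || rw_type == F_WRLCK || rw_type == F_UNLCK);
    assert(cmd == F_SETLK || cmd == F_SETLKW);

    memset(&fl, 0, sizeof(fl));
    fl.l_type = rw_type;
    fl.l_whence = SEEK_SET;
    fl.l_start = offset;
    fl.l_len = 1;

    /* Retry only when a signal interrupted the call. If signals kept
     * arriving, this loop is one place a process could appear to spin. */
    do {
        ret = fcntl(fd, cmd, &fl);
    } while (ret == -1 && errno == EINTR);

    return ret == -1 ? errno : 0;
}
```

For example, calling brlock_errno(fd, offset, F_WRLCK, F_SETLK) from a second process while the first holds that byte returns EAGAIN or EACCES (POSIX allows either). A lock held by the calling process itself never conflicts, because fcntl() locks are per process.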
Jeremy.