[Samba] hanging smbd(s) revisited
jojowil at hvcc.edu
Wed Mar 1 16:20:07 GMT 2006
----- Original Message -----
From: "William Jojo" <jojowil at hvcc.edu>
To: "Jeremy Allison" <jra at samba.org>
Cc: <samba at lists.samba.org>; "Gerald (Jerry) Carter" <jerry at samba.org>;
"Andrew Tridgell" <tridge at samba.org>; "Jeremy Allison" <jra at samba.org>
Sent: Tuesday, February 28, 2006 4:33 PM
Subject: Re: [Samba] hanging smbd(s) revisited
> ----- Original Message -----
> From: "Jeremy Allison" <jra at samba.org>
> To: "William Jojo" <jojowil at hvcc.edu>
> Cc: <samba at lists.samba.org>; "Gerald (Jerry) Carter" <jerry at samba.org>;
> "Andrew Tridgell" <tridge at samba.org>; "Jeremy Allison" <jra at samba.org>
> Sent: Tuesday, February 28, 2006 3:25 PM
> Subject: Re: [Samba] hanging smbd(s) revisited
> > On Tue, Feb 28, 2006 at 01:30:40PM -0500, William Jojo wrote:
> > >
> > > So we've gone back to 3.0.20 and we're stable again. I should indicate
> > > it's 3.0.20 with patches 9484, 9481 and 9456 to fix Win98 dir loop,
> > > shared workbook and ACLs (not necessarily in that order).
> > >
> > > Since the problem manifests in the filesystem where our Samba install
> > > lives, and it appears to be a tdb (namely locking.tdb for fd=15, but I can't
> > > identify the fd=3 that spins unmercifully), I'm wondering if *maybe* it could
> > > be the "Fix for tdb clear-if-first race condition." or some other tdb change
> > > after 3.0.20 that traded one bug for another? I'm guessing... :-)
> > Identifying that fd would be really useful.
> Ok, dug it up. This is the IBM info.
> ----- Original Message -----
> From: Robert Elias
> To: jojowil at hvcc.edu
> Sent: Monday, February 27, 2006 12:30 PM
> Subject: Pmr#47402,180
> Thank you for patience while I work through your questions. I ran this
> by our level 3 performance team and received the following input.
> The file in question is inode 12363 in /samba. Use 'find /samba -inum
> to determine the file name.
> I ran this by the Samba team members that work for IBM and they suggested
> the following:
> As a long shot, I suggest that you have him run tdbtorture (a file I/O
> testcase) from the Samba source tree, as that simulates the locking that
> Samba does and would show whether we have a bug in AIX locking.
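The kind of lock-contention test suggested above can be sketched in a few lines. This is not tdbtorture and does not touch a tdb file; it is only a minimal stand-in, assuming a POSIX system with fork(), in which several processes increment a shared counter under an exclusive fcntl byte-range lock. The names `worker` and `run_stress` are made up for this illustration. If the platform's locking is sound, the final counter equals workers times iterations.

```python
import fcntl
import os
import struct
import tempfile

COUNTER = struct.Struct("<Q")  # one 8-byte counter stored at offset 0


def worker(path, iterations):
    """Increment the shared counter under an exclusive byte-range lock."""
    fd = os.open(path, os.O_RDWR)
    try:
        for _ in range(iterations):
            # Blocking exclusive lock on bytes 0..7 of the file.
            fcntl.lockf(fd, fcntl.LOCK_EX, COUNTER.size, 0, os.SEEK_SET)
            try:
                (value,) = COUNTER.unpack(os.pread(fd, COUNTER.size, 0))
                os.pwrite(fd, COUNTER.pack(value + 1), 0)
            finally:
                fcntl.lockf(fd, fcntl.LOCK_UN, COUNTER.size, 0, os.SEEK_SET)
    finally:
        os.close(fd)


def run_stress(workers=4, iterations=200):
    """Fork `workers` processes; return (expected, observed) counter values."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(COUNTER.pack(0))
        path = tmp.name
    try:
        pids = []
        for _ in range(workers):
            pid = os.fork()
            if pid == 0:  # child process
                status = 1
                try:
                    worker(path, iterations)
                    status = 0
                finally:
                    os._exit(status)  # never fall back into the parent's code
            pids.append(pid)
        for pid in pids:
            os.waitpid(pid, 0)
        with open(path, "rb") as fh:
            (observed,) = COUNTER.unpack(fh.read(COUNTER.size))
        return workers * iterations, observed
    finally:
        os.unlink(path)
```

A mismatch between the two returned values would point at lost updates, that is, at locks not being honoured; a run that never finishes would point at a lock that is never released.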
> Your comments or thoughts?
> Robert Elias
> AIX Duty Manager
> IBM Integrated Technology Services
> 214-257-9292 - T/L 972
> [storage:/samba/3.0.21b] # find /samba -inum 12363
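For reference, the inode lookup can also be done from a script. The following is a rough sketch, not anything from IBM or the Samba tree; `find_by_inode` and `inode_of_fd` are hypothetical helper names. Unlike a plain `find -inum`, it only reports entries on the same filesystem as the starting directory, since inode numbers are unique only per filesystem.

```python
import os


def find_by_inode(root, inode):
    """Return paths under `root` whose inode number equals `inode`."""
    root_dev = os.lstat(root).st_dev
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable
            # Compare the device as well: inode numbers repeat across filesystems.
            if st.st_ino == inode and st.st_dev == root_dev:
                matches.append(path)
    return matches


def inode_of_fd(fd):
    """Return (device, inode) for an already-open file descriptor."""
    st = os.fstat(fd)
    return st.st_dev, st.st_ino
```

Usage would be along the lines of `find_by_inode("/samba", 12363)`, with the inode number taken from the IBM note above.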
> > > We are going to start moving to 20a, then 20b, then to 21, then back to
> > > where we started (21b did it too, haven't tried 21c yet) after another day
> > > or two of 3.0.20 to make sure we're not losing our minds.
> > I've looked over the logic for acquiring and releasing the lock
> > on locking.tdb in the 3.0.21c release code - I can't see any
> > paths, error or otherwise, where the lock can be left live on a
> > record. I'll keep looking though. When it's spinning, what is the
> > errno that the fcntl call returns?
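To illustrate what "the errno that the fcntl call returns" looks like in the ordinary contended case, here is a small sketch, assuming a POSIX system with fork(). It is not Samba code: `try_lock` and `demo_conflict` are invented names, and the 4-byte range is arbitrary. A child process holds an exclusive lock while the parent makes a non-blocking attempt on the same range and reports the error number it gets back.

```python
import errno
import fcntl
import os
import tempfile


def try_lock(fd, start, length):
    """Attempt a non-blocking exclusive lock; return 0 or the errno on failure."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start, os.SEEK_SET)
    except OSError as exc:
        return exc.errno
    return 0


def demo_conflict():
    """Hold a lock in a child and return (errno, name) seen by the parent."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"\0" * 16)
        tmp.flush()
        ready_r, ready_w = os.pipe()
        done_r, done_w = os.pipe()
        pid = os.fork()
        if pid == 0:  # child: take the lock, signal, wait to be released
            status = 1
            try:
                fcntl.lockf(tmp.fileno(), fcntl.LOCK_EX, 4, 0, os.SEEK_SET)
                os.write(ready_w, b"x")
                os.read(done_r, 1)
                status = 0
            finally:
                os._exit(status)  # skip the parent's cleanup handlers
        os.close(ready_w)  # so a dead child shows up as EOF, not a hang
        try:
            os.read(ready_r, 1)  # wait until the child holds the lock
            err = try_lock(tmp.fileno(), 0, 4)
        finally:
            os.write(done_w, b"x")
            os.waitpid(pid, 0)
            for p in (ready_r, done_r, done_w):
                os.close(p)
        return err, errno.errorcode.get(err, "OK")
```

POSIX allows either EAGAIN or EACCES for a lock that is simply held by someone else, so those are the unremarkable answers; anything else from a spinning smbd would be worth reporting.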
> What appears to happen is that pid 266946 is exiting (exited?) and some kind of
> deadlock has occurred, which shows the following in filemon.sum from the
> perfpmr that IBM had me run during the event.
> 9603204 hooks processed (incl. 2108 utility)
> 60.013 secs in measured interval
> Cpu utilization: 42.9%
> Most Active Files
>   #MBs  #opns   #rds  #wrs  file               volume:inode
>  230.1      0  29492     0  pid=266946_fd=3
>   43.3      0   1588   129  pid=240270_fd=5
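The figures for pid=266946_fd=3 can be turned into per-read numbers with some quick arithmetic. This is a back-of-the-envelope sketch only; it assumes that filemon's "#MBs" column means units of 2^20 bytes and that the reads were spread over the whole 60.013 second interval, neither of which the report itself states. `per_read_stats` is a made-up helper name.

```python
def per_read_stats(megabytes, reads, interval_secs):
    """Return (bytes per read, reads per second, MB per second)."""
    total_bytes = megabytes * 1024 * 1024  # assumption: 1 MB = 2**20 bytes
    return total_bytes / reads, reads / interval_secs, megabytes / interval_secs


# Values copied from the filemon.sum excerpt above.
bytes_per_read, reads_per_sec, mb_per_sec = per_read_stats(230.1, 29492, 60.013)
print(f"{bytes_per_read:.0f} bytes/read, "
      f"{reads_per_sec:.0f} reads/s, {mb_per_sec:.2f} MB/s")
```

Under those assumptions the reads come out at roughly 8 KB each, several hundred per second, with no writes at all, which fits the picture of a process reading the same file over and over.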
> My question to IBM was how can this happen? The above inode number is what
> was provided to me yesterday.
> Since the problem subsided once we moved to 3.0.20, I'm back here and not
> bugging IBM at the moment. :-|
> Whatever else I can get you, just say the word. :-)
> Do you agree with our plan to step through 20a, 20b ... ?
We've survived two days on 3.0.20, and our load is even more than when we
started. We have over 1000 smbd's running on this machine and it's not even
breaking a sweat.
Additionally, looking through source/locking/locking.c, I notice that
diffs of 3.0.20 against 20a and 20b show no changes. Then in 3.0.21 there's an
invasive change. (locking/posix.c remains unchanged through 21b.)
I'm pretty certain that 20a and 20b will be fine for us based on what I see,
but I'm still learning (and comprehending :-) ) these changes, looking for a
smoking gun. Tomorrow I will put 20b (skipping 20a) in place on this machine.
I'm opening a bug because I think this one is real and load related.
> > Jeremy.
> > --
> > To unsubscribe from this list go to the following URL and read the
> > instructions: https://lists.samba.org/mailman/listinfo/samba