[Samba] hanging smbd(s) revisited

William Jojo jojowil at hvcc.edu
Tue Feb 28 21:33:16 GMT 2006


----- Original Message ----- 
From: "Jeremy Allison" <jra at samba.org>
To: "William Jojo" <jojowil at hvcc.edu>
Cc: <samba at lists.samba.org>; "Gerald (Jerry) Carter" <jerry at samba.org>;
"Andrew Tridgell" <tridge at samba.org>; "Jeremy Allison" <jra at samba.org>
Sent: Tuesday, February 28, 2006 3:25 PM
Subject: Re: [Samba] hanging smbd(s) revisited


> On Tue, Feb 28, 2006 at 01:30:40PM -0500, William Jojo wrote:
> >
> > So we've gone back to 3.0.20 and we're stable again. I should indicate that
> > it's 3.0.20 with patches 9484, 9481 and 9456 to fix Win98 dir loop, excel
> > shared workbook and ACLs (not necessarily in that order).
> >
> > Since the problem manifests in the filesystem where our Samba install is,
> > and it appears to be a tdb (namely locking.tdb for fd=15, but can't identify
> > the fd=3 that spins unmercifully), I'm wondering if *maybe* it could be the
> > "Fix for tdb clear-if-first race condition." or some other tdb change after
> > 3.0.20 that traded one bug for another? I'm guessing... :-)
>
> Identifying that fd would be really useful.

Ok, dug it up. This is the IBM info.


----- Original Message ----- 
From: Robert Elias
To: jojowil at hvcc.edu
Sent: Monday, February 27, 2006 12:30 PM
Subject: Pmr#47402,180


Bill,

Thank you for your patience while I work through your questions. I ran this issue
by our level 3 performance team and received the following input.

The file in question is inode 12363 in /samba. Use 'find /samba -inum 12363'
to determine the file name.

I ran this by the Samba team members that work for IBM and they suggested
the following:

As a long shot, I suggest that you have him run tdbtorture (a file i/o
testcase) from the samba source tree, as that simulates the locking that
Samba does and would show whether we have a bug in AIX locking.
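
For readers who want to see the idea behind such a locking test, here is a minimal sketch in Python. It is not tdbtorture and does not use tdb. It only assumes that the property being checked is "a blocking fcntl byte-range lock really excludes other processes". The function name `hammer` and the counter-file layout are made up for illustration.

```python
import fcntl
import os
import struct


def hammer(path, workers=4, iterations=200):
    """Fork workers that each bump a shared counter under an fcntl byte-range lock.

    Hypothetical helper: returns the final counter value read back from the file.
    """
    # Start with an 8-byte little-endian counter set to zero.
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", 0))

    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                # POSIX record locks are per process, so each child opens its own fd.
                fd = os.open(path, os.O_RDWR)
                for _ in range(iterations):
                    # Blocking exclusive lock on bytes 0..7 (F_SETLKW underneath).
                    fcntl.lockf(fd, fcntl.LOCK_EX, 8, 0, os.SEEK_SET)
                    value = struct.unpack("<Q", os.pread(fd, 8, 0))[0]
                    os.pwrite(fd, struct.pack("<Q", value + 1), 0)
                    fcntl.lockf(fd, fcntl.LOCK_UN, 8, 0, os.SEEK_SET)
                status = 0
            finally:
                os._exit(status)  # never fall back into the parent's code path
        pids.append(pid)

    for pid in pids:
        os.waitpid(pid, 0)

    with open(path, "rb") as f:
        return struct.unpack("<Q", f.read(8))[0]
```

If locking works, the returned value equals workers * iterations. A smaller value would mean lost updates, and a run that never finishes would point at a lock that is never released.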

Your comments or thoughts?

Thanks,

Robert Elias
AIX Duty Manager
IBM Integrated Technology Services
214-257-9292 - T/L 972






[storage:/samba/3.0.21b] # find /samba -inum 12363
/samba/3.0.21b/var/locks/locking.tdb
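
Where `find -inum` is not handy, the same lookup can be sketched in Python. This is only an illustration: the function name `find_by_inode` is hypothetical, and unlike the plain `find` command above it also compares the device number, since inode numbers are only unique within one filesystem.

```python
import os


def find_by_inode(root, inode):
    """Return paths under root whose inode number equals `inode` (hypothetical helper)."""
    root_dev = os.lstat(root).st_dev
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable
            # Assumption: only entries on the same filesystem as root are wanted.
            if st.st_ino == inode and st.st_dev == root_dev:
                matches.append(path)
    return matches
```

Called as `find_by_inode("/samba", 12363)`, it returns every path that shares the inode, so hard links to the same file all show up.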



> > We are going to start moving to 20a, then 20b, then to 21 then back to 21a
> > where we started (21b did it too, haven't tried 21c yet) after another day
> > or two of 3.0.20 to make sure we're not losing our mind.
>
> I've looked over the logic for acquiring and releasing the lock
> for the locking.tdb in the 3.0.21c release code - I can't see any possible
> paths, error or otherwise, where the lock can be left live on a
> record. I'll keep looking though. When it's spinning, what is the errno
> that the fcntl call returns?
>
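
As an illustration of what "the errno that the fcntl call returns" means in practice, here is a small Python sketch. It is an assumption-laden stand-in, not Samba's code: it uses a non-blocking lock request (F_SETLK underneath) instead of a blocking F_SETLKW, and the function name `try_record_lock` is made up. With F_SETLK a conflicting lock normally fails with EAGAIN or EACCES. A blocking F_SETLKW can fail with EDEADLK or EINTR.

```python
import errno
import fcntl
import os


def try_record_lock(fd, offset, length=1):
    """Try a non-blocking exclusive byte-range lock.

    Hypothetical helper: returns None on success, otherwise the symbolic
    errno name (for example "EAGAIN") reported by the failed fcntl call.
    """
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, offset, os.SEEK_SET)
    except OSError as exc:
        # Map the numeric errno to its name; fall back to the number itself.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None
```

Because POSIX record locks are held per process, a conflict only shows up when a different process already holds the range. Calling this twice from the same process succeeds both times.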

What appears to happen is that pid 266946 is exiting (exited?) and some kind
of deadlock has occurred. The filemon.sum from the perfpmr that IBM had me run
during the event shows the following.


<snip>
9603204 hooks processed (incl. 2108 utility)
60.013 secs in measured interval
Cpu utilization:  42.9%

Most Active Files
------------------------------------------------------------------------
  #MBs  #opns   #rds   #wrs  file                     volume:inode
------------------------------------------------------------------------
 230.1      0  29492      0  pid=266946_fd=3
  43.3      0   1588    129  pid=240270_fd=5
</snip>


My question to IBM was how can this happen? The above inode number is what
was provided to me yesterday.

Since moving to 3.0.20 the problem has subsided, so I'm back here and not
bugging IBM at the moment. :-|

Whatever else I can get you, just say the word. :-)

Do you agree with our plan to step through 20a, 20b ... ?


Cheers,

Bill


> Jeremy.


