[Samba] tdb/tdbutil.c:tdb_chainlock_with_timeout_internal

Pappas, Bill Bill.Pappas at STJUDE.ORG
Mon Jul 24 20:58:39 GMT 2006


Thanks for the reply.

I will keep your strace suggestion in mind the next time this happens.

I upgraded from 3.0.21c to 3.0.23a because it includes several bug fixes that could be tied to this locking issue with vxfs, though the release notes say little about file locking.  The issue will probably return even with the upgrade, but the upgrade does address some other issues I had been having with 3.0.21c.

I guess I am now in wait-and-see mode.

I will ask Veritas about something similar to the GPFS locking parameter you mentioned.  

Thanks,
Bill Pappas - System Integration Engineer - SAN 
St. Jude Children's Research Hospital
332 North Lauderdale
Memphis, TN 38105
Danny Thomas Tower - Room D1010
Mail Stop 312

-----Original Message-----
From: Hansjörg Maurer [mailto:hansjoerg.maurer at dlr.de] 
Sent: Monday, July 24, 2006 3:04 PM
To: Pappas, Bill
Cc: samba at lists.samba.org
Subject: Re: [Samba] tdb/tdbutil.c:tdb_chainlock_with_timeout_internal

Hi

we had a comparable issue with IBM's GPFS cluster filesystem. In
11/2005 I posted about it on samba-technical (subject: tdb_lock problem
on gpfs filesystem). In that case, too, smbd sometimes went into the D state.
We mostly noticed the problem with the tdb files of the printers (the
Samba server was also acting as a print server).

I got the following information from the IBM gpfs list:
"Also, Samba uses fcntl locking extensively on these files and may be 
maintaining thousands of individual locks. GPFS specifically sets a 
limit on the number of fcntl ranges allowed on a file at one time (to 
prevent a runaway or deviant application from consuming large amounts of 
resources recording such locks). I expect you are exceeding this limit, 
but you can configure a larger value:
"mmchconfig maxFcntlRangesPerFile=10000".
The default is 200 and the acceptable range is currently 10-200000"

Increasing this (undocumented) value to 10000 solved the problem in our
case.
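
As a rough way to check whether a filesystem caps the number of fcntl
byte-range locks on a single file, a Python sketch along these lines could
be run against a file on the mount in question. The function name
count_range_locks and the max_locks cap are made up for this illustration,
and this is not how Samba itself takes its locks; it only sets non-adjacent
one-byte write locks until the kernel answers ENOLCK.

```python
import errno
import fcntl
import os


def count_range_locks(path, max_locks=1000):
    """Take non-adjacent 1-byte write locks on path until ENOLCK or max_locks.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_RDWR)
    taken = 0
    try:
        for i in range(max_locks):
            try:
                # Lock every second byte so neighbouring ranges cannot be
                # merged into a single lock record.
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, i * 2, os.SEEK_SET)
            except OSError as exc:
                if exc.errno == errno.ENOLCK:
                    break  # the filesystem refused to record another range
                raise
            taken += 1
    finally:
        # Closing the descriptor drops all fcntl locks this process holds
        # on the file.
        os.close(fd)
    return taken
```

If the returned count is lower than max_locks, the filesystem stopped
handing out lock ranges at that point; on a local filesystem without such
a limit the count should simply equal max_locks.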

Maybe there is a similar restriction with the Veritas filesystem.

Have you tried starting smbd under strace, e.g.

strace -e fcntl -f smbd

to track down the system call?
In our case it showed something like

fcntl(18, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=936, len=1}) = -1 ENOLCK (No locks available)

which indicates a problem with the filesystem.
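
To pick such failures out of a long trace, a small filter like the Python
sketch below may help. It is only an illustration: the function name
failed_locks and the regular expression are my own, written against the
line format shown above, so other strace versions may need a different
pattern.

```python
import re

# Matches fcntl() calls with a {...} lock argument that returned -1,
# in the format of the strace line quoted above.
FAILED_FCNTL = re.compile(
    r"fcntl\((?P<fd>\d+), (?P<cmd>F_\w+), \{(?P<args>[^}]+)\}\)"
    r"\s*=\s*-1 (?P<errno>E\w+)"
)


def failed_locks(lines):
    """Return one dict per failed fcntl lock call found in strace output.

    Hypothetical helper for illustration only.
    """
    # Join the lines first so a record wrapped across two lines still matches.
    text = " ".join(line.strip() for line in lines)
    hits = []
    for match in FAILED_FCNTL.finditer(text):
        # Split "type=F_WRLCK, whence=SEEK_SET, ..." into a dict of strings.
        args = dict(
            item.split("=", 1)
            for item in match.group("args").split(", ")
            if "=" in item
        )
        hits.append(
            {
                "fd": int(match.group("fd")),
                "cmd": match.group("cmd"),
                "errno": match.group("errno"),
                "args": args,
            }
        )
    return hits
```

Passing it the lines of a saved trace (for example an open file object)
gives a list in which each entry names the file descriptor, the fcntl
command, the errno and the requested byte range, so repeated ENOLCK
failures on the same descriptor are easy to spot.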

Greetings

Hansjörg

Pappas, Bill wrote:

>Jeremy,
>
>I was in a position (last night) to upgrade to 3.0.23a. 
>Again, I was using 3.0.21c.
>
>If smbd goes into the D state again, we can at least eliminate the
>possibility that it is an unexpected 3.0.21c bug.
>
>
>Thanks,
>Bill Pappas - System Integration Engineer - SAN 
>St. Jude Children's Research Hospital
>332 North Lauderdale
>Memphis, TN 38105
>Danny Thomas Tower - Room D1010
>Mail Stop 312
>
>-----Original Message-----
>From: Pappas, Bill 
>Sent: Saturday, July 22, 2006 4:01 PM
>To: Jeremy Allison
>Cc: samba at lists.samba.org
>Subject: RE: [Samba] tdb/tdbutil.c:tdb_chainlock_with_timeout_internal
>
>Jeremy Allison wrote:
>>>Then it might be an intermittent bug in Veritas. What system call is
>>>smbd hanging on ? smbd should never hang in the D wait state unless
>>>it's a filesystem bug.
>
>I am beginning to believe that this could make sense. Let me emphasize
>that ./private/secrets.tdb is shared between two samba servers (via
>clustered vxfs) that are running independently.  Only one server runs
>nmbd at a time as veritas cluster server fails nmbd over between servers
>as needed.  I figured that keeping smbd running on both servers would
>reduce failover time.  I discovered that I had to share secrets.tdb to
>ensure that either samba server would remain as a domain member server.
>Is there another way to do what I am doing?  I'd gladly stop sharing
>this file if I could keep smbd up on both servers.  Does smbd need a
>lock on secrets.tdb? I thought (probably wrongly) that only nmbd relied
>on this file.
>
>Further below, you will find some more logs between clients and the
>server running nmbd and smbd (as the other was sitting idle with smbd
>running). SJMEMDC05 is a windows domain controller and the other clients
>are windows explorer clients. 
>
>When you see these logs, they appear to confirm that secrets.tdb is
>directly involved, but how would a locking issue with this file cause
>smbd to go to the D state (and stay)?
>
>log.hc-dfinkletest:  tdb(/usr/local/samba-3.0.21c/private/secrets.tdb):
>tdb_lock failed on list 78 ltype=1 (Interrupted system call)
>log.hc-dfinkletest:  tdb_chainlock_with_timeout_internal: alarm (10)
>timed out for key SJMEMDC05 in tdb
>/usr/local/samba-3.0.21c/private/secrets.tdb
>log.hc-dfinkletest:  tdb(/usr/local/samba-3.0.21c/private/secrets.tdb):
>tdb_lock failed on list 78 ltype=1 (Interrupted system call)
>log.hc-dfinkletest:  tdb_chainlock_with_timeout_internal: alarm (10)
>timed out for key SJMEMDC05 in tdb
>/usr/local/samba-3.0.21c/private/secrets.tdb
>log.hc-dfinkletest:  tdb(/usr/local/samba-3.0.21c/private/secrets.tdb):
>tdb_lock failed on list 78 ltype=1 (Interrupted system call)
>log.hc-dfinkletest:  tdb_chainlock_with_timeout_internal: alarm (10)
>timed out for key SJMEMDC05 in tdb
>/usr/local/samba-3.0.21c/private/secrets.tdb
>log.hc-mwang1:  tdb(/usr/local/samba-3.0.21c/private/secrets.tdb):
>tdb_lock failed on list 78 ltype=1 (Interrupted system call)
>log.hc-mwang1:  tdb_chainlock_with_timeout_internal: alarm (10) timed
>out for key SJMEMDC05 in tdb
>/usr/local/samba-3.0.21c/private/secrets.tdb
>log.hc-mwang1:  tdb(/usr/local/samba-3.0.21c/private/secrets.tdb):
>tdb_lock failed on list 78 ltype=1 (Interrupted system call)
>log.hc-mwang1:  tdb_chainlock_with_timeout_internal: alarm (10) timed
>out for key SJMEMDC05 in tdb
>/usr/local/samba-3.0.21c/private/secrets.tdb
>log.hc-mwang1:  tdb(/usr/local/samba-3.0.21c/private/secrets.tdb):
>tdb_lock failed on list 78 ltype=1 (Interrupted system call)
>log.hc-mwang1:  tdb_chainlock_with_timeout_internal: alarm (10) timed
>out for key SJMEMDC05 in tdb
>/usr/local/samba-3.0.21c/private/secrets.tdb
>log.hc-mwang1:  tdb(/usr/local/samba-3.0.21c/private/secrets.tdb):
>tdb_lock failed on list 78 ltype=1 (Interrupted system call)
>log.hc-mwang1:  tdb_chainlock_with_timeout_internal: alarm (10) timed
>out for key SJMEMDC05 in tdb
>/usr/local/samba-3.0.21c/private/secrets.tdb
>log.hc-mwang1:  tdb(/usr/local/samba-3.0.21c/private/secrets.tdb):
>tdb_lock failed on list 78 ltype=1 (Interrupted system call)
>log.hc-mwang1:  tdb_chainlock_with_timeout_internal: alarm (10) timed
>out for key SJMEMDC05 in tdb
>/usr/local/samba-3.0.21c/private/secrets.tdb
>log.hc-mwang1:  tdb(/usr/local/samba-3.0.21c/private/secrets.tdb):
>tdb_lock failed on list 78 ltype=1 (Interrupted system call)
>log.hc-mwang1:  tdb_chainlock_with_timeout_internal: alarm (10) timed
>out for key SJMEMDC05 in tdb
>/usr/local/samba-3.0.21c/private/secrets.tdb
>
>Thanks,
>Bill Pappas - System Integration Engineer - SAN 
>St. Jude Children's Research Hospital
>332 North Lauderdale
>Memphis, TN 38105
>Danny Thomas Tower - Room D1010
>Mail Stop 312
>
>-----Original Message-----
>From: Jeremy Allison [mailto:jra at samba.org] 
>Sent: Saturday, July 22, 2006 10:56 AM
>To: Pappas, Bill
>Cc: jra at samba.org; samba at lists.samba.org
>Subject: Re: [Samba] tdb/tdbutil.c:tdb_chainlock_with_timeout_internal
>
>On Fri, Jul 21, 2006 at 06:17:09PM -0500, Pappas, Bill wrote:
>
>>I will say this works for weeks on end w/o a problem.  When you say
>>this will not work, why? I've had no real problems with the veritas
>>clustered fs.  It adheres to file locking and fcntl operations like any
>>normal local filesystem (ext3).
>
>Then it might be an intermittent bug in Veritas. What system call is
>smbd hanging on ? smbd should never hang in the D wait state unless
>it's a filesystem bug.
>
>Jeremy.