CTDB internals

Christopher R. Hertel crh at ubiqx.mn.org
Fri Nov 2 00:31:17 MDT 2012


Thanks, Ronnie.

Further question: byte-range locking of files in the shared filesystem.

Do I understand correctly that per-file byte-range locks are managed via
brlock.tdb?

That is, if a Windows client opens a file and sets a byte-range lock, that
lock is managed by the Samba server by using brlock.tdb instead of fcntl locks.

I'm pretty sure I read that right in Michael's presentation slides, but I
want to be sure.

Thanks.

Chris -)-----

On 11/02/2012 12:32 AM, ronnie sahlberg wrote:
> On Thu, Nov 1, 2012 at 10:23 PM, Christopher R. Hertel <crh at ubiqx.mn.org> wrote:
>> On 11/02/2012 12:00 AM, Amitay Isaacs wrote:
>>> Hi Chris,
>>>
>>> On Fri, Nov 2, 2012 at 3:28 PM, Christopher R. Hertel <crh at ubiqx.mn.org> wrote:
>>>> Amitay, Obnox, et al.,
>>>>
>>>> I just want to make sure that I've got this right...
>>>>
>>>> Reviewing Michael's tutorial, given in 2009 at SambaXP, here's what I get:
>>>>
>>>> * The underlying tables are all TDB tables.
>>>>
>>>> * These TDB tables are of three types:
>>>>   1) Persistent
>>>>   2) Normal ("volatile")
>>>>   3) Recovery
>>>
>>> There are only two types of databases: persistent and normal. The recovery
>>> lock file is just a regular file, not a TDB database.
>>
>> Ah...
>>
>> ...but access is still arbitrated using fcntl byte-range locks.  Is that
>> correct?
>>
>>>> I think I generally understand how these work.  I have some questions about
>>>> the sequence of events when writing to a Persistent TDB, but those can wait.
>>>>
>>>> My immediate questions are:
>>>>
>>>> Q: Is the CTDB_RECOVERY_LOCK file the only tdb file that will be stored on
>>>>    shared disk and concurrently accessed by multiple nodes?
>>>
>>> Yes, CTDB_RECOVERY_LOCK file is the only file that is stored on the
>>> shared storage for concurrent access to resolve split-brain situations
>>> and to perform recoveries.
>>
>> Cool.
>>
>> In our test case, we have a couple of other files in there.  For example,
>> /etc/sysconfig/ctdb is symlinked to a shared file so that we only have to
>> edit the file once.
>>
>>>> Q: For the other two types (Persistent and Normal), is the ctdbd daemon
>>>>    the only reader/writer to the local TDBs?  For Normal LTDBs in
>>>>    particular, is fcntl byte-range locking used to manage access in any
>>>>    way?
>>>
>>> For non-persistent databases, smbd and ctdbd can both read and write the
>>> local TDBs. The access is ordered by fcntl byte-range locks. smbd accesses
>>> a record in a local TDB only when the local CTDB node is the data master
>>> for that record.
>>
>> Q: To do that, smbd would have to go through CTDB somehow, because only
>>    the ctdbd would know if it were master.  Is that correct?
> 
> No, smbd has internal knowledge of the ctdb header for the record,
> so smbd can decide "is this record local or not".
> If it is local, then smbd just reads/writes the record without any
> ctdb involvement, just like standalone Samba does.
> 
> Only if smbd discovers that the record is not local will it involve ctdbd
> and request that the record be fetched across the cluster.
> 
>>
>>> For persistent databases, the CTDB transaction API is used to write data to TDBs.
>>
>> I have questions on how that works but they can wait.
>>
>> Thanks!
>>
>> Chris -)-----
>> --
>> "Implementing CIFS - the Common Internet FileSystem" ISBN: 013047116X
>> Samba Team -- http://www.samba.org/     -)-----   Christopher R. Hertel
>> jCIFS Team -- http://jcifs.samba.org/   -)-----   ubiqx development, uninq.
>> ubiqx Team -- http://www.ubiqx.org/     -)-----   crh at ubiqx.mn.org
>> OnLineBook -- http://ubiqx.org/cifs/    -)-----   crh at ubiqx.org
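
Amitay's answer above says the recovery lock file is the one file on shared storage that all nodes access concurrently, and that it is used to resolve split-brain situations. A minimal sketch of how such a lock file can arbitrate, assuming the shared filesystem provides coherent POSIX fcntl locks across the cluster: each contender tries a non-blocking exclusive lock, and only one process can hold it at a time. The function name try_take_recovery_lock and the choice of locking byte 0 are illustrative assumptions, not ctdbd's actual code, and the demo uses a local temporary file instead of shared storage.

```python
import errno
import fcntl
import os
import tempfile


def try_take_recovery_lock(path):
    """Try to take an exclusive fcntl lock on byte 0 of a lock file.

    Returns the descriptor holding the lock, or None if another process
    already holds it. The lock lasts while this process keeps the
    descriptor open. (Illustrative only; not ctdbd's implementation.)
    """
    # An exclusive (write) fcntl lock needs a descriptor opened for writing.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # lockf() wraps fcntl() record locking; LOCK_NB fails instead of waiting.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return None  # someone else holds the lock
        raise
    return fd


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.mkdtemp(), "reclock")
    fd = try_take_recovery_lock(lock_path)
    print("lock acquired" if fd is not None else "lock busy")
```

Because fcntl locks are owned by the process rather than the descriptor, a second call from the same process would also succeed, and closing any descriptor for that file would silently drop the lock. Contention therefore only shows up between separate processes (or nodes), so a check that the lock really excludes others has to be made from another process.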

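Ronnie's description of the smbd fast path can be sketched as follows. This is a hypothetical model, not Samba's code: a Python dict stands in for the local TDB, ask_ctdbd is a made-up callback for the round trip through ctdbd, and the record header layout (a 64-bit rsn followed by a 32-bit dmaster node number, padded to an assumed 24 bytes) is an assumption loosely modelled on CTDB's per-record header. It also omits the fcntl byte-range locking that, per Amitay, orders access between smbd and ctdbd.

```python
import struct

# Hypothetical header layout, loosely modelled on CTDB's per-record header:
# a 64-bit record sequence number (rsn) and a 32-bit dmaster node number in
# native byte order, then opaque bytes up to HEADER_LEN. Check the real
# struct for your CTDB version before relying on any of this.
HEADER_PREFIX = struct.Struct("=QI")
HEADER_LEN = 24  # assumed total header size


def make_record(rsn, dmaster, payload):
    """Build a stored record: header followed by the caller's data."""
    padding = b"\0" * (HEADER_LEN - HEADER_PREFIX.size)
    return HEADER_PREFIX.pack(rsn, dmaster) + padding + payload


def split_record(raw):
    """Split a stored record into (rsn, dmaster, payload)."""
    if len(raw) < HEADER_LEN:
        raise ValueError("record is shorter than the assumed header")
    rsn, dmaster = HEADER_PREFIX.unpack_from(raw, 0)
    return rsn, dmaster, raw[HEADER_LEN:]


def fetch_record(local_db, key, my_pnn, ask_ctdbd):
    """Return the payload for key, involving ctdbd only when necessary.

    local_db  -- dict standing in for the local TDB (key -> raw record)
    my_pnn    -- this node's number
    ask_ctdbd -- hypothetical callback that migrates the record to this
                 node and returns its raw bytes
    """
    raw = local_db.get(key)
    if raw is not None:
        _rsn, dmaster, payload = split_record(raw)
        if dmaster == my_pnn:
            # Fast path: this node is data master, so use the record
            # directly, as standalone Samba would.
            return payload
    # Slow path: no local copy, or another node is data master.
    raw = ask_ctdbd(key)
    local_db[key] = raw
    return split_record(raw)[2]
```

When the node is already data master, the read never leaves the node, which is Ronnie's point; only a missing record or a foreign dmaster costs a trip through ctdbd.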


More information about the samba-technical mailing list