RAFT and CTDB

Richard Sharpe realrichardsharpe at gmail.com
Thu Nov 20 16:24:39 MST 2014


On Mon, Nov 17, 2014 at 2:31 PM, Michael Adam <obnox at samba.org> wrote:
> On 2014-11-17 at 13:20 -0800, Richard Sharpe wrote:
>> On Sun, Nov 16, 2014 at 11:41 PM, Volker Lendecke
>> <Volker.Lendecke at sernet.de> wrote:
>> > On Sat, Nov 15, 2014 at 10:31:30AM -0800, Richard Sharpe wrote:
>> >>
>> >> At SDC you mentioned that you have an implementation of RAFT and I
>> >> assumed, perhaps incorrectly, that you were thinking of using RAFT to
>> >> manage things like recovery in CTDB.
>> >>
>> >> Can you tell me more about your ideas in this regard and point me at any code?
>> >
>> > It's not finished yet, sorry. I have the basic algorithm and
>> > configuration changes done, but log compaction is still
>> > missing, so this is nothing for general consumption yet.
>> >
>> > Apart from that, I eventually want to have a dbwrap_raft;
>> > the main goal is to meet the persistence requirements that
>> > resilient and persistent file handles need.
>> >
>> > What would your project be?
>>
>> Well, I have to get CTDB working with a clustered file system that
>> does not currently support fcntl locks :-(
>
> You don't necessarily need to.
> CTDB only uses the fcntl lock for the recovery lock file,
> which is its means of split brain prevention.
>
> It can run without a recovery lock file.
> But then you should try to implement split brain prevention for
> ctdb differently. I think we still need good hooks in ctdb for
> mechanisms other than the recovery lock.

Hmmm, so the essential abstraction here is that any node that is no
longer a member of the cluster (because it can't get a lock on that
file) cannot try to run recovery. That is, in ctdb_recovery_lock we try
to open the recovery lock file and then take out a lock on it.

The open should fail if we are no longer a member of the cluster, and
the lock will fail if the cluster file system properly supports fcntl
locks and another recovery daemon has already locked the file.
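The two-step check described above (open the file, then take an exclusive fcntl lock without blocking) can be sketched roughly as follows. This is a simplified illustration, not CTDB's actual ctdb_recovery_lock code: the function name try_recovery_lock is hypothetical, and locking only the first byte of the file is an assumption made for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* HYPOTHETICAL helper, not CTDB's API.
 * Returns an fd that holds the lock on success, or -1 if the file cannot
 * be opened or another process already holds the lock. */
static int try_recovery_lock(const char *path)
{
    struct flock lock;
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd == -1) {
        /* Step 1 failed: e.g. the cluster file system is not reachable
         * from this node, so it should not run recovery. */
        fprintf(stderr, "open %s: %s\n", path, strerror(errno));
        return -1;
    }

    memset(&lock, 0, sizeof(lock));
    lock.l_type = F_WRLCK;     /* exclusive (write) lock */
    lock.l_whence = SEEK_SET;
    lock.l_start = 0;
    lock.l_len = 1;            /* assumption: lock only the first byte */

    /* Step 2: F_SETLK does not block; it fails with EACCES or EAGAIN if
     * another process (on any node, given working cluster fcntl locks)
     * already holds a conflicting lock. */
    if (fcntl(fd, F_SETLK, &lock) == -1) {
        close(fd);
        return -1;
    }

    /* Keep fd open: closing it would drop the lock. */
    return fd;
}
```

Note that POSIX fcntl locks belong to a process, so a second attempt from the same process would succeed; the contention this guards against only shows up between different processes, which is why the whole scheme depends on the cluster file system implementing fcntl locks across nodes.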

Supporting other clustering approaches would probably involve
redesigning that somewhat.
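One possible shape for such a redesign, along the lines of the "hooks" mentioned above, is a small pluggable guard that the recovery code consults before it starts. Everything in this sketch is hypothetical (struct recovery_guard, the quorum backend, run_recovery); it is not CTDB's API. The example backend uses a simple majority-quorum check in place of the fcntl lock.

```c
#include <stdbool.h>
#include <stdio.h>

/* HYPOTHETICAL interface, not part of CTDB: "may this node run recovery?"
 * The fcntl recovery lock would become one backend among several. */
struct recovery_guard {
    const char *name;
    bool (*acquire)(struct recovery_guard *g);
    void (*release)(struct recovery_guard *g);
    void *priv;                     /* backend-specific state */
};

/* HYPOTHETICAL backend: majority quorum. A node may run recovery only if
 * it can see a strict majority of the configured nodes, so at most one
 * side of a network partition can qualify. */
struct quorum_priv {
    unsigned total_nodes;
    unsigned visible_nodes;         /* includes this node */
};

static bool quorum_acquire(struct recovery_guard *g)
{
    const struct quorum_priv *q = g->priv;
    return 2 * q->visible_nodes > q->total_nodes;
}

static void quorum_release(struct recovery_guard *g)
{
    (void)g;                        /* nothing to release for this check */
}

/* Recovery driver: refuses to proceed unless the guard grants permission. */
static int run_recovery(struct recovery_guard *g)
{
    if (!g->acquire(g)) {
        fprintf(stderr, "%s: not allowed to run recovery\n", g->name);
        return -1;
    }
    /* ... actual recovery steps would go here ... */
    g->release(g);
    return 0;
}
```

A majority check like this only stops the minority side of a split from running recovery; electing a single recovery master within the majority would still need something else, such as a leader elected by a RAFT implementation.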

-- 
Regards,
Richard Sharpe
(How can I dispel my sorrow? Only with Du Kang. -- Cao Cao)
