RAFT and CTDB

Volker Lendecke Volker.Lendecke at SerNet.DE
Wed Nov 19 14:29:10 MST 2014


On Wed, Nov 19, 2014 at 09:45:48AM -0800, Richard Sharpe wrote:
> > It's not finished yet, sorry. I have the basic algorithm and
> > configuration changes done, but log compaction is still
> > missing, so this is nothing for general consumption yet.
> >
> > Apart from that, I want to have a dbwrap_raft eventually;
> > the main goal is to meet the persistence requirements that resilient
> > and persistent file handles need.
> 
> So, you guys keep saying that but never let on what the issue with
> CTDB as it stands is :-( Is there some sort of secret handshake
> required?
> 
> It seems to me the problem is that it is a result of a design decision
> taken by CTDB where some types of TDBs are fetch on demand ...
> however, if a Samba node has just recorded info that must be
> persistent and no one fetches it before that node crashes then we have
> just lost that persistent info.

Correct. The basic design decision of ctdb was an insight Tridge and
I had years ago: We CAN lose data in ctdb. The main goal was to make
locking.tdb fast. locking.tdb contains entries for all open files. If
you transfer locking.tdb data ownership to a node that holds the file
open, it does not have to tell everybody else proactively. It does not
even have to replicate the open file information for failover purposes,
because the information about open files on a node is worthless anyway
if the node crashes. All that ctdb has to make sure is that records are
transferred on demand when someone else actively asks for them. There are
some really deep subtleties around this, such as getting rid
of deleted records in a correct and reasonably scalable manner without
them coming back as zombies or multiple nodes chasing the same record,
but ctdb as a pure on-demand data mover is basically it. This breaks down
of course if you want to hand out persistence guarantees for open files,
so we have to find something in between the nonpersistent (cheap/fast
writes/no guarantees) and persistent (expensive writes, all nodes always
have the same copy on rotating rust) databases.
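
To make the trade-off concrete, here is a toy sketch in Python. It is
not ctdb code: ToyCluster and all of its methods are made-up names for
illustration, and it ignores networking, locking and record deletion
entirely. It only shows that a volatile record lives on one node and
moves when another node asks for it, while a persistent record is
written to every node.

```python
# Toy model of the trade-off described above. All names are hypothetical
# illustrations; none of this is ctdb's real API or wire protocol.

class ToyCluster:
    def __init__(self, node_ids):
        # Volatile records: each lives on exactly one node at a time.
        self.volatile = {n: {} for n in node_ids}
        # Persistent records: every node keeps a full copy.
        self.persistent = {n: {} for n in node_ids}
        # key -> node currently holding the volatile record
        self.holder = {}

    def fetch(self, node, key):
        """Move a volatile record to `node` only because it asked for it."""
        current = self.holder.get(key)
        if current is None:
            return None  # never written, or lost with a crashed node
        if current != node:
            self.volatile[node][key] = self.volatile[current].pop(key)
            self.holder[key] = node
        return self.volatile[node][key]

    def write_volatile(self, node, key, value):
        """Cheap write: pull the record here, then update it locally only."""
        self.fetch(node, key)
        self.volatile[node][key] = value
        self.holder[key] = node

    def write_persistent(self, key, value):
        """Expensive write: every live node stores the same copy."""
        for store in self.persistent.values():
            store[key] = value

    def crash(self, node):
        """Drop a node and everything that only it was holding."""
        del self.volatile[node]
        del self.persistent[node]
        self.holder = {k: n for k, n in self.holder.items() if n != node}


cluster = ToyCluster(["n0", "n1", "n2"])
cluster.write_volatile("n0", "file-a", "opened via n0")
cluster.write_volatile("n0", "file-b", "opened via n0")
cluster.write_persistent("handle-1", "durable state")

cluster.fetch("n1", "file-a")   # n1 asks, so file-a migrates to n1
cluster.crash("n0")             # file-b was never fetched by anyone else

print(cluster.fetch("n1", "file-a"))          # opened via n0
print(cluster.fetch("n2", "file-b"))          # None: lost with n0
print(cluster.persistent["n2"]["handle-1"])   # durable state
```

In this sketch file-b disappears with n0 because nobody fetched it
before the crash, which is harmless for plain open-file entries but not
acceptable for a record that is supposed to survive a node failure.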

Volker

-- 
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:kontakt at sernet.de

