How to interface to Zookeeper, etcd or Corosync ...

Volker Lendecke Volker.Lendecke at SerNet.DE
Mon Apr 4 07:21:15 UTC 2016

On Sun, Apr 03, 2016 at 07:51:02PM -0700, Richard Sharpe wrote:
> I wanted to engage in some discussion around how to interface to those
> three clustering engines.
> My experience is that while etcd could be accessed from each smbd,
> Zookeeper cannot be accessed so easily, because its client maintains a
> persistent connection and this does not work so well across processes.
> I think one could keep the smb.conf in either of those clustered
> name-value stores but one could also keep some tdbs with some effort.

For ctdb-based clusters we recommend the registry configuration to
keep smb.conf in sync across nodes. It is a well-established
abstraction over the clustered key-value store.
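For reference, the node-local smb.conf can then shrink to a stub that
pulls everything else from the registry (a minimal sketch; the
registry parameters themselves are managed with "net conf"):

```ini
[global]
    clustering = yes
    # All remaining parameters live in the clustered registry, which
    # ctdb keeps in sync across nodes; edit them with "net conf".
    include = registry
```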

> I suspect that I would need to use something like dbwrap as the
> interface to all of this, and I have used dbwrap before to interface
> to ZooKeeper, but it was done via a central ZooKeeper server on each
> node.

In dbwrap we distinguish pretty sharply between persistent and
non-persistent (indicated by CLEAR_IF_FIRST) tdbs. For the persistent
ones like secrets.tdb and registry.tdb I think it would be entirely
possible to put them into something like etcd or zookeeper. However,
we depend on dbwrap-level transactions working here.
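As a rough sketch of what those transaction semantics require of a
backend, assuming an etcd-style revision counter for the
compare-and-swap (all names here are illustrative, not Samba's actual
dbwrap API):

```python
# Illustrative sketch (not Samba code): dbwrap-style multi-operation
# transactions, buffered locally and committed all-or-nothing against
# a key-value store via an etcd-style revision check.

class KVStore:
    """Stand-in for a clustered store such as etcd or zookeeper."""
    def __init__(self):
        self.data = {}
        self.version = 0              # global revision counter

class Transaction:
    def __init__(self, store):
        self.store = store
        self.start_version = store.version
        self.writes = {}              # buffered until commit

    def store_record(self, key, value):
        self.writes[key] = value      # no store traffic yet

    def fetch(self, key):
        # A transaction sees its own pending writes first.
        return self.writes.get(key, self.store.data.get(key))

    def commit(self):
        # All-or-nothing: fail if anyone else committed in between.
        if self.store.version != self.start_version:
            return False
        self.store.data.update(self.writes)
        self.store.version += 1
        return True

store = KVStore()
t = Transaction(store)
t.store_record("secrets/machine_pw", b"...")
t.store_record("registry/global/workgroup", b"SAMBA")
committed = t.commit()
```

A second transaction opened before the commit would fail its revision
check and have to retry, which is exactly the behavior the persistent
databases rely on.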

> That being the case, it would seem to be ideal to have a separate
> process, perhaps the master smbd that interfaces to each of these
> services and is accessed via Samba's messaging service.
> It would seem to need to:
> 1. Monitor loss of connection to the quorum and kill all child smbds
> if that occurs,

Right now every smbd holds a connection to ctdbd. If ctdb kills those
connections, all smbds exit. Maybe something like that would be
appropriate here.
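A minimal sketch of that pattern, with a plain pipe standing in for
the smbd-to-ctdbd connection (illustrative only, not the actual
mechanism):

```python
# Illustrative sketch (not ctdb/Samba code): a worker blocks on a pipe
# to its monitor process; when the monitor closes the connection (or
# dies), the read returns EOF and the worker exits.  This mirrors how
# every smbd exits when ctdbd kills its connection.
import os

r, w = os.pipe()
pid = os.fork()
if pid == 0:                      # child: the "smbd" worker
    os.close(w)                   # keep only the read end
    os.read(r, 1)                 # blocks; returns b'' on EOF
    os._exit(0)                   # connection lost: shut down

os.close(r)                       # parent: the "monitor"
os.close(w)                       # closing the pipe = killing the link
_, status = os.waitpid(pid, 0)    # worker exits promptly
```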

> 2. Provide the ability to read and set records in the key-value store

That's what dbwrap does with its connection to ctdb. Of course it is
possible to write a different dbwrap backend that connects to another
key-value store, but we need to keep the multi-operation transactions
working.

> 3. Provide a lock and read or lock and write service, with or without waiting.

Right now I'm working on a dbwrap implementation that can live almost
entirely without OS-level locks. What it needs is a process monitoring
service to provide a fast retry mechanism when a lock holder dies. If
we have that, we can base Samba upon any key-value store such as
redis, provided we can put some smarts into the operations. My
immediate goal is to base the local databases on lmdb, which does not
provide application-level record locks.
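The fast-retry idea can be sketched like this, with a pid-liveness
check standing in for the process monitoring service (illustrative
names, not the actual dbwrap implementation):

```python
# Illustrative sketch: lock records carry the holder's pid; a waiter
# can take over a lock as soon as the holder is known dead, instead
# of waiting for a timeout to expire.
import os

locks = {}                        # key -> holder pid (stand-in records)

def holder_alive(pid):
    try:
        os.kill(pid, 0)           # signal 0 checks existence only
        return True
    except ProcessLookupError:
        return False              # holder died: lock is stale
    except PermissionError:
        return True               # exists, but owned by another user

def try_lock(key):
    holder = locks.get(key)
    if holder is not None and holder_alive(holder):
        return False              # genuinely held: caller must retry
    locks[key] = os.getpid()      # free or stale: take it over
    return True

# Simulate a lock holder dying while holding a record lock:
child = os.fork()
if child == 0:
    os._exit(0)
os.waitpid(child, 0)              # child is gone now ...
locks["locking/foo"] = child      # ... but its lock record remains
stolen = try_lock("locking/foo")  # stale lock is retaken at once
```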

> Does this sound reasonable?

Of course, this is always interesting to talk about. Keep in mind,
though, that the performance requirements are pretty high: we do
several database operations per open/close cycle, so this needs to be
a very low-latency mechanism if you want to apply it to the
non-persistent databases like locking.tdb too.

It is an interesting development project to make our open/close code
paths asynchronous with regard to the dbwrap operations, so that
multiple simultaneous opens and closes can wait for their dbwrap ops
concurrently. This would help with an increasing set of scenarios.
Also, we need to parallelize our directory listing code path, which
has to do dbwrap operations to look up the delete-on-close flag and
the timestamps.
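The asynchronous-open idea looks roughly like this (a toy sketch with
asyncio, hypothetical function names; the real work would be in
Samba's tevent-based C code):

```python
# Illustrative sketch: when opens await their database round trips
# instead of blocking on them, many opens overlap their dbwrap
# operations and total latency stays near one round trip.
import asyncio

async def dbwrap_fetch(key):
    await asyncio.sleep(0.01)     # stand-in for a round trip to the store
    return {"delete_on_close": False}

async def open_file(path):
    # The fetch suspends this open only; others proceed concurrently.
    rec = await dbwrap_fetch("locking/" + path)
    return None if rec["delete_on_close"] else path

async def main():
    paths = ["file%d" % i for i in range(100)]
    # 100 opens in flight at once, not 100 serialized round trips.
    return await asyncio.gather(*(open_file(p) for p in paths))

results = asyncio.run(main())
```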

The persistent ones are a bit less critical latency-wise, and we
could apply some caching there if we get change notifications.
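Sketched out, such a notification-invalidated cache might look like
this (illustrative only; the class and method names are made up):

```python
# Illustrative sketch: persistent records cached locally; a change
# notification from the store invalidates the entry so the next
# fetch rereads it.
class CachingBackend:
    def __init__(self, store):
        self.store = store            # dict standing in for the store
        self.cache = {}

    def fetch(self, key):
        if key not in self.cache:     # miss: go to the store
            self.cache[key] = self.store.get(key)
        return self.cache[key]        # hit: no cluster round trip

    def notify_changed(self, key):
        self.cache.pop(key, None)     # next fetch rereads the store

store = {"registry/global/workgroup": "SAMBA"}
db = CachingBackend(store)
first = db.fetch("registry/global/workgroup")   # fills the cache
store["registry/global/workgroup"] = "NEWGROUP"
stale = db.fetch("registry/global/workgroup")   # still the cached value
db.notify_changed("registry/global/workgroup")
fresh = db.fetch("registry/global/workgroup")   # rereads the store
```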

> At the 10,650 foot level, I want something other than CTDB as the
> cluster-wide key-value store ...

What is so bad about ctdb that it is unacceptable in your
environment? If you can point at the deficiencies, we might either
overcome them or provide abstractions to connect to the better
alternatives.

With best regards,

Volker Lendecke

SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen, mailto:kontakt at

More information about the samba-technical mailing list