CTDB: Proposed CTDB_CONTROL_PING and "ctdb ping" changes

Martin Schwenke martin at meltin.net
Wed Feb 6 01:30:38 MST 2013


We've seen a couple of CTDB issues that we would like to resolve:

* The initscript is run to start ctdbd, it succeeds and prints "OK",
  but then the setup event fails and ctdbd dies.  This all happens
  within a fraction of a second, and it seems like the initscript
  should behave better by reporting ctdbd's demise.

* There are some unlikely synchronisation problems in the startup
  sequence.  In particular, given that the setup event is
  asynchronous, it is theoretically possible for the recovery daemon
  to be forked and initiate the first recovery before the setup event
  has completed.  It would be nice to enforce clean synchronisation of
  the startup sequence.

The following series of patches does all of this and a bit more.  :-)
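
As a rough illustration of the runstate idea, here is a small Python
model (ctdbd itself is C).  The state names, the Daemon class and its
methods are assumptions made for illustration and are not taken from
the patches.  It only shows how an ordered runstate can replace a
single done_startup flag and how the recovery daemon can be held back
until the setup event has completed.

```python
from enum import IntEnum


class RunState(IntEnum):
    """Hypothetical run states, ordered by startup progress."""
    INIT = 1
    SETUP = 2
    FIRST_RECOVERY = 3
    STARTUP = 4
    RUNNING = 5
    SHUTDOWN = 6


class Daemon:
    """Toy stand-in for ctdbd that tracks a runstate, not a boolean flag."""

    def __init__(self):
        self.runstate = RunState.INIT
        self.recovery_daemon_started = False

    def set_runstate(self, new_state):
        # Refuse to stay put or move backwards, so the startup order is
        # enforced in one place.
        if new_state <= self.runstate:
            raise ValueError(
                f"cannot go from {self.runstate.name} to {new_state.name}")
        self.runstate = new_state

    def start(self):
        # The setup event is kicked off here and completes asynchronously.
        self.set_runstate(RunState.SETUP)

    def setup_event_done(self, ok):
        # Called when the asynchronous setup event has finished.
        if not ok:
            self.set_runstate(RunState.SHUTDOWN)
            return
        # Only now is the recovery daemon allowed to start.
        self.recovery_daemon_started = True
        self.set_runstate(RunState.FIRST_RECOVERY)
```

In this model the recovery daemon cannot run before setup completes,
because the only code that starts it lives in the setup completion
path.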

The only complications are:

* It changes the return result of CTDB_CONTROL_PING to be the current
  "runstate" (see the 1st patch) rather than the number of clients
  (which is otherwise reported in CTDB_CONTROL_STATISTICS).  This is a
  subtle API change.  However, Samba doesn't use CTDB_CONTROL_PING and
  I seriously doubt anyone else does.

* As a consequence of the above, the output of "ctdb ping" is changed
  to print the runstate instead of the number of clients.  I think
  this is a benign change: there is no machine readable version of the
  CTDB ping output and most uses that I've seen (and implemented) just
  redirect the output to /dev/null.
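
The shape of the change can be sketched in a few lines of Python.
This is a model only: the handler names, the DaemonState class and
the numeric runstate values are hypothetical, and the real controls
are implemented in C.  The point is that the client count remains
available via the statistics control, so nothing is lost when ping
stops returning it.

```python
from dataclasses import dataclass

# Hypothetical numeric runstate values, for illustration only.
RUNSTATE_NAMES = {
    1: "INIT",
    2: "SETUP",
    3: "FIRST_RECOVERY",
    4: "STARTUP",
    5: "RUNNING",
    6: "SHUTDOWN",
}


@dataclass
class DaemonState:
    runstate: int
    num_clients: int


def handle_ping_old(state):
    # Old behaviour: the ping reply carries the number of clients.
    return state.num_clients


def handle_ping_new(state):
    # Proposed behaviour: the ping reply carries the current runstate.
    return state.runstate


def handle_statistics(state):
    # The client count is still reported here, so it is not lost.
    return {"num_clients": state.num_clients}


def describe_ping(reply):
    # What a tool might print for the new reply value.
    return RUNSTATE_NAMES.get(reply, f"UNKNOWN({reply})")
```

A caller that only checks whether ping succeeded, and throws away the
output, behaves the same with either handler.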

We could clearly have added another control to retrieve the runstate
and we could have left ping alone.  I'd prefer not to do this, simply
because I don't think there's any backward compatibility to maintain
(i.e. I don't think anything depends on the current behaviour) and we
don't need more controls just for the sake of it.  It's a fairly
arbitrary decision but I'd like the resulting code and API after the
patches to be simpler rather than more complicated - one fewer control!

Here's a summary of the patches...

  commit c785b63fc88f4896adf299d82f812d3527366c65
  Author: Martin Schwenke <martin at meltin.net>

    ctdbd: Replace ctdb->done_startup with ctdb->runstate

  commit 33614fd461f686584196f58b650d92f2e3aad5a5
  Author: Martin Schwenke <martin at meltin.net>

    ctdbd: Only start recovery daemon and timed events after setup event

  commit c97b7d92212f0a6f8f0b935b810c1ea62a5d51ac
  Author: Martin Schwenke <martin at meltin.net>

    ctdbd: Start logging process earlier

  commit f9f02f94fbaf6a1be31e7e98c333590b1e457612
  Author: Martin Schwenke <martin at meltin.net>

    ctdbd: CTDB_CONTROL_PING returns runstate, not number of clients

  commit f79559c18b9a7d53e21f7e629a1c52f6fb5b283f
  Author: Martin Schwenke <martin at meltin.net>

    tools/ctdb: "ctdb ping" now accepts optional expected run state arguments
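
To show how an initscript might use this, here is a hedged Python
sketch of a wait loop (a real initscript would be shell).  The
function wait_for_runstate and its probe argument are hypothetical:
probe stands in for whatever asks the daemon for its runstate (for
example a wrapper around "ctdb ping") and is assumed to return a
state name, or None if the daemon does not answer.  The exact
command-line syntax is deliberately not shown here.

```python
import time


def wait_for_runstate(probe, expected, timeout=10.0, interval=0.1,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll probe() until it reports a state in `expected`.

    Returns True if an expected state is seen.  Returns False if the
    daemon answered at least once and then stopped answering (it
    died), or if the timeout expires first.
    """
    deadline = clock() + timeout
    seen_alive = False
    while clock() < deadline:
        state = probe()
        if state in expected:
            return True
        if state is None and seen_alive:
            # The daemon responded earlier and has now gone away.
            return False
        if state is not None:
            seen_alive = True
        sleep(interval)
    return False
```

With something like this the initscript could print "OK" only once
the daemon reports an acceptable runstate, and report failure if
ctdbd dies during the setup event.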

These are extra but, perhaps, related...

  commit c9fb2c3b86cb4f00c2786c35bdfa15b79ec7daf6
  Author: Martin Schwenke <martin at meltin.net>

    eventscripts: 11.natgw should not call ctdb tool in "init" event

  commit d0ddd8a98cbe4e44962ce8dccc84f857405a0e7d
  Author: Martin Schwenke <martin at meltin.net>

    ctdbd: When the "setup" event fails log an error and exit, don't abort

This is all in my master-runstate branch at:

  git://git.ozlabs.org/~martins/ctdb.git

or

  http://git.ozlabs.org/?p=martins-ctdb.git;a=shortlog;h=refs/heads/master-runstate

Is anyone incredibly unhappy with the proposed CTDB_CONTROL_PING and
"ctdb ping" changes?

Thanks...

peace & happiness,
martin

