default ctdb configuration file locations

Martin Schwenke martin at meltin.net
Sat Oct 29 03:36:35 UTC 2016


On Fri, 28 Oct 2016 10:56:22 -0500, Steve French <smfrench at gmail.com>
wrote:

> On Sat, Oct 22, 2016 at 3:39 PM, Martin Schwenke <martin at meltin.net> wrote:
> > On Fri, 21 Oct 2016 10:48:30 -0500, Steve French <smfrench at gmail.com>
> > wrote:
> >  
> >> [...] Part of the reason I was worrying
> >> about this is seeing some strange behavior I (at first) thought was
> >> configuration related but presumably is not.  I had configured a two
> >> node ctdb cluster with the node below's ctdbd.conf configured as:
> >>
> >> "CTDB_CAPABILITY_RECMASTER=no"
> >>
> >>
> >> $ ctdb status
> >>
> >> Number of nodes:2
> >> pnn:0 172.29.161.26    OK
> >> pnn:1 172.29.161.45    OK (THIS NODE)
> >> Generation:1678151314
> >> Size:2
> >>
> >> hash:0 lmaster:0
> >> hash:1 lmaster:1
> >> Recovery mode:NORMAL (0)
> >> Recovery master:1
> >>
> >> I am certain that CTDB_CAPABILITY_RECMASTER is set to no on that node
> >> (and not set to anything on the other node) but it is still becoming
> >> recovery master which I found confusing.  
> >
> > What does "ctdb getcapabilities" say on that node?
> >
> > If it says "YES":  
> 
> It says
>     RECMASTER: NO
>     LMASTER: YES
> 
>  on both nodes including what should be the master node, ie the one
> where CTDB_CAPABILITY_RECMASTER=no is commented out.  I expected it to
> say "RECMASTER:NO" only on the nodes where
> CTDB_CAPABILITY_RECMASTER=no
> is explicitly set
> 
> 
> > * Have you restarted CTDB since setting "CTDB_CAPABILITY_RECMASTER=no"?  
> 
> Yes
> 
> 
> > * CTDB_CAPABILITY_RECMASTER is case-sensitive
> >
> >   Only an exact value of "no" will switch it off.  Perhaps we need to
> >   loosen the matching?
> >
> > If it says "NO" and the node is recovery master (for more than about
> > a second) then we have a bug in the recovery daemon's recovery master
> > checking logic. I have touched that code several times in the last
> > couple of years so there is obviously a chance that I have broken it.
> > However, it does work for me...  :-)  
> 
> So two interesting things,
> 
> 1) with what it thinks is no recovery master capable node, it picked
> one anyway (the last one in this case).

Do the nodes agree?

# onnode -p all ctdb recmaster

Run it a few times and see if it flaps around.  If no node can be master
then every time an election ends a new one will probably be called.  So
the nodes will also be marked as being in recovery most of the time.
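
For example, a quick way to watch for that flapping (just a sketch; the
repeat count and sleep interval are arbitrary):

# for i in $(seq 1 10); do onnode -p all ctdb recmaster; sleep 1; done

If the reported PNN changes between runs then the nodes aren't settling
on a master.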

During an election, each node that can be master asserts itself as the
master to start with and then cedes the role when someone beats it in
the election.  That's something we might revisit.  We might be able to
have an "unknown master" value that is used during elections.

Elections are called according to the code in
ctdb/server/ctdb_recoverd.c:validate_recovery_master().  I recently
pulled that code together and did some minor cleanups.  However, it
still needs some rework.

In particular, no node should ever call an election if it can't be
master.  That way it will never assert itself as master at the
beginning of an election and end up flapping around.  We just need
to make sure that a node disqualifies itself from being master if it
becomes inactive.  There's also the corner case where a node stalls for
some reason and wakes up thinking it is still master after another node
has won an election.  Part of this is that the above code currently
uses CTDB_UNKNOWN_PNN to indicate an unknown recovery master at
startup, but we need to generalise this to mean an unknown recovery
master at any time.

The other minor issue is that if a node joins the cluster but can't be
master then it currently has no way of finding out who the master is.
The cleanest way of resolving this is to have interactions with the
master node depend on broadcasts instead of having every node be able
to provide that information.  So, status commands like "ctdb status"
and "ctdb recmaster" would just broadcast to find out who the master
is.

In the slightly longer term we'll switch to a "master broadcasts
regularly to assert itself" model.  New nodes would wait for a
broadcast to find out who the master is.  If no broadcast arrives then
an election is called:

* If cluster lock (coming soon) is configured then the first node that
  has the capability and grabs the lock becomes master.  This works
  well for interacting with some other cluster manager - the call-out
  would only be able to get the lock on the master.

* If cluster lock is not configured then run current election algorithm.

Elections would be minimised.

It all just needs time...  :-)

> 2) defaults are not what we expected. setting
> CTDB_CAPABILITY_RECMASTER=yes explicitly and restarting ctdb did work.
> So the default (if that line is commented out) is "RECMASTER:NO" not
> "RECMASTER:YES" (at least in some cases, e.g. in our case we don't
> configure a recovery lock file).   This is Samba 4.5.1

I don't see this happening here.

Sorry, I think I need to clarify the configuration file behaviour, both
here and in the documentation.  The distro-specific configuration file
is always loaded if it exists.  If ctdbd.conf also exists then it is
loaded last and its settings override those from the distro-specific
file.  Sorry, I didn't explain that well.  :-(

So, if you have an old /etc/sysconfig/ctdb hanging around with stale
settings, and your ctdbd.conf has the relevant lines commented out,
then you might not get what you expect.
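
For example, a quick way to see which files are setting the variable
(these are typical Red Hat-style default paths - adjust for your distro
and install prefix):

# grep CTDB_CAPABILITY_RECMASTER /etc/sysconfig/ctdb /etc/ctdb/ctdbd.conf

A value set in the distro-specific file but only commented out in
ctdbd.conf is not overridden, so it still takes effect.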

peace & happiness,
martin


