CTDB IP takeover/failover tunables - do you use them?
martin at meltin.net
Thu Nov 10 03:01:56 UTC 2016
I'm currently hacking on CTDB's IP takeover/failover code. For Samba
4.6, I would like to rationalise the IP takeover-related tunables.
I would like to know if there are any users who set the values of these
tunables to non-default values. The tunables in question are:
DisableIPFailover
When set to non-zero, ctdb will not perform failover or failback. Even
if a node fails while holding public IPs, ctdb will not recover the IPs
or assign them to another node.
When this tunable is enabled, ctdb will no longer attempt to recover
the cluster by failing IP addresses over to other nodes. This leads to
a service outage until the administrator has manually performed IP
failover to replacement nodes using the 'ctdb moveip' command.
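For example, the workflow might look like this (the address 10.1.1.1 and
node number 2 are illustrative, and onnode is used to run the setvar on
every node):

```shell
# Tell ctdb never to fail addresses over automatically
onnode all ctdb setvar DisableIPFailover 1

# After a node failure, move an orphaned public IP to a
# surviving node by hand
ctdb moveip 10.1.1.1 2
```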
NoIPFailback

When set to 1, ctdb will not perform failback of IP addresses when a
node becomes healthy. When a node becomes UNHEALTHY, ctdb WILL perform
failover of public IP addresses, but when the node becomes HEALTHY
again, ctdb will not fail the addresses back.
Use with caution! Normally, when a node becomes available to the
cluster, ctdb will try to reassign public IP addresses to the new node
as a way to distribute the workload evenly across the cluster nodes.
Ctdb tries to make sure that all running nodes host approximately the
same number of public addresses.
When you enable this tunable, ctdb will no longer attempt to rebalance
the cluster by failing IP addresses back to the new nodes. An
unbalanced cluster will therefore remain unbalanced until there is
manual intervention from the administrator. When this parameter is set,
you can manually fail public IP addresses over to the new node(s) using
the 'ctdb moveip' command.
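A possible manual rebalancing session, once the recovered node is
HEALTHY again (the addresses and node number are made up for
illustration):

```shell
# Disable automatic failback cluster-wide
onnode all ctdb setvar NoIPFailback 1

# Later, once node 1 is HEALTHY again, rebalance by hand
ctdb moveip 10.1.1.2 1
ctdb moveip 10.1.1.3 1
```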
NoIPHostOnAllDisabled

If no nodes are HEALTHY then by default ctdb will happily host public
IPs on disabled (unhealthy or administratively disabled) nodes. This
can cause problems, for example if the underlying cluster filesystem is
not mounted. When set to 1 on a node and that node is disabled, any IPs
hosted by this node will be released and the node will not takeover any
IPs until it is no longer disabled.
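For instance, to enable this behaviour on a single node only (the node
number is illustrative):

```shell
# Node 3's cluster filesystem mount is known to be unreliable,
# so never host public IPs there while it is disabled
onnode 3 ctdb setvar NoIPHostOnAllDisabled 1
```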
NoIPTakeover

When set to 1, ctdb will not allow IP addresses to be failed over onto
this node. Any IP addresses that the node currently hosts will remain
on the node but no new IP addresses can be failed over to the node.
In particular, I would like to know if anyone has a use case where they
set any of these tunables to different values on different nodes. This
only really matters for the last 2 (NoIPHostOnAllDisabled,
NoIPTakeover), since only the recovery master's value is used for the
other 2. If you do this, can you please explain why? :-)
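If you do run with per-node values, something like the following shows
what each node is currently using:

```shell
# Show the current value of each tunable on every node
onnode all ctdb getvar NoIPTakeover
onnode all ctdb getvar NoIPHostOnAllDisabled
```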
I would like to make all of the above tunables global, but I will
not do that if it breaks an existing use case and I can't find a
workable alternative.
There are also 2 tunables to choose the algorithm used to calculate the
IP address layout:
DeterministicIPs

When set to 1, ctdb will try to keep public IP addresses locked to
specific nodes as far as possible. This makes it easier for debugging
since you can know that as long as all nodes are healthy public IP X
will always be hosted by node Y.
The cost of using deterministic IP address assignment is that it
disables part of the logic where ctdb tries to reduce the number of
public IP assignment changes in the cluster. This tunable may increase
the number of IP failover/failbacks that are performed on the cluster
by a small margin.
LCP2PublicIPs

When set to 1, ctdb uses the LCP2 IP allocation algorithm.
I plan to replace these with a single tunable to select the algorithm
(0 = deterministic, 1 = non-deterministic, 2 = LCP2 (default)).
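As I understand the current code, LCP2PublicIPs takes precedence, so
the three algorithms map onto the existing pair of tunables roughly
like this (a sketch, not a definitive reference):

```shell
# LCP2 (the current default):
ctdb setvar LCP2PublicIPs 1

# Deterministic assignment (LCP2 must be disabled first):
ctdb setvar LCP2PublicIPs 0
ctdb setvar DeterministicIPs 1

# Non-deterministic assignment:
ctdb setvar LCP2PublicIPs 0
ctdb setvar DeterministicIPs 0
```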
Thanks for any feedback...
peace & happiness,