[Samba] CTDB question about "shared file system"

Sat Aug 8 23:07:07 UTC 2020

Hi Bob,

On Sat, 8 Aug 2020 07:07:58 -0400, Robert Buck <robert.buck at som.com>
wrote:

> On Sat, Aug 8, 2020 at 2:52 AM Martin Schwenke <martin at meltin.net> wrote:

> > On Thu, 6 Aug 2020 06:55:31 -0400, Robert Buck <robert.buck at som.com>

> > > And after these changes the logs simply have these messages periodically:
> > >
> > > Disabling takeover runs for 60 seconds
> > > Reenabling takeover runs
> > >
> > > *Is this normal?*  
> >
> > How frequently are these messages logged?  They should occur as nodes
> > join but should stop after that.  If they continue, are there any clues
> > indicating why takeover runs occur?  A takeover run is just what CTDB
> > currently calls a recalculation of the floating IP addresses for
> > fail-over.

> Yes, those log messages were occurring precisely once per second.

In that case I would expect the logs on another node to (just as
regularly) say:

  Takeover run starting

and before that to indicate a reason.

Anything on another node?
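
If it helps with the search, here is a rough sketch (not part of CTDB) that
prints each "Takeover run starting" line together with the few log lines
before it, since the reason should be logged just ahead of it.  The
function name, the default of 5 lines of context and the idea of feeding
it a plain-text log file are my assumptions; adjust for wherever your
nodes actually log.

```python
from collections import deque


def takeover_context(lines, marker="Takeover run starting", before=5):
    """Yield (preceding_lines, matching_line) for each line containing marker.

    `lines` is any iterable of log lines, e.g. an open file.
    `before` is how many earlier lines to keep as context (an arbitrary choice).
    """
    recent = deque(maxlen=before)  # rolling window of the most recent lines
    for raw in lines:
        line = raw.rstrip("\n")
        if marker in line:
            # Snapshot the window so later appends do not change it
            yield list(recent), line
        recent.append(line)


def print_takeover_context(lines, before=5):
    """Print every match with its context, separated by a divider."""
    for context, hit in takeover_context(lines, before=before):
        for earlier in context:
            print("   ", earlier)
        print(">>>", hit)
        print("-" * 40)
```

To use it, open the log file from the other node and pass the file object
to print_takeover_context().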

> Then after several hours they stopped after these messages in the log:
> 
> ctdbd[1220]: 10.206.2.124:4379: node 10.200.1.230:4379 is dead: 0 connected
> ctdbd[1220]: Tearing down connection to dead node :0
> ctdb-recoverd[1236]: Current recmaster node 0 does not have CAP_RECMASTER, but we (node 1) have - force an election
> ctdbd[1220]: Recovery mode set to ACTIVE
> ctdbd[1220]: This node (1) is now the recovery master

> [...]

> Then it's a clean syslog after that.

That makes some amount of sense.  It looks like the master/leader node
changed, so the old one stopped attempting a takeover run each second.
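
One way to check that reading against the log is to count the "Disabling
takeover runs" messages before and after the first "is now the recovery
master" line.  This is only a sketch under the assumption that both
messages land in the same log file on that node; the function name is made
up for illustration.

```python
def count_around_election(lines,
                          message="Disabling takeover runs",
                          election="is now the recovery master"):
    """Return (before, after): how often `message` appears before and
    after the first line containing `election`.

    `lines` is any iterable of log lines, e.g. an open file.
    """
    before = 0
    after = 0
    seen_election = False
    for line in lines:
        if election in line:
            seen_election = True
        elif message in line:
            if seen_election:
                after += 1
            else:
                before += 1
    return before, after
```

If the explanation above is right, the second number should be zero (or
close to it) and the first should be large.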

Thanks...

peace & happiness,
martin