[Samba] CTDB question about "shared file system"

Martin Schwenke martin at meltin.net
Wed Aug 5 22:53:21 UTC 2020


Hi Bob,

On Wed, 5 Aug 2020 17:10:11 -0400, Robert Buck via samba
<samba at lists.samba.org> wrote:

> Could I impose upon someone to provide some guidance? Some hint? Thank you

Any time!  :-)

> Is a shared file system actually required? If etcd is used to manage the
> global recovery lock, is there any need at that point for a shared file
> system?
> 
> In other words, are there samba or CTDB files (state) that must be on a
> shared file system, or can each clustered host simply have these files
> locally?
> 
> What must be shared? What can be optionally shared?

The only thing that CTDB uses the shared filesystem for is the recovery
lock, so if you're using etcd for the recovery lock then CTDB will not
be using the shared filesystem.

Clustered Samba (smbd in this case) expects to serve files to clients
from a shared filesystem.  Although some of the metadata is stored in
CTDB, smbd makes some assumptions about the underlying filesystem
(e.g. I/O coherence is required when using POSIX locking).

> The doc is not clear on this.

I have updated the wiki to mention this:

  https://wiki.samba.org/index.php/Setting_up_a_cluster_filesystem#Checking_lock_coherence

The page about ping_pong was already there but it doesn't look like
there was a link to it.
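
As a rough illustration of what a lock check exercises, here is a
minimal Python sketch.  It is not ping_pong and does not replace it:
it only confirms, on a single node, that a second process is refused
a POSIX byte-range lock that the first process holds.  A real
coherence test has to run against the same file from several nodes.
The function names and the lock file path are made up for this
example.

```python
import fcntl
import os
import tempfile


def try_lock(fd):
    """Try a non-blocking exclusive POSIX lock on the first byte of fd."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)
        return True
    except OSError:
        # Any failure (typically a conflicting lock held by another
        # process) is treated as "lock not acquired".
        return False


def lock_excludes_other_process(path):
    """Return True if a second process is refused a lock that we hold."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if not try_lock(fd):
            raise RuntimeError("could not take the initial lock")
        pid = os.fork()
        if pid == 0:
            # Child: fcntl locks are not inherited across fork(), so this
            # attempt contends with the lock held by the parent.
            status = 2
            try:
                child_fd = os.open(path, os.O_RDWR)
                status = 1 if try_lock(child_fd) else 0
            finally:
                os._exit(status)
        _, wait_status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(wait_status) == 0
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical lock file in a temporary directory.
    with tempfile.TemporaryDirectory() as d:
        print(lock_excludes_other_process(os.path.join(d, "lockfile")))
```

To try it on a cluster filesystem, point the path at a file on that
filesystem.  A passing result on one node says nothing about
coherence between nodes, which is what ping_pong is for.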

I also need to update the ctdb(7) manual page to point to the wiki.

> In our scenario, when we attempt to start up a second node, it always goes
> into a banned state. If we shut down the healthy node and restart CTDB on
> the "failed node" it now works. We're trying to understand this.

One reason I can think of for this is the recovery lock check during
recovery.  When recovery completes and CTDB is setting the recovery
mode back to "normal" on each node, it does a sanity check where it
attempts to take the recovery lock.  It should never be able to do this
because the lock should already be held by another process on the
master/leader node.  If the attempt succeeds then the lock is not
providing mutual exclusion and the check fails.
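
The logic of that sanity check can be sketched with a toy model in
Python.  This is not CTDB code: ToyLock and sanity_check_ok are
hypothetical names, and the lock is just an in-memory object.  The
point is only that the check passes when the node fails to take the
lock, and that it goes wrong when each node sees its own private
copy of the lock.

```python
class ToyLock:
    """Hypothetical stand-in for one view of a recovery lock."""

    def __init__(self):
        self.holder = None

    def try_take(self, node):
        """Take the lock if it is free; report whether that succeeded."""
        if self.holder is None:
            self.holder = node
            return True
        return False


def sanity_check_ok(lock_view, node):
    """The check passes only if this node FAILS to take the lock."""
    return not lock_view.try_take(node)


# Case 1: both nodes see the same lock, already held by the leader.
shared = ToyLock()
shared.try_take("leader")
coherent_result = sanity_check_ok(shared, "node2")

# Case 2: each node sees its own private lock (for example, a lock file
# on a path that is not actually shared), so node2's attempt succeeds.
leader_view, node2_view = ToyLock(), ToyLock()
leader_view.try_take("leader")
broken_result = sanity_check_ok(node2_view, "node2")

print(coherent_result, broken_result)  # prints: True False
```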

I've documented a couple of reasons, unrelated to the recovery lock,
why CTDB can behave badly:

  https://wiki.samba.org/index.php/Basic_CTDB_configuration#Troubleshooting

So, 2 questions:

* Does the 2nd node still get banned if you disable the recovery lock?

  If not then the problem is clearly with the recovery lock.

* What do the logs say about the reason for banning the node?
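
For the second question, a small Python filter can help pull the
relevant lines out of a long log.  This is only a convenience sketch:
the function name and the pattern are made up for this example, the
pattern simply looks for words such as "ban", "banned", "banning" or
"unban", and the log file location depends on how CTDB is configured
on your system.

```python
import re
import sys

# Whole words only: ban, bans, banned, banning, unban, unbanned, ...
BAN_PATTERN = re.compile(r"\b(?:un)?ban(?:s|ned|ning)?\b", re.IGNORECASE)


def ban_related_lines(lines):
    """Return the lines that mention banning, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if BAN_PATTERN.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log path is supplied by the caller; no default is assumed here.
    with open(sys.argv[1], errors="replace") as log:
        for line in ban_related_lines(log):
            print(line)
```

The lines just before the first match are usually worth reading too,
since they tend to show what led up to the ban.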

peace & happiness,
martin


