[Samba] samba failover with ctdb and client-visible errors

Martin Schwenke martin at meltin.net
Sat May 4 02:05:21 UTC 2024

Hi Sage,

On Fri, 3 May 2024 16:17:45 -0500, Sage Weil via samba
<samba at lists.samba.org> wrote:

> I'm setting up a clustered Samba+CTDB in front of CephFS and am
> running into an issue during failover.  For the most part everything
> seems to work: the IP moves quickly, smbd is started on the right
> node, etc, but if there is an IO load from a client during failover
> (e.g., copying a big directory full of files in File Explorer), it
> pauses for a couple of seconds and then pops up an error dialog box.
> If I hit 'Try Again' everything continues without problems.
> However... I assume that a client-visible error like this will cause
> problems with most applications (that may not be persistent enough to
> retry everything).  I did a google search and the only thing I found
> was something suggesting passing a flag to xcopy that forces a retry
> on error.
> Here's what the dialog looks like when I reboot one of the gateway nodes:
>   https://i.ibb.co/kh4fFPW/tryagain.png
> If I click 'Try Again' everything proceeds.

Error handling seems to be application-dependent on Windows.  If you're
doing lots of copying then the hint you found for xcopy is probably a
good idea.  Many applications will silently reconnect.

One issue is that CTDB's failover is done at the TCP networking level,
so it is impossible to hide errors from applications.

The dream is to get transparent failover with Microsoft's Witness
Protocol (available in Samba ≥ 4.20) and persistent file handles (not
yet in Samba).

> Here's my smb.conf:
> root@smbgw2:/etc/samba# cat smb.conf
> [global]
>   clustering = yes
>   include = registry
> root@smbgw2:/etc/samba# net conf list
> [global]
> netbios name = smbgw
> clustering = yes
> idmap config * : backend = tdb2

For default domain ID mapping, you probably want autorid these days:
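A minimal sketch of what an autorid default-domain mapping could look like is below. The range is an assumption borrowed from the idmap_autorid manual page example, not something from this thread, so adjust it to avoid overlapping any existing ID ranges:

```ini
[global]
    # Replaces "idmap config * : backend = tdb2" from the quoted config.
    idmap config * : backend = autorid
    # Assumed range (from the idmap_autorid man page example); pick your own.
    idmap config * : range = 1000000-1999999
```

Since the quoted setup uses "include = registry", these parameters would presumably go into the registry configuration rather than the smb.conf file itself.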


> [...]
> CTDB config looks like so:

> CTDB has a single IP in public_addresses that is moving around between
> the gateway nodes as expected--from what I can tell that is all
> working well.

If CephFS is sane (i.e. has proper locking coherency - others will be
able to make better comments about this) then clustered Samba can
happily be active-active.  That means you can have multiple IPs in
public_addresses, so that multiple clients can access the share via
different gateway nodes in parallel.
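As a rough illustration, a public_addresses file with several addresses might look like the following.  The addresses (taken from the 192.0.2.0/24 documentation range) and the interface name eth0 are placeholders, not values from this thread:

```text
192.0.2.11/24 eth0
192.0.2.12/24 eth0
192.0.2.13/24 eth0
```

Each line is an address with its netmask, followed by the interface that should host it.  CTDB distributes the addresses across the healthy nodes, and clients are typically spread over them with something like round-robin DNS.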

> The only other issue I've identified is that I seem to have to create
> the user (and set the password with smbpasswd) on each of the
> gateways... even though I expected that the 'passdb backend = tdbsam'
> line would keep user and password info in ctdb somewhere.  Am I
> missing something there?

There currently isn't a way of exposing local users at the OS level,
and an OS user is needed for file permissions.  We have thought of
faking this via winbind, but it keeps sliding down the priority queue.

Setting up a Samba Active Directory server isn't especially difficult,
so that tends to be a good option.

I hope some of that is useful...  :-)

peace & happiness,
