[Samba] CTDB RecLockLatencyMs vs RecoverInterval

Wed Jul 1 02:20:14 UTC 2020

Thank you, Martin.

Yes, we are using Samba and CTDB v4.10.7 on Ubuntu. *Does this version
include the defect?* *In your opinion, will 4s be an issue?* We are
running this on top of a geo-distributed etcd cluster, and in this
particular case the two data centers were about 4200 miles apart. We're
running a distributed NFS file system across a total of three data
centers, spanning 7000+ miles. During failover testing we're seeing
failover times under 7 seconds, which seems pretty good to me. *In your
experience, is there anything we should be tuning for?*
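
To gauge how far above the threshold these warnings actually go, the
syslog lines can be summarized with something like the following. This
is only a sketch: it assumes the warning has the form "High RECLOCK
latency <seconds>s ...", and the function names and sample lines are
made up for illustration, not taken from CTDB.

```python
import re

# Assumed message shape (not verified against every CTDB version):
#   "... High RECLOCK latency 4.000000s for operation ..."
RECLOCK_RE = re.compile(r"High RECLOCK latency (\d+(?:\.\d+)?)s")


def reclock_latencies_ms(lines):
    """Return the latency in milliseconds of each matching warning line."""
    latencies = []
    for line in lines:
        match = RECLOCK_RE.search(line)
        if match:
            latencies.append(round(float(match.group(1)) * 1000))
    return latencies


def over_threshold(lines, threshold_ms):
    """Return the latencies that exceed a candidate RecLockLatencyMs value."""
    return [ms for ms in reclock_latencies_ms(lines) if ms > threshold_ms]


# Hypothetical sample lines, for illustration only.
SAMPLE_LINES = [
    "Jul  1 02:00:00 node1 ctdbd[1234]: High RECLOCK latency 4.000000s for operation recd reclock",
    "Jul  1 02:00:05 node1 ctdbd[1234]: some unrelated message",
    "Jul  1 02:00:09 node1 ctdbd[1234]: High RECLOCK latency 0.250000s for operation recd reclock",
]

if __name__ == "__main__":
    for threshold in (1000, 5000):
        print(threshold, over_threshold(SAMPLE_LINES, threshold))
```

Feeding it the real syslog lines with a candidate threshold such as
5000 shows which of the observed latencies would still exceed it.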

The file system performs great; we're just trying to understand and tune
winbind so that it works flawlessly.

Bob

On Tue, Jun 30, 2020 at 6:27 PM Martin Schwenke <martin at meltin.net> wrote:

> Hi Bob,
>
> On Tue, 30 Jun 2020 17:00:11 -0400, Robert Buck via samba
> <samba at lists.samba.org> wrote:
>
> > I have a question regarding the CTDB RecLockLatencyMs tunable
> > parameter. Is there any relationship between the RecLockLatencyMs
> > property and the RecoverInterval property? Does one need to be larger
> > than the other? Or if RecLockLatencyMs were increased to 5000ms,
> > should some other setting be changed in proportion?
> >
> > We're using a geo-distributed etcd cluster for the CTDB recovery lock
> > and I noticed a "High RECLOCK latency" (of 4s) message in syslog, and
> > just wanted to see if we could safely squelch the warning, and if so,
> > how?
>
> RecoverInterval sets how often nodes check for conditions that
> indicate that a database recovery is needed.  I would suggest
> leaving this at the default of 1 second.  In the future we might
> hard-code this anyway.
>
> Many years ago CTDB used to release the recovery lock after each
> recovery.  This meant that the recovery lock had to be taken before
> each recovery, so the recovery lock latency mattered more.
>
> We changed that so the recovery lock is taken before the first recovery
> after a node is elected leader (currently called recovery master), so
> it is now more of a cluster lock.  We also made some changes so that
> the leader is more likely to be stable across elections.  Both of these
> changes make the recovery lock latency matter a lot less.
>
> So, I don't think that warnings about recovery lock latency are as
> important as they used to be.  You could safely increase
> RecLockLatencyMs to 5000.
>
> However... (and there is always a "however" ;-)
>
> The presence of recovery lock latency warnings made one of the race
> conditions in the following bug pretty obvious to me:
>
>   https://bugzilla.samba.org/show_bug.cgi?id=14294
>
> so, while they matter less, they still have value.
>
> If you're using a CTDB recovery lock with high latency then you should
> make sure you are using a version that contains a fix for the above bug.
>
> Please let us know if you have more questions...
>
> peace & happiness,
> martin

-- 

BOB BUCK
SENIOR PLATFORM SOFTWARE ENGINEER

SKIDMORE, OWINGS & MERRILL
7 WORLD TRADE CENTER
250 GREENWICH STREET
NEW YORK, NY 10007
T  (212) 298-9624
ROBERT.BUCK at SOM.COM