[Samba] CTDB RecLockLatencyMs vs RecoverInterval
robert.buck at som.com
Wed Jul 1 02:20:14 UTC 2020
Thank you, Martin.
Yes, we are using Samba and CTDB v4.10.7 on Ubuntu. *Would this version
include the defect?* *In your opinion, will 4s be an issue?* We are
running this on top of a geo-distributed etcd cluster, and in this
particular case the two data centers were about 4200 miles apart. We're
running a distributed NFS file system across a total of three data
centers, spanning 7000+ miles. During failover testing we're seeing
failover times of less than 7 seconds, which seems pretty nice to me.
*In your experience, is there anything we should be tuning?*
The file system performs great; we're just trying to understand and tune
winbind so that it works flawlessly.
On Tue, Jun 30, 2020 at 6:27 PM Martin Schwenke <martin at meltin.net> wrote:
> Hi Bob,
> On Tue, 30 Jun 2020 17:00:11 -0400, Robert Buck via samba
> <samba at lists.samba.org> wrote:
> > I have a question regarding CTDB RecLockLatencyMs tunable parameter. Is
> > there any relationship between the RecLockLatencyMs property and
> > the RecoverInterval property? Does one need to be larger than the other?
> > If RecLockLatencyMs were increased to 5000ms, should some other setting
> > be changed in proportion?
> > We're using a geo-distributed etcd cluster for the CTDB recovery lock and I
> > noticed a "*High RECLOCK latency*" (of 4s) message in syslog, and just
> > wanted to see if we could safely squelch the warning, and if so, how?
> RecoverInterval indicates how often nodes should monitor conditions
> that indicate that a database recovery is needed. I would suggest
> leaving this at the default of 1 second. In future we might change
> this to be hard coded anyway.
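To make RecoverInterval concrete: a periodic check only notices a condition at its next tick, so with the default 1 second interval the worst-case detection delay is just under 1 second. The Python below is a toy model under that assumption (checks run at fixed multiples of the interval, with times in integer milliseconds); it is not CTDB code, and the function names are hypothetical.

```python
# Toy model (not CTDB source): a monitor that checks every RecoverInterval
# notices a problem at the first check at or after the moment it occurs.

def first_check_after(event_ms: int, interval_ms: int) -> int:
    """Time (ms) of the first periodic check at or after event_ms.

    Checks are assumed to run at 0, interval_ms, 2*interval_ms, ...
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    if event_ms < 0:
        raise ValueError("event_ms must be non-negative")
    # Ceiling division on non-negative integers.
    return -(-event_ms // interval_ms) * interval_ms


def detection_delay(event_ms: int, interval_ms: int) -> int:
    """How long after event_ms the monitor first notices the event."""
    return first_check_after(event_ms, interval_ms) - event_ms


if __name__ == "__main__":
    interval = 1000  # default RecoverInterval of 1 s, in ms
    for event in (0, 1, 250, 999, 1000):
        print(event, detection_delay(event, interval))
```

The delay is always below one interval, which is why a short, fixed interval is a reasonable default.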
> Many years ago CTDB used to release the recovery lock after each
> recovery. This meant that the recovery lock had to be taken before
> each recovery, so the recovery lock latency mattered more.
> We changed that so the recovery lock is taken before the first recovery
> after a node is elected leader (currently called recovery master), so
> it is now more of a cluster lock. We also made some changes so that
> the leader is more likely to be stable across elections. Both of these
> changes make the recovery lock latency matter a lot less.
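A rough way to see why this change makes lock latency matter less is to count lock acquisitions. The sketch below is a toy cost model, not CTDB code: it assumes every acquisition costs the same fixed latency, and `recoveries_per_term` (a hypothetical input) lists how many recoveries happened during each leader's term.

```python
# Toy cost model (not CTDB source) contrasting the two recovery lock
# strategies: lock before every recovery vs. lock once per leader term.

def lock_acquisitions(recoveries_per_term: list[int], per_recovery: bool) -> int:
    if per_recovery:
        # Old behaviour: the lock is taken before every recovery.
        return sum(recoveries_per_term)
    # Newer behaviour: the lock is taken once per leader term,
    # before that term's first recovery.
    return sum(1 for n in recoveries_per_term if n > 0)


def total_lock_wait_ms(recoveries_per_term: list[int], latency_ms: int,
                       per_recovery: bool) -> int:
    # Assumes each acquisition costs the same fixed latency.
    return lock_acquisitions(recoveries_per_term, per_recovery) * latency_ms


if __name__ == "__main__":
    terms = [3, 5]  # hypothetical: two leader terms, 8 recoveries in total
    latency = 4000  # the 4 s latency reported in this thread
    print(total_lock_wait_ms(terms, latency, per_recovery=True))   # 32000
    print(total_lock_wait_ms(terms, latency, per_recovery=False))  # 8000
```

With a stable leader, the lock latency is paid once rather than on every recovery.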
> So, I don't think that warnings about recovery lock latency are as
> important as they used to be. You could safely increase
> RecLockLatencyMs to 5000.
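For reference, ctdb-tunables(7) describes RecLockLatencyMs as a logging threshold: when non-zero, CTDB logs a message if taking or testing the recovery lock takes longer than that many milliseconds (default 1000), which matches a 4s latency triggering the warning. The value can be changed at runtime with `ctdb setvar RecLockLatencyMs 5000`, and, assuming CTDB 4.9 or later, made persistent with a `RecLockLatencyMs=5000` line in the ctdb.tunables file; check the man pages for your version. The Python below only illustrates that threshold logic; it is not CTDB's implementation, and the function names are hypothetical.

```python
# Illustration only (not CTDB source) of the RecLockLatencyMs warning
# condition as described in ctdb-tunables(7).

DEFAULT_RECLOCK_LATENCY_MS = 1000  # documented default


def high_reclock_latency(latency_ms: float,
                         threshold_ms: int = DEFAULT_RECLOCK_LATENCY_MS) -> bool:
    # A threshold of 0 disables the check; otherwise warn when the
    # lock operation took longer than the threshold.
    return threshold_ms > 0 and latency_ms > threshold_ms


def tunables_line(name: str, value: int) -> str:
    # ctdb.tunables holds one NAME=VALUE entry per line.
    return f"{name}={value}"


if __name__ == "__main__":
    print(high_reclock_latency(4000))        # True: warns at the default
    print(high_reclock_latency(4000, 5000))  # False: squelched at 5000
    print(tunables_line("RecLockLatencyMs", 5000))
```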
> However... (and there is always a "however" ;-)
> The presence of recovery lock latency warnings made one of the race
> conditions in the following bug pretty obvious to me:
> So, while they matter less, they still have value.
> If you're using a CTDB recovery lock with high latency then you should
> make sure you are using a version that contains a fix for the above bug.
> Please let us know if you have more questions...
> peace & happiness,
SENIOR PLATFORM SOFTWARE ENGINEER
SKIDMORE, OWINGS & MERRILL
7 WORLD TRADE CENTER
250 GREENWICH STREET
NEW YORK, NY 10007
T (212) 298-9624
ROBERT.BUCK at SOM.COM