[Samba] CTDB: About the setting of take cluster lock timeout time

Mon Jul 24 00:11:56 UTC 2023

Hi,

On Sun, 23 Jul 2023 09:58:23 +0800, "tu.qiuping" <tu.qiuping at qq.com>
wrote:

> My ctdb version is 4.17.7
> 
> 
> Hello everyone.
> My ctdb cluster setup is as follows:
> [cluster]
> recovery lock = !/usr/libexec/ctdb/ctdb_mutex_ceph_rados_helper ceph client.admin meta_pool ctdb_reclock 4
> node address = 100.100.40.131
> 
> 
> [legacy]
> realtime scheduling = false
> 
> 
> [logging]
> log level = NOTICE

> When I power off a node in the cluster, the acquisition lock timed
> out, and the log of one of the nodes is as follows:

> 2023-07-21T10:02:46.753424+08:00 host-192-168-134-22 ctdbd[6751]: abcd:909::a8f8:8989:eeff:cd22:4379: node abcd:909::a8f8:8989:eeff:cd21:4379 is dead: 0 connected
> 2023-07-21T10:02:46.753519+08:00 host-192-168-134-22 ctdb-recoverd[8338]: ctdb_control error: 'node is disconnected'
> 2023-07-21T10:02:46.753598+08:00 host-192-168-134-22 ctdb-recoverd[8338]: ctdb_control error: 'node is disconnected'
> 2023-07-21T10:02:46.753619+08:00 host-192-168-134-22 ctdb-recoverd[8338]: Async operation failed with ret=-1 res=-1 opcode=80
> 2023-07-21T10:02:46.753637+08:00 host-192-168-134-22 ctdb-recoverd[8338]: Async wait failed - fail_count=1
> 2023-07-21T10:02:46.753652+08:00 host-192-168-134-22 ctdb-recoverd[8338]: ../../ctdb/server/ctdb_client.c:1558 Failed to read node capabilities.
> 2023-07-21T10:02:46.753667+08:00 host-192-168-134-22 ctdb-recoverd[8338]: ../../ctdb/server/ctdb_recoverd.c:456 Failed to get node capabilities
> 2023-07-21T10:02:46.753683+08:00 host-192-168-134-22 ctdb-recoverd[8338]: ../../ctdb/server/ctdb_recoverd.c:2640 Unable to update node capabilities.
> 2023-07-21T10:02:49.400062+08:00 host-192-168-134-22 ctdb-recoverd[8338]: Leader broadcast timeout
> 2023-07-21T10:02:49.400163+08:00 host-192-168-134-22 ctdb-recoverd[8338]: Start election
> 2023-07-21T10:02:49.400282+08:00 host-192-168-134-22 ctdbd[6751]: Recovery mode set to ACTIVE
> 2023-07-21T10:02:49.400400+08:00 host-192-168-134-22 ctdb-recoverd[8338]: Attempting to take cluster lock (!/usr/libexec/ctdb/ctdb_mutex_ceph_rados_helper ceph client.admin pool ctdb_reclock 4)
> 2023-07-21T10:02:54.401302+08:00 host-192-168-134-22 ctdb-recoverd[8338]: Leader broadcast timeout
> 2023-07-21T10:02:54.401418+08:00 host-192-168-134-22 ctdb-recoverd[8338]: Start election

> [...]

> 2023-07-21T10:04:44.423467+08:00 host-192-168-134-22 ctdb-recoverd[8338]: Leader broadcast timeout
> 2023-07-21T10:04:44.423531+08:00 host-192-168-134-22 ctdb-recoverd[8338]: Start election
> 2023-07-21T10:04:48.757408+08:00 host-192-168-134-22 ctdbd[6751]: Recovery daemon ping timeout. Count : 1
> 2023-07-21T10:04:49.400713+08:00 host-192-168-134-22 ctdbd[6751]: ../../ctdb/server/ctdb_recover.c:609 Been in recovery mode for too long. Dropping all IPS
> 2023-07-21T10:04:49.401346+08:00 host-192-168-134-22 ctdb-recoverd[8338]: Unable to take cluster lock - timeout
> 2023-07-21T10:04:49.423974+08:00 host-192-168-134-22 ctdb-recoverd[8338]: Leader broadcast timeout
> 2023-07-21T10:04:49.424021+08:00 host-192-168-134-22 ctdb-recoverd[8338]: Start election
> 2023-07-21T10:04:49.424120+08:00 host-192-168-134-22 ctdb-recoverd[8338]: Attempting to take cluster lock (!/usr/libexec/ctdb/ctdb_mutex_ceph_rados_helper_rados_helper ceph client.admin pool ctdb_reclock 4)
> 2023-07-21T10:04:49.647722+08:00 host-192-168-134-22 ctdb-recoverd[8338]: Cluster lock taken successfully
> 2023-07-21T10:04:49.647814+08:00 host-192-168-134-22 ctdb-recoverd[8338]: Took cluster lock, leader=1

> I think the timeout period of 120s is too long. If the ceph cluster
> recovers quickly, ctdb will still have to wait for the 120s timeout
> before acquiring the lock again, which will have a relatively large
> impact on the business.

> So, if I set the timeout to 30s, will it have any effect and waste
> system resources? Or can I reduce this timeout by modifying vars in
> ctdb.tunables.? I look at the source code, this setting should be set
> in h = ctdb_cluster_mutex(s, ctdb, ctdb->recovery_lock, 120,
> take_cluster_lock_handler, s, lost_cluster_lock_handler, rec).

This could be true and might be worth trying.  However, the 120s
timeout there is really meant to be a failsafe for when the lock
technology (e.g. cluster filesystem, object store) is behaving badly.
That is, the timeout should only occur if the helper gets stuck... and
that should generally not happen.

Normally, if the lock can't be taken due to contention, you would
expect the helper to immediately return with '1' to indicate contention.

We have made quite a few changes to
ctdb/server/ctdb_mutex_fcntl_helper.c to try to handle edge cases
better.  It might be worth look at
ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c and checking if it can
also be improved.

Unfortunately, I am not familiar with Ceph RADOS, so I can't make any
recommendations.

peace & happiness,
martin