ctdb cluster not healthy: "Unable to take recovery lock - contention"
Martin Schwenke
martin at meltin.net
Sat May 8 11:46:57 UTC 2021
On Sat, 8 May 2021 17:44:43 +0800 (CST), 风无名 via samba-technical
<samba-technical at lists.samba.org> wrote:
> Sorry that my attachments are too large.
> My ctdb version is 4.8.5.
> At 2021-05-08 16:34:57, "风无名" <wuming_81 at 163.com> wrote:
>
> hello, everyone.
> Many minutes after I started my ctdb cluster, the cluster is still not healthy.
> the logs are in the attachment.
> my cluster consists of three nodes. /etc/hosts file:
> 192.168.200.10 node1
> 192.168.200.20 node2
> 192.168.200.30 node3
>
>
> public address config file:
> 192.168.210.10/24 ens15f1
> 192.168.210.30/24 ens15f1
> 192.168.210.20/24 ens15f1
>
>
> nodes config file:
> 192.168.200.10
> 192.168.200.30
> 192.168.200.20
>
>
> the ctdb lock file is /opt/ctdb/ctdb.lock
> /opt/ctdb/ is a mount point of a glusterfs cluster
> the glusterfs volume :
> [root at node1 ctdb]# gluster v status clusters_volume_ctdb
> Status of volume: clusters_volume_ctdb
> Gluster process                                 TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.200.10:/data/ctdb/192.168.200.10  49153     0          Y       6215
> Brick 192.168.200.30:/data/ctdb/192.168.200.30  49152     0          Y       17858
> Brick 192.168.200.20:/data/ctdb/192.168.200.20  49152     0          Y       9134
>
>
> I have examined the logs of the gluster mount point and the gluster server nodes and failed to find any anomaly.
>
>
> ctdb status of the node1:
> [root at node1 ctdb]# ctdb status
> Number of nodes:3
> pnn:0 192.168.200.10 UNHEALTHY (THIS NODE)
> pnn:1 192.168.200.30 DISCONNECTED|UNHEALTHY|INACTIVE
> pnn:2 192.168.200.20 DISCONNECTED|UNHEALTHY|INACTIVE
> Generation:INVALID
> Size:3
> hash:0 lmaster:0
> hash:1 lmaster:1
> hash:2 lmaster:2
> Recovery mode:RECOVERY (1)
> Recovery master:0
>
>
> ctdb status of the node2:
> [root at node2 ctdb]# ctdb status
> Number of nodes:3
> pnn:0 192.168.200.10 DISCONNECTED|UNHEALTHY|INACTIVE
> pnn:1 192.168.200.30 DISCONNECTED|UNHEALTHY|INACTIVE
> pnn:2 192.168.200.20 OK (THIS NODE)
> Generation:1475941203
> Size:1
> hash:0 lmaster:2
> Recovery mode:NORMAL (0)
> Recovery master:2
>
>
> ctdb status of node3:
> [root at node3 ~]# ctdb status
> Number of nodes:3
> pnn:0 192.168.200.10 DISCONNECTED|UNHEALTHY|INACTIVE
> pnn:1 192.168.200.30 UNHEALTHY (THIS NODE)
> pnn:2 192.168.200.20 DISCONNECTED|UNHEALTHY|INACTIVE
> Generation:INVALID
> Size:3
> hash:0 lmaster:0
> hash:1 lmaster:1
> hash:2 lmaster:2
> Recovery mode:RECOVERY (1)
> Recovery master:1
The above "ctdb status" output tells you that the CTDB nodes are not
connecting to each other. The logs also do not show the nodes
connecting. I would look here:
https://wiki.samba.org/index.php/Basic_CTDB_configuration#Troubleshooting
Is there a firewall blocking connections to TCP port 4379?
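One quick way to check this from each node is a bare-bash TCP probe (a sketch using bash's /dev/tcp redirection and the coreutils timeout command; the peer addresses below are the ones from your nodes file, adjust per node):

```shell
#!/usr/bin/env bash
# Probe CTDB's default TCP port (4379) on the peer nodes.
# Run on each node against the other two; a "blocked" result
# points at a firewall, or at ctdbd not listening on that peer.
for n in 192.168.200.20 192.168.200.30; do
    if timeout 3 bash -c "exec 3<>/dev/tcp/$n/4379" 2>/dev/null; then
        echo "$n: port 4379 reachable"
    else
        echo "$n: port 4379 blocked or not listening"
    fi
done
```

If all three nodes can reach each other on 4379, the "ctdb status" output should show the peers as CONNECTED shortly after a restart.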
> the ping_pong test results:
> (the cluster is running)
> [root at node1 ~]# ping_pong -l /opt/ctdb/ctdb.lock
> file already locked, calling check_lock to tell us who has it locked:
> check_lock failed: lock held: pid='0', type='1', start='0', len='1'
> Working POSIX byte range locks
>
>
> [root at node2 ~]# ping_pong -l /opt/ctdb/ctdb.lock
> file already locked, calling check_lock to tell us who has it locked:
> check_lock failed: lock held: pid='19142', type='1', start='0', len='1'
> Working POSIX byte range locks
>
>
> [root at node3 ~]# ping_pong -l /opt/ctdb/ctdb.lock
> file already locked, calling check_lock to tell us who has it locked:
> check_lock failed: lock held: pid='0', type='1', start='0', len='1'
> Working POSIX byte range locks
>
>
> I have searched many pages for a long time but failed to solve this problem.
> thanks for any advice.
I'm not sure if there is actually a locking issue. The logs show
contention for the recovery lock, so locking appears to be OK.
I suggest checking why the nodes can't connect to each other via TCP.
As mentioned above, this may be due to a firewall.
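Before blaming the firewall, it is also worth confirming that ctdbd is actually listening on each node's private address (a sketch; ss comes from iproute2, and the firewalld commands are an assumption based on the CentOS-style hosts shown above):

```shell
#!/usr/bin/env bash
# On each node: is ctdbd listening on TCP 4379?
# (ss is from iproute2; fall back to netstat where ss is absent)
if command -v ss >/dev/null 2>&1; then
    ss -tln | grep -w 4379
else
    netstat -tln | grep -w 4379
fi

# If firewalld is in use, check whether the port is open, and
# (hypothetically) open it if not:
#   firewall-cmd --list-ports
#   firewall-cmd --permanent --add-port=4379/tcp && firewall-cmd --reload
```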
By the way, this question really belongs on the "samba" mailing list,
rather than on "samba-technical"... ;-)
peace & happiness,
martin