ctdb cluster not healthy: "Unable to take recovery lock - contention"

Sat May 8 09:44:43 UTC 2021

Sorry that my attachments are too large.
Also, my ctdb version is 4.8.5.

At 2021-05-08 16:34:57, "风无名" <wuming_81 at 163.com> wrote:

Hello, everyone.
Many minutes after I started my ctdb cluster, it is still not healthy.
The logs are in the attachments.
My cluster consists of three nodes. The /etc/hosts file:
192.168.200.10 node1
192.168.200.20 node2
192.168.200.30 node3

public address config file:
192.168.210.10/24 ens15f1
192.168.210.30/24 ens15f1
192.168.210.20/24 ens15f1

nodes config file:
192.168.200.10
192.168.200.30
192.168.200.20
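
ctdb assigns each node's PNN from its line position in the nodes file, with the first line being pnn 0. A minimal Python sketch of that mapping for the file above follows. The helper name `pnn_map` is hypothetical, and the sketch assumes one address per line with no comments or other markers.

```python
def pnn_map(nodes_text):
    """Map PNN -> address, assuming the PNN is the zero-based line index."""
    addrs = [line.strip() for line in nodes_text.splitlines() if line.strip()]
    return dict(enumerate(addrs))


# Contents of the nodes file as pasted in this message.
NODES = """\
192.168.200.10
192.168.200.30
192.168.200.20
"""

for pnn, addr in pnn_map(NODES).items():
    print(f"pnn:{pnn} {addr}")
```

With this ordering, pnn:1 is 192.168.200.30 (node3) and pnn:2 is 192.168.200.20 (node2), which is the numbering that appears in the ctdb status output further down.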

The ctdb recovery lock file is /opt/ctdb/ctdb.lock.
/opt/ctdb/ is the mount point of a glusterfs volume.
The glusterfs volume status:
[root@node1 ctdb]# gluster v status clusters_volume_ctdb
Status of volume: clusters_volume_ctdb
Gluster process                                  TCP Port  RDMA Port  Online  Pid
----------------------------------------------------------------------------------
Brick 192.168.200.10:/data/ctdb/192.168.200.10   49153     0          Y       6215
Brick 192.168.200.30:/data/ctdb/192.168.200.30   49152     0          Y       17858
Brick 192.168.200.20:/data/ctdb/192.168.200.20   49152     0          Y       9134

I have examined the logs of the gluster mount point and of the gluster server nodes and found no anomalies.

ctdb status of node1:
[root@node1 ctdb]# ctdb status
Number of nodes:3
pnn:0 192.168.200.10   UNHEALTHY (THIS NODE)
pnn:1 192.168.200.30   DISCONNECTED|UNHEALTHY|INACTIVE
pnn:2 192.168.200.20   DISCONNECTED|UNHEALTHY|INACTIVE
Generation:INVALID
Size:3
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
Recovery mode:RECOVERY (1)
Recovery master:0

ctdb status of node2:
[root@node2 ctdb]# ctdb status
Number of nodes:3
pnn:0 192.168.200.10   DISCONNECTED|UNHEALTHY|INACTIVE
pnn:1 192.168.200.30   DISCONNECTED|UNHEALTHY|INACTIVE
pnn:2 192.168.200.20   OK (THIS NODE)
Generation:1475941203
Size:1
hash:0 lmaster:2
Recovery mode:NORMAL (0)
Recovery master:2

ctdb status of node3:
[root@node3 ~]# ctdb status
Number of nodes:3
pnn:0 192.168.200.10   DISCONNECTED|UNHEALTHY|INACTIVE
pnn:1 192.168.200.30   UNHEALTHY (THIS NODE)
pnn:2 192.168.200.20   DISCONNECTED|UNHEALTHY|INACTIVE
Generation:INVALID
Size:3
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
Recovery mode:RECOVERY (1)
Recovery master:1
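
The three outputs above can be compared mechanically. The following is a minimal Python sketch and not part of ctdb: `parse_status` and `connected_peers` are hypothetical helper names, and the regular expressions assume the human-readable `ctdb status` format exactly as pasted above (trimmed here to the lines the sketch reads).

```python
import re


def parse_status(text):
    """Extract node flags, recovery mode and recovery master from `ctdb status` text."""
    view = {"nodes": {}, "this": None, "recmode": None, "recmaster": None}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"pnn:(\d+)\s+(\S+)\s+(\S+)(\s+\(THIS NODE\))?$", line)
        if m:
            pnn = int(m.group(1))
            view["nodes"][pnn] = m.group(3).split("|")
            if m.group(4):
                view["this"] = pnn
            continue
        m = re.match(r"Recovery mode:(\w+)", line)
        if m:
            view["recmode"] = m.group(1)
            continue
        m = re.match(r"Recovery master:(\d+)", line)
        if m:
            view["recmaster"] = int(m.group(1))
    return view


def connected_peers(view):
    """PNNs of the other nodes that this node does not flag as DISCONNECTED."""
    return [pnn for pnn, flags in view["nodes"].items()
            if pnn != view["this"] and "DISCONNECTED" not in flags]


NODE1 = """\
pnn:0 192.168.200.10   UNHEALTHY (THIS NODE)
pnn:1 192.168.200.30   DISCONNECTED|UNHEALTHY|INACTIVE
pnn:2 192.168.200.20   DISCONNECTED|UNHEALTHY|INACTIVE
Recovery mode:RECOVERY (1)
Recovery master:0
"""
NODE2 = """\
pnn:0 192.168.200.10   DISCONNECTED|UNHEALTHY|INACTIVE
pnn:1 192.168.200.30   DISCONNECTED|UNHEALTHY|INACTIVE
pnn:2 192.168.200.20   OK (THIS NODE)
Recovery mode:NORMAL (0)
Recovery master:2
"""
NODE3 = """\
pnn:0 192.168.200.10   DISCONNECTED|UNHEALTHY|INACTIVE
pnn:1 192.168.200.30   UNHEALTHY (THIS NODE)
pnn:2 192.168.200.20   DISCONNECTED|UNHEALTHY|INACTIVE
Recovery mode:RECOVERY (1)
Recovery master:1
"""

for name, text in (("node1", NODE1), ("node2", NODE2), ("node3", NODE3)):
    v = parse_status(text)
    print(name, "this pnn:", v["this"], "recmaster:", v["recmaster"],
          "recmode:", v["recmode"], "connected peers:", connected_peers(v))
```

On the pasted output, every node reports no connected peers and names itself as recovery master, so each node appears to be running as a cluster of one.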

The ping_pong test results
(run while the cluster was running):
[root@node1 ~]# ping_pong -l /opt/ctdb/ctdb.lock
file already locked, calling check_lock to tell us who has it locked:
check_lock failed: lock held: pid='0', type='1', start='0', len='1'
Working POSIX byte range locks

[root@node2 ~]# ping_pong -l /opt/ctdb/ctdb.lock
file already locked, calling check_lock to tell us who has it locked:
check_lock failed: lock held: pid='19142', type='1', start='0', len='1'
Working POSIX byte range locks

[root@node3 ~]# ping_pong -l /opt/ctdb/ctdb.lock
file already locked, calling check_lock to tell us who has it locked:
check_lock failed: lock held: pid='0', type='1', start='0', len='1'
Working POSIX byte range locks
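
To put the three check_lock results side by side, here is a minimal Python sketch that pulls the holder fields out of the ping_pong output. The name `lock_holder` is hypothetical, and the pattern assumes the "lock held:" line format exactly as pasted above. The sketch only extracts the values; reading pid 0 on node1 and node3 as "the holder is a process on another node" is an unconfirmed assumption.

```python
import re

# Matches the "lock held" line printed by ping_pong -l in the output above.
LOCK_RE = re.compile(
    r"lock held: pid='(\d+)', type='(\d+)', start='(\d+)', len='(\d+)'"
)


def lock_holder(output):
    """Return the reported lock holder fields as integers, or None if absent."""
    m = LOCK_RE.search(output)
    if m is None:
        return None
    pid, ltype, start, length = map(int, m.groups())
    return {"pid": pid, "type": ltype, "start": start, "len": length}


# The check_lock lines from node1, node2 and node3 as pasted above.
RESULTS = {
    "node1": "check_lock failed: lock held: pid='0', type='1', start='0', len='1'",
    "node2": "check_lock failed: lock held: pid='19142', type='1', start='0', len='1'",
    "node3": "check_lock failed: lock held: pid='0', type='1', start='0', len='1'",
}

for node, line in RESULTS.items():
    print(node, lock_holder(line))
```

All three nodes report the same one-byte range at offset 0 as locked, and only node2 reports a non-zero holder pid (19142).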

I have searched for a long time but could not solve this problem.
Thanks for any advice.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: node1-log.ctdb
Type: application/octet-stream
Size: 101467 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20210508/113e359d/node1-log-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node2-log.ctdb
Type: application/octet-stream
Size: 163841 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20210508/113e359d/node2-log-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node3-log.ctdb
Type: application/octet-stream
Size: 73339 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20210508/113e359d/node3-log-0001.obj>