ctdb cluster not healthy:"Unable to take recovery lock - contention"
风无名
wuming_81 at 163.com
Sat May 8 09:44:43 UTC 2021
Sorry, my attachments are too large.
Also, my ctdb version is 4.8.5.
At 2021-05-08 16:34:57, "风无名" <wuming_81 at 163.com> wrote:
Hello, everyone.
Many minutes after I started my ctdb cluster, the cluster is still not healthy.
The logs are in the attachments.
My cluster consists of three nodes. The /etc/hosts file:
192.168.200.10 node1
192.168.200.20 node2
192.168.200.30 node3
public address config file:
192.168.210.10/24 ens15f1
192.168.210.30/24 ens15f1
192.168.210.20/24 ens15f1
nodes config file:
192.168.200.10
192.168.200.30
192.168.200.20
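One thing that can be ruled out mechanically here: ctdb expects the nodes file to be identical, including line order, on every node, because a node's PNN is its position in that file. The following is a minimal Python sketch of such a comparison. The function names and the idea of passing each node's file contents in as a string are illustrative assumptions, not part of ctdb.

```python
def parse_nodes_file(text):
    """Return the non-empty lines of a nodes file, in order.

    The list index of each entry corresponds to the node's PNN.
    Lines are kept verbatim (assumption: no further normalisation is wanted),
    so any difference in content or order is detected.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def nodes_files_match(contents_by_host):
    """Return True if every host has the same entries in the same order.

    contents_by_host is a hypothetical mapping {hostname: nodes file text}.
    """
    parsed = [parse_nodes_file(text) for text in contents_by_host.values()]
    if not parsed:
        return True
    return all(entries == parsed[0] for entries in parsed)


# The nodes file quoted above; here the same text is assumed on all three hosts.
NODES_FILE = "192.168.200.10\n192.168.200.30\n192.168.200.20\n"

if __name__ == "__main__":
    contents = {"node1": NODES_FILE, "node2": NODES_FILE, "node3": NODES_FILE}
    print(nodes_files_match(contents))
```

With this order, 192.168.200.10 is PNN 0, 192.168.200.30 is PNN 1 and 192.168.200.20 is PNN 2.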
The ctdb recovery lock file is /opt/ctdb/ctdb.lock.
/opt/ctdb/ is a mount point of a glusterfs cluster.
The glusterfs volume:
[root@node1 ctdb]# gluster v status clusters_volume_ctdb
Status of volume: clusters_volume_ctdb
Gluster process                                   TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.200.10:/data/ctdb/192.168.200.10    49153     0          Y       6215
Brick 192.168.200.30:/data/ctdb/192.168.200.30    49152     0          Y       17858
Brick 192.168.200.20:/data/ctdb/192.168.200.20    49152     0          Y       9134
I have examined the logs of the gluster mount point and of the gluster server nodes and failed to find any anomaly.
ctdb status of the node1:
[root@node1 ctdb]# ctdb status
Number of nodes:3
pnn:0 192.168.200.10 UNHEALTHY (THIS NODE)
pnn:1 192.168.200.30 DISCONNECTED|UNHEALTHY|INACTIVE
pnn:2 192.168.200.20 DISCONNECTED|UNHEALTHY|INACTIVE
Generation:INVALID
Size:3
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
Recovery mode:RECOVERY (1)
Recovery master:0
ctdb status of the node2:
[root@node2 ctdb]# ctdb status
Number of nodes:3
pnn:0 192.168.200.10 DISCONNECTED|UNHEALTHY|INACTIVE
pnn:1 192.168.200.30 DISCONNECTED|UNHEALTHY|INACTIVE
pnn:2 192.168.200.20 OK (THIS NODE)
Generation:1475941203
Size:1
hash:0 lmaster:2
Recovery mode:NORMAL (0)
Recovery master:2
ctdb status of node3:
[root@node3 ~]# ctdb status
Number of nodes:3
pnn:0 192.168.200.10 DISCONNECTED|UNHEALTHY|INACTIVE
pnn:1 192.168.200.30 UNHEALTHY (THIS NODE)
pnn:2 192.168.200.20 DISCONNECTED|UNHEALTHY|INACTIVE
Generation:INVALID
Size:3
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
Recovery mode:RECOVERY (1)
Recovery master:1
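In the three outputs above, every node reports both of its peers as DISCONNECTED and names itself as recovery master (0, 2 and 1 respectively). The following minimal Python sketch extracts that view from the text of a `ctdb status` output so the three nodes can be compared side by side. The regular expression is an assumption based only on the line format shown in this post, and the function names are hypothetical.

```python
import re

# Matches lines such as:
#   pnn:0 192.168.200.10 UNHEALTHY (THIS NODE)
#   pnn:1 192.168.200.30 DISCONNECTED|UNHEALTHY|INACTIVE
NODE_LINE = re.compile(r"^pnn:(\d+)\s+(\S+)\s+(\S+)(\s+\(THIS NODE\))?$")


def parse_status(text):
    """Return (pnn of the local node, {pnn: {"addr": ..., "flags": [...]}})."""
    nodes = {}
    this_pnn = None
    for line in text.splitlines():
        match = NODE_LINE.match(line.strip())
        if not match:
            continue  # skip header, hash and recovery lines
        pnn = int(match.group(1))
        nodes[pnn] = {"addr": match.group(2), "flags": match.group(3).split("|")}
        if match.group(4):
            this_pnn = pnn
    return this_pnn, nodes


def connected_peers(text):
    """Return the PNNs of peers that the local node does not flag as DISCONNECTED."""
    this_pnn, nodes = parse_status(text)
    return [pnn for pnn, info in nodes.items()
            if pnn != this_pnn and "DISCONNECTED" not in info["flags"]]


# Node lines copied from the node1 output in this post.
NODE1_STATUS = """Number of nodes:3
pnn:0 192.168.200.10 UNHEALTHY (THIS NODE)
pnn:1 192.168.200.30 DISCONNECTED|UNHEALTHY|INACTIVE
pnn:2 192.168.200.20 DISCONNECTED|UNHEALTHY|INACTIVE
Recovery mode:RECOVERY (1)
Recovery master:0
"""

if __name__ == "__main__":
    print(connected_peers(NODE1_STATUS))
```

An empty list for every node means that no node currently sees any other node as connected.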
the ping_pong test results:
(the cluster is running)
[root@node1 ~]# ping_pong -l /opt/ctdb/ctdb.lock
file already locked, calling check_lock to tell us who has it locked:
check_lock failed: lock held: pid='0', type='1', start='0', len='1'
Working POSIX byte range locks
[root@node2 ~]# ping_pong -l /opt/ctdb/ctdb.lock
file already locked, calling check_lock to tell us who has it locked:
check_lock failed: lock held: pid='19142', type='1', start='0', len='1'
Working POSIX byte range locks
[root@node3 ~]# ping_pong -l /opt/ctdb/ctdb.lock
file already locked, calling check_lock to tell us who has it locked:
check_lock failed: lock held: pid='0', type='1', start='0', len='1'
Working POSIX byte range locks
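The outputs above show a lock on the byte range start=0, len=1 of the lock file. As a rough cross-check that does not replace ping_pong, the same range can be probed with Python's standard fcntl module. This is a minimal sketch: the function name is hypothetical, and if the range happens to be free the probe takes the lock for a moment and releases it again, so it should be used with care on a live cluster.

```python
import errno
import fcntl
import os


def first_byte_lockable(path):
    """Try a non-blocking exclusive POSIX lock on byte 0 of path.

    Returns True if the lock could be taken (it is released again at once),
    False if another process holds a conflicting lock on that byte.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            # len=1, start=0, relative to the start of the file
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return False  # the byte is locked by someone else
            raise  # any other error is unexpected, so let it surface
        fcntl.lockf(fd, fcntl.LOCK_UN, 1, 0, os.SEEK_SET)
        return True
    finally:
        os.close(fd)
```

For example, calling first_byte_lockable("/opt/ctdb/ctdb.lock") on each node would be expected to return False while some process holds the lock, which matches the "file already locked" lines above.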
I have searched many pages for a long time but have not been able to solve this problem.
Thanks for any advice.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node1-log.ctdb
Type: application/octet-stream
Size: 101467 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20210508/113e359d/node1-log-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node2-log.ctdb
Type: application/octet-stream
Size: 163841 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20210508/113e359d/node2-log-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node3-log.ctdb
Type: application/octet-stream
Size: 73339 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20210508/113e359d/node3-log-0001.obj>