[Samba] glusterfs + ctdb + nfs-ganesha , unplug the network cable of serving node, takes around ~20 mins for IO to resume
Martin Schwenke
martin at meltin.net
Mon Mar 4 05:54:56 UTC 2019
Hi Dan,
On Mon, 25 Feb 2019 02:43:31 +0000, "Liu, Dan via samba"
<samba at lists.samba.org> wrote:
> We did some failover/failback tests on 2 nodes (A and B) with the
> architecture 'glusterfs + ctdb (public address) + nfs-ganesha'.
>
> 1st:
> During write, unplug the network cable of serving node A
> -> NFS client took a few seconds to recover and continue writing.
>
> After some minutes, plug the network cable of serving node A
> -> NFS client also took a few seconds to recover and continue
> writing.
>
> 2nd:
> During write, unplug the network cable of serving node A
> -> NFS client took 20 minutes to recover and continue writing.
> This recovery time is too slow for clients to accept.
Definitely! What was different between "1st" and "2nd"? Were they
testing different scenarios?
> From the CTDB log, during failover and failback, the failed node failed
> to kill the connection with the client, while the recovery node failed
> to send a 'tickle ack' to the client to re-establish the connection.
The first really isn't a problem. I'm not sure why CTDB attempts to do
a 2 way kill from the releasing node. We're going to stop doing that
in the future.
The 2nd is a mystery. Are you sure the network connection on node B
was up? This message seems to indicate the network is down:
2019/02/22 18:01:03.065121 ctdbd[29541]: Failed sendto (No route to host)
2019/02/22 18:01:03.065191 ctdbd[29541]: ../ctdb/server/ctdb_takeover.c:388 Failed to send tcp tickle ack for ::ffff:10.10.11.18
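The address in that message is in IPv4-mapped IPv6 form, and "No route to host" is the usual Linux text for EHOSTUNREACH. A minimal Python sketch for pulling the plain IPv4 address out of such a log line follows. It is not part of CTDB; the function name tickle_target and the regular expression are only illustrative assumptions based on the single log line quoted above.

```python
import errno
import ipaddress
import os
import re

# Log line copied from the message above.
LOG_LINE = (
    "2019/02/22 18:01:03.065191 ctdbd[29541]: "
    "../ctdb/server/ctdb_takeover.c:388 "
    "Failed to send tcp tickle ack for ::ffff:10.10.11.18"
)


def tickle_target(line):
    """Return the address a failed tickle ack was aimed at.

    Hypothetical helper: the pattern is inferred from one log line,
    not from any documented CTDB log format.
    """
    match = re.search(r"Failed to send tcp tickle ack for (\S+)", line)
    if match is None:
        return None
    addr = ipaddress.ip_address(match.group(1))
    # IPv6Address.ipv4_mapped is None unless the address is ::ffff:a.b.c.d;
    # IPv4Address objects have no such attribute, hence getattr.
    mapped = getattr(addr, "ipv4_mapped", None)
    return mapped if mapped is not None else addr


if __name__ == "__main__":
    # Address to test for reachability from node B.
    print(tickle_target(LOG_LINE))
    # Error text the local C library associates with EHOSTUNREACH.
    print(os.strerror(errno.EHOSTUNREACH))
```

With the address in hand, running something like "ip route get" for it on node B at the time of the failure should show whether the kernel had a route to the client.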
peace & happiness,
martin