ctdb 4.11.2 version failed to recover
Martin Schwenke
martin at meltin.net
Tue Dec 3 03:23:21 UTC 2019
Hi,
On Fri, 29 Nov 2019 07:20:59 +0000, 耿纪超 via samba-technical
<samba-technical at lists.samba.org> wrote:
> I am using ctdb version 4.11.2, including the newest patch (https://bugzilla.samba.org/show_bug.cgi?id=14175). However, when I test a NIC failure, I encounter a problem: the ctdb cluster cannot
> recover to normal.
> The test steps are as follows:
>
> 1. The ctdb cluster has two nodes, nodeA and nodeB, and the cluster status is OK.
>
> 2. Ifdown the NICs on nodeA and nodeB that carry the private IP addresses.
>
> 3. After 25 seconds, nodeA and nodeB each detect the other as dead, then call the functions ctdb_tcp_restart->ctdb_tcp_node_connect,
> but bind fails, and the following is logged:
>
> node 10.240.226.211:4379 is dead: 0 connected
>
> Tearing down connection to dead node :1
>
> Failed to bind socket (Cannot assign requested address)
It really depends what you are trying to test and how you are doing
it...
I am wondering if you are using:
* ifdown <device> (which unassigns the IP address)
* ip link set <device> down (or ifconfig <device> down)
The first of these definitely does not test anything like a
hardware/link failure. Normally, if a link goes down the IP address
will stay on the interface. This case is much more likely than the
case where an admin accidentally takes down the wrong interface.
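To illustrate why the distinction matters, here is a minimal sketch (not ctdb code) of what happens when bind() is given an address that is no longer assigned to any local interface. The helper name try_bind_ipv4 is hypothetical, and the sketch assumes a Linux host with default sysctl settings.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to bind a TCP socket to the given IPv4
 * address, letting the kernel choose the port.
 * Returns 0 on success, otherwise the errno of the failing call
 * (EINVAL if the string is not a valid IPv4 address).
 */
int try_bind_ipv4(const char *ip)
{
	struct sockaddr_in addr;
	int fd;
	int ret = 0;

	assert(ip != NULL);

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = 0; /* port 0: kernel picks an ephemeral port */
	if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
		return EINVAL;
	}

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd == -1) {
		return errno;
	}

	/* Fails with EADDRNOTAVAIL if no local interface has this address */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
		ret = errno;
	}

	close(fd);
	return ret;
}
```

On a host that does not have 192.0.2.1 configured (it is a documentation-only address), try_bind_ipv4("192.0.2.1") should return EADDRNOTAVAIL, whose strerror() text on Linux is the "Cannot assign requested address" in the log above. This assumes net.ipv4.ip_nonlocal_bind is left at its default of 0. With "ip link set <device> down" the address stays on the interface, so I would not expect bind() to fail this way.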
> Solution:
> When bind fails, nothing re-establishes the connection, even after the NIC is back up. I think that when bind fails, we should use a timer to retry. The patch follows; I have tested it and it works well.
> --- a/ctdb/tcp/tcp_connect.c
> +++ b/ctdb/tcp/tcp_connect.c
> @@ -236,6 +236,11 @@ void ctdb_tcp_node_connect(struct tevent_context *ev, struct tevent_timer *te,
>  		DBG_ERR("Failed to bind socket (%s)\n", strerror(errno));
>  		close(tnode->out_fd);
>  		tnode->out_fd = -1;
> +		tnode->connect_te = tevent_add_timer(ctdb->ev,
> +						     tnode,
> +						     timeval_current_ofs(5, 0),
> +						     ctdb_tcp_node_connect,
> +						     node);
>  		return;
>  	}
So, while you have identified a situation from which ctdbd does not
recover and provided a possible fix, I would like to understand what
you are trying to test before we agree on the best fix. ;-)
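For reference, the retry idea in the quoted patch can be sketched outside ctdb as follows. This is only an illustration under stated assumptions: it uses a blocking, bounded loop instead of a tevent timer, and the names retry_until_success, attempt_fn and succeeds_after_countdown are hypothetical. The quoted patch instead reschedules ctdb_tcp_node_connect every 5 seconds with no upper bound.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns true when the attempt succeeds */
typedef bool (*attempt_fn)(void *private_data);

/*
 * Call attempt() up to max_attempts times, sleeping delay_ms
 * milliseconds between failed attempts.
 * Returns the number of attempts used on success, or -1 after giving up.
 */
int retry_until_success(attempt_fn attempt, void *private_data,
			int max_attempts, unsigned int delay_ms)
{
	int i;

	assert(attempt != NULL);

	for (i = 1; i <= max_attempts; i++) {
		if (attempt(private_data)) {
			return i;
		}
		/* Do not sleep after the final failed attempt */
		if (i < max_attempts && delay_ms > 0) {
			struct timespec ts;

			ts.tv_sec = delay_ms / 1000;
			ts.tv_nsec = (long)(delay_ms % 1000) * 1000000L;
			nanosleep(&ts, NULL);
		}
	}

	return -1;
}

/*
 * Hypothetical stand-in for a bind() that keeps failing until the
 * address comes back: fails while the counter is above zero.
 */
bool succeeds_after_countdown(void *private_data)
{
	int *remaining = private_data;

	if (*remaining > 0) {
		(*remaining)--;
		return false;
	}
	return true;
}
```

With a counter starting at 2 and max_attempts of 5, the first two attempts fail and the third succeeds. In an event-driven daemon a timer is preferable to sleeping, since the loop above would block everything else while it waits.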
Thanks...
peace & happiness,
martin