[Samba] ctdb tcp kill: remaining connections

Martin Schwenke martin at meltin.net
Thu May 29 00:40:31 UTC 2025


Hi Uli,

On Wed, 28 May 2025 13:12:29 +0000, Ulrich Sibiller
<ulrich.sibiller at eviden.com> wrote:

> We are exporting GPFS filesystems with NFSv3 via ctdb. Today I have
> stopped ctdb on one node, and the IPs got automatically moved to
> another node. This is something that always works like a charm.
> However, many NFS clients started complaining very soon, a phenomenon
> that we see very often:

> [Wed May 28 13:55:24 2025] lockd: server 192.168.xxx.yy not responding, still trying
> [Wed May 28 13:55:24 2025] lockd: server 192.168.xxx.yy not responding, still trying
> [Wed May 28 13:55:24 2025] lockd: server 192.168.xxx.yy not responding, still trying
> 
> As the IP had already been taken over this time, I manually ran "ctdb
> gratarp 192.168.xxx.yy" on the server that had taken over the IP, and
> almost immediately the clients were fine again.

> So I checked the 10.interface script and found that gratarp is only
> done in the updateip case, but not in the takeip case. As we have
> extended that script with some (more) logging I can say that we never
> see updateip being called. We only see "monitor" and "takeip" and
> "releaseip". The thing is that takeip never sends the gratarp. Even
> in the most current ctdb it does not (see
> https://gitlab.com/samba-team/samba/-/blame/master/ctdb/config/events/legacy/10.interface.script?ref_type=heads#L137).
> So I am wondering why not? 

In the "takeip" case the gratarp is sent by the daemon.  The relevant
code is in ctdb_announce_vnn_iface().

I was going to ask questions about firewalls and forwarding rules on
routers between the server nodes and the clients.  However, you can run
"ctdb gratarp ..." and it fixes the problem, so it doesn't sound like
the packets are being filtered somewhere.  The "ctdb gratarp" command
sends a control to the daemon, and to handle that control the daemon
runs the same low-level ARP-sending code as ctdb_announce_vnn_iface(),
which is run from the callback when the "takeip" event succeeds.
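For reference, the manual re-send looks like this (the IP and interface
here are placeholders -- use one of your public IPs and the interface it
is currently assigned to):

```shell
# Ask the local ctdb daemon to re-send gratuitous ARPs for a public IP
# from the node that now hosts it.  192.168.1.1/eth0 are placeholders.
ctdb gratarp 192.168.1.1 eth0
```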

There are two possible differences:

* The interface:

  The "takeip" callback automatically determines the interface from
  which to send the ARPs.  This is simply based on the interface to
  which the IP is assigned.  I doubt you would be running a manual
  command that specifies a different interface.

* Timing/routing:

  Not that it should matter, but are you using the 13.per_ip_routing
  event script to add source-based routing?  If so, I'm wondering if
  perhaps something is going wrong there during "takeip" and is being
  fixed later in "ipreallocated".

  Is anything strange about your routing?  In fact, are the clients on
  the same subnet as the server nodes?  It shouldn't matter if
  everything is set up sanely.
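If you are using 13.per_ip_routing, a quick sanity check right after
"takeip" might look like this (a sketch only; the IP is a placeholder,
and the "ctdb.<ip>" table label assumes the naming that
13.per_ip_routing writes into rt_tables -- check your nodes):

```shell
# Placeholder public IP -- substitute one of your takeover addresses.
PUBIP=192.168.1.1

# 13.per_ip_routing adds a policy rule per public IP; it should show up
# as something like "from <ip> lookup ctdb.<ip>".
ip rule show | grep -F "$PUBIP"

# Dump the per-IP routing table the rule points at (label assumed to be
# "ctdb.<ip>" -- see /etc/iproute2/rt_tables on your nodes).
ip route show table "ctdb.$PUBIP"
```

If the rule or table is missing between "takeip" and "ipreallocated",
that would point at the timing problem described above.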

One other thing I notice in the relevant (lockd client) kernel code is
that it calls:

  rpc_force_rebind(clnt);

after logging the "not responding, still trying" message.  Without
digging very deep, that looks like it should be forcing the client to
reconnect.  So, in that case too, we need to be sure the ARPs are
making it through to the client.

It would be good if you could tcpdump on the server nodes and on a
client to determine if the ARPs are being sent... and what is happening
to the lockd connections before you intervene manually.  You should be
able to construct a filter that captures only relevant gratuitous ARPs
and TCP SYN packets - if so, you could leave that running in the
background.
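Something like this might work as a starting point for that background
capture (an untested sketch; the IP and interface are placeholders, and
note that with NFSv3 lockd's port is dynamic -- "rpcinfo -p <server>"
will show it):

```shell
IFACE=eth0           # placeholder -- interface facing the clients
PUBIP=192.168.1.1    # placeholder -- one of the ctdb public IPs

# Capture ARP traffic for the public IP (which includes the gratuitous
# ARPs) plus any TCP SYNs to/from it, writing to a file for later
# inspection with "tcpdump -r".
tcpdump -ni "$IFACE" -w /tmp/takeip.pcap \
    "arp host $PUBIP or (host $PUBIP and tcp[tcpflags] & tcp-syn != 0)"
```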

Thanks...

peace & happiness,
martin


