[Samba] ctdb tcp kill: remaining connections

Ulrich Sibiller ulrich.sibiller at atos.net
Fri Feb 17 18:07:17 UTC 2023


> From "Takeover run starting" to "Takeover run completed successfully"
> in the logs.

Feb 13 12:27:36 serverpnn0 ctdb-recoverd[24951]: Disabling takeover runs for 60 seconds
Feb 13 12:27:36 serverpnn1 ctdb-recoverd[85688]: Takeover run starting
Feb 13 12:27:36 serverpnn2 ctdb-recoverd[29692]: Disabling takeover runs for 60 seconds
Feb 13 12:27:36 serverpnn3 ctdb-recoverd[3482705]: Disabling takeover runs for 60 seconds
Feb 13 12:27:46 serverpnn0 ctdb-recoverd[24951]: Reenabling takeover runs
Feb 13 12:27:46 serverpnn1 ctdb-recoverd[85688]: Takeover run completed successfully

-> 10s

Feb 13 12:27:46 serverpnn2 ctdb-recoverd[29692]: Reenabling takeover runs
Feb 13 12:27:46 serverpnn3 ctdb-recoverd[3482705]: Reenabling takeover runs
Feb 13 12:27:50 serverpnn0 ctdb-recoverd[24951]: Disabling takeover runs for 60 seconds

Feb 13 12:27:50 serverpnn1 ctdb-recoverd[85688]: Takeover run starting
Feb 13 12:27:50 serverpnn2 ctdb-recoverd[29692]: Disabling takeover runs for 60 seconds
Feb 13 12:27:50 serverpnn3 ctdb-recoverd[3482705]: Disabling takeover runs for 60 seconds
Feb 13 12:27:56 serverpnn0 ctdb-recoverd[24951]: Reenabling takeover runs
Feb 13 12:27:56 serverpnn1 ctdb-recoverd[85688]: Takeover run completed successfully

-> 6s

Feb 13 12:27:56 serverpnn2 ctdb-recoverd[29692]: Reenabling takeover runs
Feb 13 12:27:56 serverpnn3 ctdb-recoverd[3482705]: Reenabling takeover runs

> This means the problem must be one of 2 things:
> 
> 2. ctdb_killtcp is not terminating all connections.
> 
> So, I think it is (2).  :-)

Yes, I think so, too.
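
Just as a quick sanity check on that theory, something like the following (the IP is a placeholder, run on the node that has just released the address) should list any connections that survived the kill:

$ ss -tn src z.z.139.15

Anything still shown as ESTABLISHED there would be a connection ctdb_killtcp failed to reset.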
 
> support capture on InfiniBand networks. It might be worth building a
> static (to avoid shared library issues) ctdb_killtcp binary from 4.18.x
> source (after configuring with --enable-pcap) and installing that in
> place of the existing ctdb_killtcp binary.  The version in 4.18 also has
> slightly better debugging for unknown packets.

Sounds promising! I will try that! I have not found enough time today to do the compile, so I will do that next week.
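
Roughly what I have in mind (an untested sketch - the use of the top-level configure, the static-build details and the helper's install path are assumptions on my side, to be adjusted to wherever the distro puts ctdb_killtcp):

$ tar xf samba-4.18.x.tar.gz && cd samba-4.18.x
$ ./configure --enable-pcap        # plus whatever extra options a static build needs
$ make                             # ctdb_killtcp should then appear under bin/
$ cp /usr/libexec/ctdb/ctdb_killtcp /usr/libexec/ctdb/ctdb_killtcp.orig
$ cp bin/ctdb_killtcp /usr/libexec/ctdb/ctdb_killtcp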

> > We are using two teaming interfaces (one for each LAN). Each
> > team interface has 2x10Gbit LACP.
> 
> I'm not sure how this would affect the packet capture.  I would guess
> that the frame format would be one of the ones that is now supported. 

It should be transparent to the consumers.

> Fair enough... but a whole minute is a long time to be running
> ctdb_killtcp during failover...

$ grep "Feb 13 17:36:08" /var/log/messages | grep -i ignor | wc -l
7293

I should have written "one run" or similar. All these lines were logged (with additional debug lines added by me) with the same timestamp and show up between:

Feb 13 17:36:08 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: team0: blocking [z.z.139.15] on interface [team0]
Feb 13 17:36:08 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: ip_block z.z.139.15 team0
Feb 13 17:36:08 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: team0: killing tcp connections for ip [z.z.139.15] on interface [team0]
Feb 13 17:36:08 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: killcount [22] twoway: [22]
Feb 13 17:36:08 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: reset_connections_send: Adding 44 connections to hash

and

Feb 13 17:36:08 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: Killed 22/22 TCP connections to released IP z.z.139.15
Feb 13 17:36:08 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: team0: deleting ip [z.z.139.15] from interface [team0] maskbits [16]
Feb 13 17:36:08 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: team0: unblocking [z.z.139.15] on interface [team0]
Feb 13 17:36:08 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: ip_unblock z.z.139.15 team0
Feb 13 17:36:08 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: team0: flushing route cache
Feb 13 17:36:08 serverpnn2 ctdbd[29605]: Sending RELEASE_IP message for z.z.139.15

In that very same second we see these message counts on the node running ctdb_killtcp:
$ grep "Feb 13 17:36:08" /var/log/messages | cut -d: -f4- | sed 's/192.*$//' | sort | uniq -c
      1  10.interface.debug: ctdb_sys_open_capture_socket: Created RAW SOCKET FD:5 for tcp tickle
   1808  10.interface.debug: Ignoring packet: 
      1  10.interface.debug: ip_block <<< DEBUG output added by me
      1  10.interface.debug: ip_unblock <<< DEBUG output added by me
      1  10.interface.debug: killcount [22] twoway: [22] <<< DEBUG output added by me
      1  10.interface.debug: Killed 22/22 TCP connections to released IP 
   5485  10.interface.debug: reset_connections_capture_tcp_handler: Ignoring packet for unknown connection: 
      1  10.interface.debug: reset_connections_send: Adding 44 connections to hash
     44  10.interface.debug: reset_connections_send: Adding connection to hash: 
    905  10.interface.debug: reset_connections_tickle_connection: Sending tickle ACK for connection '
      1  10.interface.debug: running /etc/ctdb/events/legacy/10.interface.debug.script with arguments: releaseip team0  <<< DEBUG output added by me
     28  10.interface.debug: Sending a TCP RST to for connection 
      1  10.interface.debug: team0: blocking [      <<< DEBUG output added by me
      1  10.interface.debug: team0: deleting ip [   <<< DEBUG output added by me
      1  10.interface.debug: team0: flushing route cache   <<< DEBUG output added by me
      1  10.interface.debug: team0: killing tcp connections for ip [   <<< DEBUG output added by me
      1  10.interface.debug: team0: unblocking [      <<< DEBUG output added by me
      1  ../../ctdb/server/ctdb_takeover.c:295 public address '
      1  Sending RELEASE_IP message for

So in this case killing only 22 connections (two-way kill) produced more than 7000 lines of debugging output - the 1808 "Ignoring packet" lines plus the 5485 "Ignoring packet for unknown connection" lines account exactly for the 7293 "ignor" matches counted above.
(BTW, there's a typo in ctdb_killtcp: "Sending a TCP RST to for connection" - one of "to" and "for" is redundant.)

> In preparation for takeover, for NFS we only remember connections on
> 2049.  The other ports, such as lockd, would be handled on reboot
> (rather than crash) by the releasing side... but I doubt anyone has
> thought about this!  Perhaps we've seen that part handled by UDP in the
> past?  Please feel free to open a bug to remind me to look at this.

Ok, I will.

Your comment is interesting because it suggests ctdb might have been designed with the assumption that a failover always goes along with a reboot of the releasing node. We also use "ctdb moveip" or "systemctl stop ctdb" to perform updates without a reboot (e.g. for GPFS). So are we "holding it wrong"?
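
The flow we use is roughly this (simplified; IP and target node are just examples):

$ ctdb moveip z.z.139.15 1     # move the public address to another node first
$ systemctl stop ctdb          # take this node out of the cluster for the update
  ... update gpfs / samba ...
$ systemctl start ctdb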
 
> I suspect this could be improved to say "Never mount the same NFS
> directory ...".  I think you are OK with mounting the subdirectories
> for each user.

Good!

Thanks,

Uli

