[SCM] Samba Shared Repository - branch master updated

Amitay Isaacs amitay at samba.org
Wed Nov 6 02:47:02 UTC 2019


The branch, master has been updated
       via  bf99f820778 ctdb-tests: Make process exists test more resilient
       via  dd9d5ec5c8d ctdb-tests: Improve code quality in ctdb_init()
       via  3b5ed00054e ctdb-tests: No longer retry starting the cluster
       via  bf47bc18bb8 ctdb-tcp: Drop tracking of file descriptor for incoming connections
       via  d0baad257e5 ctdb-tcp: Avoid orphaning the TCP incoming queue
       via  e62b3a05a87 ctdb-tcp: Check incoming queue to see if incoming connection is up
      from  ff47cc661d4 s3: libsmb: Ensure return from net_share_enum_rpc() sets cli->raw_status on error.

https://git.samba.org/?p=samba.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit bf99f82077876fc107adb1ed4f07dd7e1351fff9
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Nov 5 15:34:18 2019 +1100

    ctdb-tests: Make process exists test more resilient
    
    This can fail as follows:
    
    --==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--
    Running test ./tests/UNIT/tool/ctdb.process-exists.003.sh (02:26:30)
    --==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--
    ctdb.process-exists.003      - ctdbd process with multiple connections on node 0
    Setting up fake ctdbd
    <10||0|
    OK
    <10|PID 26107 exists
    |0|
    OK
    ==================================================
    Running "ctdb -d NOTICE process-exists 26107 0x1234567812345678"
    PASSED
    ==================================================
    Running "ctdb -d NOTICE process-exists 26107 0xaebbccdd12345678"
    Registered SRVID 0xaebbccdd12345678
    --------------------------------------------------
    Output (Exit status: 1):
    --------------------------------------------------
    PID 26107 with SRVID 0xaebbccdd12345678 does not exist
    --------------------------------------------------
    Required output (Exit status: 0):
    --------------------------------------------------
    PID 26107 with SRVID 0xaebbccdd12345678 exists
    
    FAILED
    connection to daemon closed, exiting
    ==========================================================================
    TEST FAILED: ./tests/UNIT/tool/ctdb.process-exists.003.sh (status 1) (duration: 0s)
    ==========================================================================
    
    This happens when dummy_client has not registered the SRVID (for its
    10th connection) before the 2nd simple_test.
    
    Change the initial wait to ensure that the SRVID is registered.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    
    Autobuild-User(master): Amitay Isaacs <amitay at samba.org>
    Autobuild-Date(master): Wed Nov  6 02:46:24 UTC 2019 on sn-devel-184

commit dd9d5ec5c8de82f343369ec4b26c0f24d465ce24
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Oct 29 15:22:38 2019 +1100

    ctdb-tests: Improve code quality in ctdb_init()
    
    Improve quoting and indentation.  Print a clear error if the cluster
    goes back into recovery and doesn't come back out.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 3b5ed00054e7d1e34cf33706c670e17a188c67d1
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Oct 29 15:11:31 2019 +1100

    ctdb-tests: No longer retry starting the cluster
    
    Retrying like this hides bugs.  The cluster should come up first time,
    every time.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit bf47bc18bb8a94231870ef821c0352b7a15c2e28
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Oct 29 17:28:22 2019 +1100

    ctdb-tcp: Drop tracking of file descriptor for incoming connections
    
    This file descriptor is owned by the incoming queue.  It will be
    closed when the queue is torn down.
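
    This follows the usual talloc ownership idiom: the object that
    owns the file descriptor closes it in its destructor, so callers
    do not need to track the descriptor separately.  A minimal,
    self-contained sketch of that idiom (hypothetical demo_* names,
    not ctdb code; builds with -ltalloc):

        #include <unistd.h>
        #include <talloc.h>

        /* Hypothetical queue object that owns a file descriptor. */
        struct demo_queue {
                int fd;
        };

        /* talloc destructor: runs when the queue is freed and closes
         * the fd, so nothing else may track or close it. */
        static int demo_queue_destructor(struct demo_queue *q)
        {
                if (q->fd != -1) {
                        close(q->fd);
                        q->fd = -1;
                }
                return 0;
        }

        static struct demo_queue *demo_queue_setup(TALLOC_CTX *mem_ctx,
                                                   int fd)
        {
                struct demo_queue *q;

                q = talloc_zero(mem_ctx, struct demo_queue);
                if (q == NULL) {
                        return NULL;
                }
                q->fd = fd;
                talloc_set_destructor(q, demo_queue_destructor);
                return q;
        }

        int main(void)
        {
                int fds[2];
                struct demo_queue *q;

                if (pipe(fds) != 0) {
                        return 1;
                }
                q = demo_queue_setup(NULL, fds[0]);
                /* Freeing the queue closes its fd via the destructor. */
                TALLOC_FREE(q);
                close(fds[1]);
                return 0;
        }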
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175
    RN: Avoid communication breakdown on node reconnect
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit d0baad257e511280ff3e5c7372c38c43df841070
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Oct 29 15:29:11 2019 +1100

    ctdb-tcp: Avoid orphaning the TCP incoming queue
    
    CTDB's incoming queue handling does not check whether a queue
    already exists, so it can overwrite the pointer to the queue.  This
    used to be harmless until commit c68b6f96f26664459187ab2fbd56767fb31767e0
    changed the read callback to use a parent structure as the callback
    data.  Instead of cleaning up an orphaned queue on disconnect, as
    before, it now frees the new queue.
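
    A minimal sketch of the problematic pattern (hypothetical demo_*
    names, continuing the demo_queue sketch under commit bf47bc18bb8
    above, not the actual ctdb code):

        /* Hypothetical parent structure, analogous to ctdb_tcp_node. */
        struct demo_node {
                struct demo_queue *in_queue;
        };

        /* Accept path before the fix: assigns unconditionally, so a
         * second incoming connection overwrites in_queue and orphans
         * the first queue. */
        static void demo_accept(struct demo_node *node, int fd)
        {
                node->in_queue = demo_queue_setup(node, fd);
        }

        /* Disconnect handling after the read callback switched to the
         * parent structure as callback data: it frees node->in_queue,
         * which by now points at the new queue, not the disconnected
         * old one. */
        static void demo_disconnect(struct demo_node *node)
        {
                TALLOC_FREE(node->in_queue);
        }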
    
    At first glance it doesn't seem possible that 2 incoming connections
    from the same node could be processed before the intervening
    disconnect.  However, the incoming connections and disconnect occur on
    different file descriptors.  The queue can become orphaned on node A
    when the following sequence occurs:
    
    1. Node A comes up
    2. Node A accepts an incoming connection from node B
    3. Node B processes a timeout before noticing that the outgoing queue is writable
    4. Node B tears down the outgoing connection to node A
    5. Node B initiates a new connection to node A
    6. Node A accepts an incoming connection from node B
    
    Node A then processes the disconnect of the old incoming connection
    from (2) but tears down the new incoming connection from (6).  This
    then recurs until the originally affected node is restarted.
    
    However, due to the number of outgoing connection attempts and
    associated teardowns, this induces the same behaviour on the
    corresponding incoming queue on all nodes that node A attempts to
    connect to.  Therefore, other nodes become affected and need to be
    restarted too.
    
    As a result, the whole cluster probably needs to be restarted to
    recover from this situation.
    
    The problem can occur any time CTDB is started on a node.
    
    The fix is to avoid accepting new incoming connections when a queue
    for incoming connections is already present.  The connecting node will
    simply retry establishing its outgoing connection.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit e62b3a05a874db13a848573d2e2fb1c157393b9c
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Oct 29 15:25:26 2019 +1100

    ctdb-tcp: Check incoming queue to see if incoming connection is up
    
    This makes it consistent with the reverse case.  Also, in_fd will soon
    be removed.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

-----------------------------------------------------------------------

Summary of changes:
 ctdb/tcp/ctdb_tcp.h                             |  1 -
 ctdb/tcp/tcp_connect.c                          | 11 ++++--
 ctdb/tcp/tcp_init.c                             |  6 ----
 ctdb/tcp/tcp_io.c                               |  2 --
 ctdb/tests/UNIT/tool/ctdb.process-exists.003.sh |  2 +-
 ctdb/tests/scripts/integration.bash             | 48 +++++++------------------
 6 files changed, 21 insertions(+), 49 deletions(-)


Changeset truncated at 500 lines:

diff --git a/ctdb/tcp/ctdb_tcp.h b/ctdb/tcp/ctdb_tcp.h
index 9a615fc6393..daabad74297 100644
--- a/ctdb/tcp/ctdb_tcp.h
+++ b/ctdb/tcp/ctdb_tcp.h
@@ -37,7 +37,6 @@ struct ctdb_tcp_node {
 	struct tevent_timer *connect_te;
 
 	struct ctdb_context *ctdb;
-	int in_fd;
 	struct ctdb_queue *in_queue;
 };
 
diff --git a/ctdb/tcp/tcp_connect.c b/ctdb/tcp/tcp_connect.c
index 6123380ca9f..a75f35a809e 100644
--- a/ctdb/tcp/tcp_connect.c
+++ b/ctdb/tcp/tcp_connect.c
@@ -153,7 +153,7 @@ static void ctdb_node_connect_write(struct tevent_context *ev,
 	 * as connected, but only if the corresponding listening
 	 * socket is also connected
 	 */
-	if (tnode->in_fd != -1) {
+	if (tnode->in_queue != NULL) {
 		node->ctdb->upcalls->node_connected(node);
 	}
 }
@@ -312,6 +312,13 @@ static void ctdb_listen_event(struct tevent_context *ev, struct tevent_fd *fde,
 		return;
 	}
 
+	if (tnode->in_queue != NULL) {
+		DBG_ERR("Incoming queue active, rejecting connection from %s\n",
+			ctdb_addr_to_str(&addr));
+		close(fd);
+		return;
+	}
+
 	ret = set_blocking(fd, false);
 	if (ret != 0) {
 		DBG_ERR("Failed to set socket non-blocking (%s)\n",
@@ -348,8 +355,6 @@ static void ctdb_listen_event(struct tevent_context *ev, struct tevent_fd *fde,
 		return;
 	}
 
-	tnode->in_fd = fd;
-
        /*
 	* Mark the connecting node as connected, but only if the
 	* corresponding outbound connected is also up
diff --git a/ctdb/tcp/tcp_init.c b/ctdb/tcp/tcp_init.c
index a9cb9b36a01..3be16bd7d5e 100644
--- a/ctdb/tcp/tcp_init.c
+++ b/ctdb/tcp/tcp_init.c
@@ -43,11 +43,6 @@ static int tnode_destructor(struct ctdb_tcp_node *tnode)
 		tnode->out_fd = -1;
 	}
 
-	if (tnode->in_fd != -1) {
-		close(tnode->in_fd);
-		tnode->in_fd = -1;
-	}
-
 	return 0;
 }
 
@@ -61,7 +56,6 @@ static int ctdb_tcp_add_node(struct ctdb_node *node)
 	CTDB_NO_MEMORY(node->ctdb, tnode);
 
 	tnode->out_fd = -1;
-	tnode->in_fd = -1;
 	tnode->ctdb = node->ctdb;
 
 	node->private_data = tnode;
diff --git a/ctdb/tcp/tcp_io.c b/ctdb/tcp/tcp_io.c
index e8ebff887e1..2d8ec0f7062 100644
--- a/ctdb/tcp/tcp_io.c
+++ b/ctdb/tcp/tcp_io.c
@@ -76,8 +76,6 @@ void ctdb_tcp_read_cb(uint8_t *data, size_t cnt, void *args)
 
 failed:
 	TALLOC_FREE(tnode->in_queue);
-	close(tnode->in_fd);
-	tnode->in_fd = -1;
 	node->ctdb->upcalls->node_dead(node);
 
 	TALLOC_FREE(data);
diff --git a/ctdb/tests/UNIT/tool/ctdb.process-exists.003.sh b/ctdb/tests/UNIT/tool/ctdb.process-exists.003.sh
index 29c42a1a627..402b0d6f5ab 100755
--- a/ctdb/tests/UNIT/tool/ctdb.process-exists.003.sh
+++ b/ctdb/tests/UNIT/tool/ctdb.process-exists.003.sh
@@ -16,7 +16,7 @@ srvid="0xaebbccdd12345678"
 dummy_client -d INFO -s "$ctdbd_socket" -n 10 -S "$srvid" &
 pid=$!
 
-wait_until 10 $CTDB process-exists "$pid"
+wait_until 10 $CTDB process-exists "$pid" "$srvid"
 
 srvid2="0x1234567812345678"
 required_result 1 "PID $pid with SRVID $srvid2 does not exist"
diff --git a/ctdb/tests/scripts/integration.bash b/ctdb/tests/scripts/integration.bash
index d5bd542ab00..51e9c7cb822 100644
--- a/ctdb/tests/scripts/integration.bash
+++ b/ctdb/tests/scripts/integration.bash
@@ -568,59 +568,35 @@ wait_until_node_has_no_ips ()
 
 ctdb_init ()
 {
-    local i
-    for i in $(seq 1 5) ; do
 	ctdb_stop_all >/dev/null 2>&1 || :
-	ctdb_start_all || {
-	    echo "Start failed.  Trying again in a few seconds..."
-	    sleep_for 5
-	    continue
-	}
 
-	wait_until_ready || {
-	    echo "Cluster didn't become ready.  Restarting..."
-	    continue
-	}
+	ctdb_start_all || ctdb_test_error "Cluster start failed"
+
+	wait_until_ready || ctdb_test_error "Cluster didn't become ready"
 
 	echo "Setting RerecoveryTimeout to 1"
 	onnode -pq all "$CTDB setvar RerecoveryTimeout 1"
 
-	# In recent versions of CTDB, forcing a recovery like this
-	# blocks until the recovery is complete.  Hopefully this will
-	# help the cluster to stabilise before a subsequent test.
 	echo "Forcing a recovery..."
-	onnode -q 0 $CTDB recover
+	onnode -q 0 "$CTDB recover"
 	sleep_for 2
 
-	if ! onnode -q any $CTDB_TEST_WRAPPER _cluster_is_recovered ; then
-	    echo "Cluster has gone into recovery again, waiting..."
-	    wait_until 30/2 onnode -q any $CTDB_TEST_WRAPPER _cluster_is_recovered
+	if ! onnode -q all "$CTDB_TEST_WRAPPER _cluster_is_recovered" ; then
+		echo "Cluster has gone into recovery again, waiting..."
+		wait_until 30/2 onnode -q all \
+			   "$CTDB_TEST_WRAPPER _cluster_is_recovered" || \
+			ctdb_test_error "Cluster did not come out of recovery"
 	fi
 
-
-	# Cluster is still healthy.  Good, we're done!
-	if ! onnode 0 $CTDB_TEST_WRAPPER _cluster_is_healthy ; then
-	    echo "Cluster became UNHEALTHY again [$(date)]"
-	    onnode -p all ctdb status -X 2>&1
-	    onnode -p all ctdb scriptstatus 2>&1
-	    echo "Restarting..."
-	    continue
+	if ! onnode 0 "$CTDB_TEST_WRAPPER _cluster_is_healthy" ; then
+		ctdb_test_error "Cluster became UNHEALTHY again [$(date)]"
 	fi
 
 	echo "Doing a sync..."
-	onnode -q 0 $CTDB sync
+	onnode -q 0 "$CTDB sync"
 
 	echo "ctdb is ready"
 	return 0
-    done
-
-    echo "Cluster UNHEALTHY...  too many attempts..."
-    onnode -p all ctdb status -X 2>&1
-    onnode -p all ctdb scriptstatus 2>&1
-
-    # Try to make the calling test fail
-    status=1
-    return 1
 }
 
 ctdb_base_show ()


-- 
Samba Shared Repository


