[SCM] Samba Shared Repository - branch master updated

Wed Aug 21 00:05:02 UTC 2024

The branch, master has been updated
       via  ffc75c569c6 lib/param: Don't treat a missing include file as an error in handle_include().
       via  578dfa57651 ctdb-scripts: Avoid flapping NFS services at startup
       via  18a29ed3672 ctdb-scripts: Make initial statistics output empty
       via  032b7b49c9f ctdb-scripts: Only consider statistics on timeout
       via  f7a96deafa9 ctdb-tests: Make _rpc_service_up() and _rpc_services_down() internal
       via  0919701a68b ctdb-tests: Make NFS RPC monitoring tests consistent
       via  47c33a2442c ctdb-tests: Drop unnecessarily "else"
       via  8b2f228198b ctdb-tests: Replace implicit healthy behaviour with early exits
       via  a522864138a ctdb-tests: Simplify handling of statistics change
       via  084a69d5522 ctdb-tests: Move result check to rpc_set_service_failure_response()
       via  47540012009 ctdb-tests: Initialise return code file
       via  833deb067d8 ctdb-tests: Add function rpc_failure() to log failures and warnings
       via  1d9661d587f ctdb-tests: Argument 3 to nfs_iterate_test() is up iteration
       via  7c5e7080015 ctdb-tests: nfs_iterate_test() marks RPC service down
      from  8edb1fd13c1 ctdb-tcp: Remove a use of ctdb_addr_to_str()

https://git.samba.org/?p=samba.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit ffc75c569c69ce22a39b5d1df8cb4906095c8654
Author: Pavel Filipenský <pfilipensky at samba.org>
Date:   Tue Aug 20 17:15:46 2024 +0200

    lib/param: Don't treat a missing include file as an error in handle_include().
    
    Same fix as in commit 09d7690
    
    'samba-tool domain provision -d10' fails if the included file does not
    exist:
    
    lpcfg_load: refreshing parameters from /etc/samba/smb.conf
    Processing section "[global]"
    Can't find include file /etc/samba/usershares.conf
    pm_process() returned No
    ERROR: Unable to load default file
      File "/usr/lib64/python3.12/site-packages/samba/netcmd/domain/provision.py", line 183, in run
        lp = sambaopts.get_loadparm()
             ^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib64/python3.12/site-packages/samba/getopt.py", line 282, in get_loadparm
        self._lp.load_default()
    
    Signed-off-by: Pavel Filipenský <pfilipensky at samba.org>
    Reviewed-by: Martin Schwenke <martin at meltin.net>
    
    Autobuild-User(master): Martin Schwenke <martins at samba.org>
    Autobuild-Date(master): Wed Aug 21 00:04:19 UTC 2024 on atb-devel-224

commit 578dfa576517b10d979c9aef539ac910b2f95381
Author: Martin Schwenke <mschwenke at ddn.com>
Date:   Sat Jun 29 12:25:59 2024 +1000

    ctdb-scripts: Avoid flapping NFS services at startup
    
    If an NFS service check is set to, say, unhealthy_after=2 then it will
    always switch from the (default startup) unhealthy state to healthy,
    even if there is a fatal problem.  If all services/scripts appear OK
    then the node will become healthy.  When the counter hits the limit it
    will return to unhealthy.  This is misleading.
    
    Instead, never use the counter at startup, until the service becomes
    healthy.  This stops services flapping unhealthy-healthy-unhealthy.
    
    A side-effect is that a service that starts in a broken state will
    never be restarted to try to fix the problem.  This makes sense.  The
    counting and restarting really exist to deal with problems that might
    occur under load.  The first monitor events occur before public IPs
    are hosted, so there can be no load.  If a service doesn't start
    reliably the first time then the admin probably wants to know about
    it.
    
    nfs_iterate_test() is updated to run an initial monitor event to mark
    the services as healthy.  This initialises the counter so it can be
    used for the important part of the test.  Passing the -i option avoids
    running the extra monitor event, so the first iteration will be the
    initial monitor event.
    
    Signed-off-by: Martin Schwenke <mschwenke at ddn.com>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
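
The startup behaviour described in this commit can be sketched roughly as follows. This is a minimal illustration, not the event script itself: monitor_event, state_dir, counter_file and the fixed unhealthy_after=2 are hypothetical names and values chosen for the example, and it assumes that a successful check is what creates the counter.

```sh
#!/bin/sh
# Sketch only: every name here is hypothetical, not a CTDB helper.

state_dir=$(mktemp -d)
trap 'rm -rf "$state_dir"' EXIT
counter_file="${state_dir}/failcount"
unhealthy_after=2

# Run one simulated monitor event.  $1 is "ok" or "fail".
# Prints the resulting state and returns non-zero when unhealthy.
monitor_event()
{
	if [ "$1" = "ok" ]; then
		# Assumption: success creates/resets the counter, so
		# counting can only begin once the service was healthy
		echo 0 >"$counter_file"
		echo "healthy"
		return 0
	fi

	# Never healthy since startup: fail at once, do not count
	if [ ! -e "$counter_file" ]; then
		echo "unhealthy"
		return 1
	fi

	# Previously healthy: tolerate failures up to the limit
	_count=$(($(cat "$counter_file") + 1))
	echo "$_count" >"$counter_file"
	if [ "$_count" -ge "$unhealthy_after" ]; then
		echo "unhealthy"
		return 1
	fi
	echo "healthy"
}
```

In this sketch a failure before any success reports unhealthy straight away, so the state cannot flap unhealthy-healthy-unhealthy during startup. Once a success has been seen, the first later failure is tolerated and the second one reaches the limit.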

commit 18a29ed367278849889a846bb93f49afd0c045a8
Author: Martin Schwenke <mschwenke at ddn.com>
Date:   Sat Jun 29 19:24:25 2024 +1000

    ctdb-scripts: Make initial statistics output empty
    
    This makes initial failure to retrieve statistics less likely to
    result in a statistics change.  To help with this, statistics
    retrieval stderr now goes to the log - only stdout goes to the file.
    
    This means that the test code for checking statistics changes needs to
    be redone to actually run the statistics command and check.  As with
    rpcinfo output, this output needs to behave as deterministically in
    the test code as it does in the event script.
    
    Signed-off-by: Martin Schwenke <mschwenke at ddn.com>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
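
A rough sketch of why an empty initial statistics file helps, assuming a hypothetical helper stats_changed that takes the statistics command and the two file names as arguments (the event script uses its own variables instead). Only stdout is captured; stderr is left alone here, whereas in the event script it ends up in the log.

```sh
#!/bin/sh
# Sketch only: stats_changed and its arguments are hypothetical.

# $1: statistics command, $2: previous stats file, $3: current stats file
# Returns 0 if the statistics changed since the last call, else 1.
stats_changed()
{
	_cmd="$1"
	_prev="$2"
	_curr="$3"

	if [ -f "$_curr" ]; then
		mv -f "$_curr" "$_prev"
	else
		# First call: start from empty stats, so a command that
		# fails and prints nothing does not look like a change
		: >"$_prev"
	fi

	# Capture stdout only; a failing command is not fatal here
	eval "$_cmd" >"$_curr" || true

	! cmp "$_prev" "$_curr" >/dev/null 2>&1
}
```

With a command that fails silently (for example "false"), both files are empty on the first call, so no change is reported. A command that produces output still counts as a change on the first call.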

commit 032b7b49c9f50fa8e4e049d066e5f1ddb6295d89
Author: Martin Schwenke <mschwenke at ddn.com>
Date:   Sun Jun 30 10:35:09 2024 +1000

    ctdb-scripts: Only consider statistics on timeout
    
    Checking statistics is only really relevant to timeouts.  That is, if
    an rpcinfo times out it is worth checking if the service is making
    progress.  If the RPC service is not registered then the statistics
    don't need to be checked because they shouldn't be changing.
    
    The 2 tests previously added to check statistics progress now
    behave identically and fail on all iterations.  To support testing
    with "timeouts", an optional TIMEOUT flag can now be added to the RPC
    service passed to nfs_iterate_test().  2 new tests are added to
    exercise the new behaviour.
    
    The 2 new "if" statements in nfs_iterate_test() could be combined.
    However, a subsequent commit would split them and would be more
    difficult to read.
    
    Signed-off-by: Martin Schwenke <mschwenke at ddn.com>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
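
The timeout test relies on a POSIX parameter-expansion idiom for substring matching. A minimal sketch is below. The helper names contains and should_check_stats are hypothetical; the timeout string is the one used in the patch, and the sample error texts in the usage note are made up for illustration.

```sh
#!/bin/sh
# Sketch only: contains and should_check_stats are hypothetical names.

# Return 0 if string $1 contains substring $2.  Stripping the shortest
# prefix that ends in $2 changes the string only when $2 occurs in it.
contains()
{
	[ "${1#*"$2"}" != "$1" ]
}

# Decide whether statistics are worth consulting for a failed check.
# $1 is the error text from the failed RPC check.
should_check_stats()
{
	contains "$1" "rpcinfo: RPC: Timed out"
}
```

For example, should_check_stats "rpcinfo: RPC: Timed out" succeeds, while should_check_stats "rpcinfo: RPC: Program not registered" fails, so an unregistered service never reaches the statistics comparison.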

commit f7a96deafa9f94522b0532357a388042c514ac7c
Author: Martin Schwenke <mschwenke at ddn.com>
Date:   Fri Jul 5 11:28:34 2024 +1000

    ctdb-tests: Make _rpc_service_up() and _rpc_services_down() internal
    
    Signed-off-by: Martin Schwenke <mschwenke at ddn.com>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 0919701a68b8da41261ab2a88797d59f55ff7f2a
Author: Martin Schwenke <mschwenke at ddn.com>
Date:   Thu Jul 4 11:10:59 2024 +1000

    ctdb-tests: Make NFS RPC monitoring tests consistent
    
    Update the remaining RPC monitoring tests to use nfs_iterate_test(),
    depending on it to set results.  This makes all RPC monitoring tests
    consistent, so they will all benefit from future improvements.
    
    Signed-off-by: Martin Schwenke <mschwenke at ddn.com>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 47c33a2442c7745956bbe418d3e3944da7c4e12b
Author: Martin Schwenke <mschwenke at ddn.com>
Date:   Fri Jul 5 11:01:45 2024 +1000

    ctdb-tests: Drop unnecessarily "else"
    
    Doing this in a previous commit would have made it more difficult to
    read that commit.
    
    Signed-off-by: Martin Schwenke <mschwenke at ddn.com>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 8b2f228198b558325ece2f32516776db6a322282
Author: Martin Schwenke <mschwenke at ddn.com>
Date:   Thu Jul 4 15:17:25 2024 +1000

    ctdb-tests: Replace implicit healthy behaviour with early exits
    
    The early exits from the sub-shell make the obvious successes much
    more obvious, and slightly simplify the code that follows.
    
    Signed-off-by: Martin Schwenke <mschwenke at ddn.com>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit a522864138a36c6dba952fcca760fd0c89428a2a
Author: Martin Schwenke <mschwenke at ddn.com>
Date:   Fri Jul 5 10:46:30 2024 +1000

    ctdb-tests: Simplify handling of statistics change
    
    Handling this across two different functions led to insanity, so
    simplify.
    
    The handling of unhealthy_after when $_numfails = 0 implicitly causes
    the node to be healthy.  This is how the "rpcinfo succeeds" case
    works.  Doing it this way for statistics makes this patch easier to
    read.  The implicit behaviour will go away in the next patch.
    
    Signed-off-by: Martin Schwenke <mschwenke at ddn.com>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 084a69d5522921051f71338e8f8a6b5b0a95ffe1
Author: Martin Schwenke <mschwenke at ddn.com>
Date:   Thu Jul 4 12:04:15 2024 +1000

    ctdb-tests: Move result check to rpc_set_service_failure_response()
    
    The current structure here is wrong and repetitive.  Checking rpcinfo
    result and determining output should be in the same place.
    
    Failure counting is now contained in
    rpc_set_service_failure_response(), but needs a file to survive the
    sub-shell.
    
    Don't attempt to combine and simplify code yet.  That would make this
    commit harder to review.
    
    Signed-off-by: Martin Schwenke <mschwenke at ddn.com>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
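
The remark about needing a file to survive the sub-shell can be illustrated with a small sketch. The names failcount, count_file and record_failure are hypothetical and do not come from the test code.

```sh
#!/bin/sh
# Sketch only: failcount, count_file and record_failure are hypothetical.

failcount=0
count_file=$(mktemp)
trap 'rm -f "$count_file"' EXIT
echo 0 >"$count_file"

record_failure()
{
	(
		# Variable update: visible only inside this sub-shell
		failcount=$((failcount + 1))

		# File update: still there after the sub-shell exits
		_n=$(cat "$count_file")
		echo $((_n + 1)) >"$count_file"
	)
}

record_failure
record_failure
echo "variable: ${failcount}, file: $(cat "$count_file")"
```

This prints "variable: 0, file: 2": the variable assignment is discarded each time the sub-shell exits, while the count kept in the file accumulates.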

commit 47540012009aecbb0df90c158bcfb64afc135913
Author: Martin Schwenke <mschwenke at ddn.com>
Date:   Fri Jul 5 11:47:56 2024 +1000

    ctdb-tests: Initialise return code file
    
    The output file is initialised, so doesn't need to be created on
    success.  Treat the return code file the same way.
    
    Signed-off-by: Martin Schwenke <mschwenke at ddn.com>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 833deb067d8d1f579d78a8d25926570c98ab4f47
Author: Martin Schwenke <mschwenke at ddn.com>
Date:   Fri Jul 5 09:23:27 2024 +1000

    ctdb-tests: Add function rpc_failure() to log failures and warnings
    
    Improves readability, makes future changes easier.
    
    Signed-off-by: Martin Schwenke <mschwenke at ddn.com>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 1d9661d587f9c2d8241f4f6f5fb58394ecd658e0
Author: Martin Schwenke <mschwenke at ddn.com>
Date:   Thu Jul 4 10:44:18 2024 +1000

    ctdb-tests: Argument 3 to nfs_iterate_test() is up iteration
    
    Nothing more complex is ever done, so we might as well simplify and
    reduce coupling.
    
    Signed-off-by: Martin Schwenke <mschwenke at ddn.com>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 7c5e7080015117319162d37eced72f7a01b8d1af
Author: Martin Schwenke <mschwenke at ddn.com>
Date:   Thu Jul 4 10:55:07 2024 +1000

    ctdb-tests: nfs_iterate_test() marks RPC service down
    
    If an RPC service is given, it is automatically marked down.  This
    avoids repetition in test cases and loosens coupling.
    
    Signed-off-by: Martin Schwenke <mschwenke at ddn.com>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

-----------------------------------------------------------------------

Summary of changes:
 ctdb/config/events/legacy/60.nfs.script            |  22 +-
 ctdb/config/functions                              |   5 +
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.105.sh |   1 -
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.106.sh |   2 -
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.111.sh |   6 +-
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.112.sh |   2 -
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.113.sh |   2 -
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.114.sh |   2 -
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.115.sh |   5 -
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.116.sh |   5 -
 ...60.nfs.monitor.115.sh => 60.nfs.monitor.117.sh} |   9 +-
 ...60.nfs.monitor.116.sh => 60.nfs.monitor.118.sh} |   9 +-
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.119.sh |  23 ++
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.121.sh |   2 -
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.122.sh |   5 +-
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.131.sh |   2 -
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.132.sh |   5 +-
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.141.sh |   2 -
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.142.sh |   5 +-
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.143.sh |   1 -
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.144.sh |   2 -
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.151.sh |   6 +-
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.152.sh |   2 -
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.153.sh |   5 +-
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.171.sh |   9 +
 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.172.sh |   9 +
 ctdb/tests/UNIT/eventscripts/scripts/60.nfs.sh     | 292 +++++++++++++++------
 ctdb/tests/UNIT/eventscripts/stubs/rpcinfo         |  13 +-
 lib/param/loadparm.c                               |   2 +-
 29 files changed, 301 insertions(+), 154 deletions(-)
 copy ctdb/tests/UNIT/eventscripts/{60.nfs.monitor.115.sh => 60.nfs.monitor.117.sh} (66%)
 copy ctdb/tests/UNIT/eventscripts/{60.nfs.monitor.116.sh => 60.nfs.monitor.118.sh} (66%)
 create mode 100755 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.119.sh
 create mode 100755 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.171.sh
 create mode 100755 ctdb/tests/UNIT/eventscripts/60.nfs.monitor.172.sh


Changeset truncated at 500 lines:

diff --git a/ctdb/config/events/legacy/60.nfs.script b/ctdb/config/events/legacy/60.nfs.script
index 246a856bca8..bc5be241f67 100755
--- a/ctdb/config/events/legacy/60.nfs.script
+++ b/ctdb/config/events/legacy/60.nfs.script
@@ -164,14 +164,22 @@ nfs_check_service()
 
 			if [ -f "$_curr" ]; then
 				mv -f "$_curr" "$_prev"
+			else
+				# Make initial stats empty, so a
+				# failed attempt to retrieve them on
+				# service stall is less likely to
+				# result in a false stats change
+				: >"$_prev"
 			fi
-			eval "$service_stats_cmd" >"$_curr" 2>&1
+			eval "$service_stats_cmd" >"$_curr"
 
+			# Only consider statistics on timeout.  This
+			# is done below by checking if this string is
+			# contained in $_err.
+			_t="rpcinfo: RPC: Timed out"
 			if ! $_ok &&
+				[ "${_err#*"${_t}"}" != "$_err" ] &&
 				! cmp "$_prev" "$_curr" >/dev/null 2>&1; then
-				# Stats always implicitly change on
-				# the first monitor event, since
-				# previous stats don't exists...
 				echo "WARNING: statistics changed but ${_err}"
 				_ok=true
 			fi
@@ -185,6 +193,12 @@ nfs_check_service()
 			exit 0
 		fi
 
+		# Don't count immediately after startup
+		if ! ctdb_counter_exists "$_progname"; then
+			echo "ERROR: $_err"
+			exit 1
+		fi
+
 		ctdb_counter_incr "$_progname"
 		_failcount=$(ctdb_counter_get "$_progname")
 
diff --git a/ctdb/config/functions b/ctdb/config/functions
index ef79dbf2162..f8f539ad53f 100755
--- a/ctdb/config/functions
+++ b/ctdb/config/functions
@@ -866,6 +866,11 @@ ctdb_counter_get()
 	# shellcheck disable=SC2086
 	echo $_val
 }
+ctdb_counter_exists()
+{
+	_ctdb_counter_common "$1"
+	[ -e "$_counter_file" ]
+}
 
 #
 # Fail counter/threshold combination to control warnings and node unhealthy
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.105.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.105.sh
index e83ead84e33..79aeac1e36c 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.105.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.105.sh
@@ -6,5 +6,4 @@ define_test "all services available, 10 iterations with ok_null"
 
 setup
 
-ok_null
 nfs_iterate_test 10
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.106.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.106.sh
index 43d6b2f03ec..9cd716baaac 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.106.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.106.sh
@@ -6,6 +6,4 @@ define_test "portmapper down, 2 iterations"
 
 setup
 
-rpc_services_down "portmapper"
-
 nfs_iterate_test 2 "portmapper"
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.111.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.111.sh
index 2bbda9686b7..f398b543f15 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.111.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.111.sh
@@ -6,8 +6,4 @@ define_test "knfsd down, 1 iteration"
 
 setup
 
-rpc_services_down "nfs"
-
-ok_null
-
-simple_test
+nfs_iterate_test 1 "nfs"
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.112.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.112.sh
index 4000b5d8435..1885c943312 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.112.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.112.sh
@@ -8,6 +8,4 @@ define_test "knfsd down, 10 iterations"
 
 setup
 
-rpc_services_down "nfs"
-
 nfs_iterate_test 10 "nfs"
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.113.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.113.sh
index 744966c94c7..764ef30826f 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.113.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.113.sh
@@ -7,8 +7,6 @@ define_test "knfsd down, 10 iterations, no hung threads"
 # knfsd fails and attempts to restart it fail.
 setup
 
-rpc_services_down "nfs"
-
 nfs_setup_fake_threads "nfsd"
 
 nfs_iterate_test 10 "nfs"
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.114.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.114.sh
index 7170fff6472..aba348742b6 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.114.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.114.sh
@@ -7,8 +7,6 @@ define_test "knfsd down, 10 iterations, 3 hung threads"
 # knfsd fails and attempts to restart it fail.
 setup
 
-rpc_services_down "nfs"
-
 nfs_setup_fake_threads "nfsd" 1001 1002 1003
 
 nfs_iterate_test 10 "nfs"
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.115.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.115.sh
index 860436328d4..7afb906049d 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.115.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.115.sh
@@ -18,9 +18,4 @@ service_debug_cmd="program_stack_traces nfsd 5"
 service_stats_cmd="date --rfc-3339=ns | grep ."
 EOF
 
-# Test flag to indicate that stats are expected to change
-nfs_stats_set_changed "nfs" "status"
-
-rpc_services_down "nfs"
-
 nfs_iterate_test 10 "nfs"
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.116.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.116.sh
index 1bdd29e73ec..a2025e93680 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.116.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.116.sh
@@ -18,9 +18,4 @@ service_debug_cmd="program_stack_traces nfsd 5"
 service_stats_cmd="echo 'hello world' | grep ."
 EOF
 
-# Test flag to indicate that stats are expected to change
-nfs_stats_set_changed "status"
-
-rpc_services_down "nfs"
-
 nfs_iterate_test 10 "nfs"
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.115.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.117.sh
similarity index 66%
copy from ctdb/tests/UNIT/eventscripts/60.nfs.monitor.115.sh
copy to ctdb/tests/UNIT/eventscripts/60.nfs.monitor.117.sh
index 860436328d4..0bac550c52f 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.115.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.117.sh
@@ -2,7 +2,7 @@
 
 . "${TEST_SCRIPTS_DIR}/unit.sh"
 
-define_test "NFS RPC service down, stats change, 10 iterations"
+define_test "NFS RPC service timeout, stats change, 10 iterations"
 
 setup
 
@@ -18,9 +18,4 @@ service_debug_cmd="program_stack_traces nfsd 5"
 service_stats_cmd="date --rfc-3339=ns | grep ."
 EOF
 
-# Test flag to indicate that stats are expected to change
-nfs_stats_set_changed "nfs" "status"
-
-rpc_services_down "nfs"
-
-nfs_iterate_test 10 "nfs"
+nfs_iterate_test 10 "nfs:TIMEOUT"
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.116.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.118.sh
similarity index 66%
copy from ctdb/tests/UNIT/eventscripts/60.nfs.monitor.116.sh
copy to ctdb/tests/UNIT/eventscripts/60.nfs.monitor.118.sh
index 1bdd29e73ec..ef94d7fba9b 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.116.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.118.sh
@@ -2,7 +2,7 @@
 
 . "${TEST_SCRIPTS_DIR}/unit.sh"
 
-define_test "NFS RPC service down, stats don't change, 10 iterations"
+define_test "NFS RPC service timeout, stats don't change, 10 iterations"
 
 setup
 
@@ -18,9 +18,4 @@ service_debug_cmd="program_stack_traces nfsd 5"
 service_stats_cmd="echo 'hello world' | grep ."
 EOF
 
-# Test flag to indicate that stats are expected to change
-nfs_stats_set_changed "status"
-
-rpc_services_down "nfs"
-
-nfs_iterate_test 10 "nfs"
+nfs_iterate_test 10 "nfs:TIMEOUT"
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.119.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.119.sh
new file mode 100755
index 00000000000..c291f69b82e
--- /dev/null
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.119.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+. "${TEST_SCRIPTS_DIR}/unit.sh"
+
+define_test "NFS RPC service timeout, silent stats error, 10 iterations"
+
+# It would be nice to have a non-silent stats error... but that's a
+# bit hard for the current test code to handle.  :-(
+
+setup
+
+cat >"${CTDB_BASE}/nfs-checks.d/20.nfs.check" <<EOF
+# nfs
+version="3"
+restart_every=10
+unhealthy_after=2
+service_stop_cmd="\$CTDB_NFS_CALLOUT stop nfs"
+service_start_cmd="\$CTDB_NFS_CALLOUT start nfs"
+service_debug_cmd="program_stack_traces nfsd 5"
+service_stats_cmd="false"
+EOF
+
+nfs_iterate_test 10 "nfs:TIMEOUT"
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.121.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.121.sh
index 1cda2765c38..ab336654a58 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.121.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.121.sh
@@ -10,6 +10,4 @@ define_test "lockd down, 7 iterations"
 
 setup
 
-rpc_services_down "nlockmgr"
-
 nfs_iterate_test 7 "nlockmgr"
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.122.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.122.sh
index eae7ca0b42a..9999b2b4b28 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.122.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.122.sh
@@ -9,10 +9,7 @@ define_test "lockd down, 7 iterations, back up after 2"
 
 setup
 
-rpc_services_down "nlockmgr"
-
 # Iteration 2 should try to restart rpc.lockd.  However, our test
 # stub rpc.lockd does nothing, so we have to explicitly flag it as up.
 
-nfs_iterate_test 7 "nlockmgr" \
-    3 "rpc_services_up nlockmgr"
+nfs_iterate_test 7 "nlockmgr" 3
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.131.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.131.sh
index 33e1cf47499..174ae9e06c3 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.131.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.131.sh
@@ -6,6 +6,4 @@ define_test "rquotad down, 7 iterations"
 
 setup
 
-rpc_services_down "rquotad"
-
 nfs_iterate_test 7 "rquotad"
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.132.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.132.sh
index 207d872c24f..648469390a1 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.132.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.132.sh
@@ -9,7 +9,4 @@ define_test "rquotad down, 7 iterations, back up after 2"
 
 setup
 
-rpc_services_down "rquotad"
-
-nfs_iterate_test 7 "rquotad" \
-    3 'rpc_services_up "rquotad"'
+nfs_iterate_test 7 "rquotad" 3
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.141.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.141.sh
index 5a8c5ce75a7..a436edbb138 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.141.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.141.sh
@@ -8,6 +8,4 @@ define_test "statd down, 7 iterations"
 
 setup
 
-rpc_services_down "status"
-
 nfs_iterate_test 7 "status"
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.142.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.142.sh
index 694bf928b7e..d0ed6b527e3 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.142.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.142.sh
@@ -8,7 +8,4 @@ define_test "statd down, 7 iterations, back up after 2"
 
 setup
 
-rpc_services_down "status"
-
-nfs_iterate_test 7 "status" \
-    3 'rpc_services_up "status"'
+nfs_iterate_test 7 "status" 3
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.143.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.143.sh
index d17277ea88d..d2e42fcbaee 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.143.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.143.sh
@@ -8,7 +8,6 @@ define_test "statd down, 2 iterations, stuck process"
 
 setup
 
-rpc_services_down "status"
 nfs_setup_fake_threads "rpc.status" 1001
 
 nfs_iterate_test 2 "status"
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.144.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.144.sh
index 5a8c5ce75a7..a436edbb138 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.144.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.144.sh
@@ -8,6 +8,4 @@ define_test "statd down, 7 iterations"
 
 setup
 
-rpc_services_down "status"
-
 nfs_iterate_test 7 "status"
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.151.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.151.sh
index 9ab18072d4f..86226220663 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.151.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.151.sh
@@ -6,8 +6,4 @@ define_test "mountd down, 1 iteration"
 
 setup
 
-rpc_services_down "mountd"
-
-ok_null
-
-simple_test
+nfs_iterate_test 1 "mountd"
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.152.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.152.sh
index c3a6b8bbf30..c1db405f051 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.152.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.152.sh
@@ -10,6 +10,4 @@ define_test "mountd down, 7 iterations"
 
 setup
 
-rpc_services_down "mountd"
-
 nfs_iterate_test 7 "mountd"
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.153.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.153.sh
index a09315bcac5..c840bbe4b60 100755
--- a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.153.sh
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.153.sh
@@ -6,10 +6,7 @@ define_test "mountd down, 7 iterations, back up after 2"
 
 setup
 
-rpc_services_down "mountd"
-
 # Iteration 2 should try to restart rpc.mountd.  However, our test
 # stub rpc.mountd does nothing, so we have to explicitly flag it as
 # up.
-nfs_iterate_test 7 "mountd" \
-    3 "rpc_services_up mountd"
+nfs_iterate_test 7 "mountd" 3
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.171.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.171.sh
new file mode 100755
index 00000000000..71d0f18afb0
--- /dev/null
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.171.sh
@@ -0,0 +1,9 @@
+#!/bin/sh
+
+. "${TEST_SCRIPTS_DIR}/unit.sh"
+
+define_test "nfs down, 1 iteration, not previously healthy"
+
+setup
+
+nfs_iterate_test -i 1 "nfs"
diff --git a/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.172.sh b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.172.sh
new file mode 100755
index 00000000000..81851a265a5
--- /dev/null
+++ b/ctdb/tests/UNIT/eventscripts/60.nfs.monitor.172.sh
@@ -0,0 +1,9 @@
+#!/bin/sh
+
+. "${TEST_SCRIPTS_DIR}/unit.sh"
+
+define_test "nfs down, 10 iterations, not previously healthy"
+
+setup
+
+nfs_iterate_test -i 10 "nfs"
diff --git a/ctdb/tests/UNIT/eventscripts/scripts/60.nfs.sh b/ctdb/tests/UNIT/eventscripts/scripts/60.nfs.sh
index 1a8dab73ded..fa7797c6945 100644
--- a/ctdb/tests/UNIT/eventscripts/scripts/60.nfs.sh
+++ b/ctdb/tests/UNIT/eventscripts/scripts/60.nfs.sh
@@ -22,6 +22,8 @@ EOF
 
 	export RPCNFSDCOUNT
 
+	TEST_RPC_ALL_SERVICES="portmapper nfs mountd rquotad nlockmgr status"
+
 	if [ "$1" != "down" ]; then
 		debug <<EOF
 Setting up NFS environment: all RPC services up, NFS managed by CTDB
@@ -40,9 +42,9 @@ EOF
 			;;
 		esac
 
-		rpc_services_up \
-			"portmapper" "nfs" "mountd" "rquotad" \
-			"nlockmgr" "status"
+		# Intentional word splitting
+		# shellcheck disable=SC2086
+		_rpc_services_up $TEST_RPC_ALL_SERVICES
 
 		nfs_setup_fake_threads "nfsd"
 		nfs_setup_fake_threads "rpc.foobar" # Set the variable to empty
@@ -67,7 +69,7 @@ EOF
 	fi
 }
 
-rpc_services_down()
+_rpc_services_down()
 {
 	_out=""
 	for _s in $FAKE_RPCINFO_SERVICES; do
@@ -82,7 +84,22 @@ rpc_services_down()
 	FAKE_RPCINFO_SERVICES="$_out"
 }
 
-rpc_services_up()
+_rpc_services_timeout()
+{
+	_out=""
+	for _s in $FAKE_RPCINFO_SERVICES; do
+		for _i; do
+			if [ "$_i" = "${_s%%:*}" ]; then
+				debug "Marking RPC service \"${_i}\" as TIMEOUT"
+				_s="${_s}:TIMEOUT"
+			fi
+		done
+		_out="${_out}${_out:+ }${_s}"
+	done
+	FAKE_RPCINFO_SERVICES="$_out"
+}
+
+_rpc_services_up()
 {
 	_out="$FAKE_RPCINFO_SERVICES"
 	for _i; do
@@ -121,29 +138,32 @@ nfs_setup_fake_threads()
 	esac
 }
 
-nfs_stats_set_changed()
-{
-	FAKE_NFS_STATS_CHANGED=" $* "
-}
-
 nfs_stats_check_changed()
 {
 	_rpc_service="$1"
-	_iteration="$2"
+	_cmd="$2"
 
-	_t="$FAKE_NFS_STATS_CHANGED"
-	if [ -z "$_t" ]; then
+	if [ -z "$_cmd" ]; then
+		# No stats command, statistics don't change...
 		return 1
 	fi
-	if [ "${_t#* "${_rpc_service}"}" != "$_t" ]; then
-		return 0
-	fi
-	# Statistics always change on the first iteration
-	if [ "$_iteration" -eq 1 ]; then
-		return 0
+
+	_curr="${CTDB_TEST_TMP_DIR}/${_rpc_service}.stats"


-- 
Samba Shared Repository