[PATCH] Fix bug #13121 - Non-smbd processes using kernel oplocks can hang smbd

Jeremy Allison jra at samba.org
Thu Nov 30 20:29:46 UTC 2017


On Wed, Nov 29, 2017 at 10:10:34PM -0800, Jeremy Allison via samba-technical wrote:
> On Wed, Nov 29, 2017 at 03:16:35PM -0800, Jeremy Allison via samba-technical wrote:
> > > 
> > > One idea: can we run the test as root, enable corefile generation on the sytem
> > > and add a killall KILL smbd to the child when the alarm fires because no
> > > RT_SIGNAL_LEASE was generated? We could then gdb the smbd session process
> > > corefile and check whether it was stuck somewhere unexpectedly.
> > 
> > Actually the other poster gave me an idea. We should be able
> > to add an extra debug message from the smbd when it gets -1,EWOULDBLOCK
> > on the open - and make sure that message gets logged.
> 
> I have an idea as to what might be causing this
> problem. It suddenly came to me during dinner. Monica
> asked me what the stupid look was on my face
> and I had to say it was a possible "eurika"
> moment :-).
> 
> Let me code up the idea, and test it on Andrew's
> cloud servers that reproduce the issue 50% of the
> time to see if it makes the test reliable.

OK, there *is* a race condition in the test. It's
the following. The child process gets the kernel
lease and then notifies the parent process to continue
by writing a byte up a pipe. It then sets the alarm
and calls pause() to wait for the parent process to
contact the smbd and get it to trigger the break request
using an open call.

It is possible for the parent to run and trigger the break
request after the child has written to the pipe, but *before*
the child calls pause(). We then miss the signal notifying
the child to break the lease.

The fix for this is below. We need to set the sigmask
on the child to prohibit receiving RT_SIGNAL_LEASE and SIGALRM,
then wait for the lease break or alarm signal by calling
sigsuspend(), not pause().

When I add this patch the test becomes 100% reliable
(not flakey) on the Catalyst cloud.

Thanks a *LOT* Andrew for setting this up and allowing
access - I don't think I would have found this without
it !

Please review and push if happy.

Cheers,

	Jeremy.
-------------- next part --------------
From 296a2f4d0d7719f53014908fcd620f218df3a2c1 Mon Sep 17 00:00:00 2001
From: Jeremy Allison <jra at samba.org>
Date: Thu, 30 Nov 2017 12:25:02 -0800
Subject: [PATCH 1/2] s4: torture: Fix race condition in
 test_smb2_kernel_oplocks8.

The child process gets the kernel lease and then notifies
the parent process to continue by writing a byte up a pipe.
It then sets the alarm and calls pause() to wait for the
parent process to contact the smbd and get it to trigger
the break request using an open call.

It is possible for the parent to run and trigger the break
request after the child has written to the pipe, but *before*
the child calls pause(). We then miss the signal notifying
the child to break the lease.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13121

Signed-off-by: Jeremy Allison <jra at samba.org>
---
 source4/torture/smb2/oplock.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/source4/torture/smb2/oplock.c b/source4/torture/smb2/oplock.c
index 6d749f92c1c..87d7d2ff946 100644
--- a/source4/torture/smb2/oplock.c
+++ b/source4/torture/smb2/oplock.c
@@ -4838,6 +4838,18 @@ static int do_child_process(int pipefd, const char *name)
 	int fd = -1;
 	char c = 0;
 	struct sigaction act;
+	sigset_t set;
+	sigset_t empty_set;
+
+	/* Block RT_SIGNAL_LEASE and SIGALRM. */
+	sigemptyset(&set);
+	sigemptyset(&empty_set);
+	sigaddset(&set, RT_SIGNAL_LEASE);
+	sigaddset(&set, SIGALRM);
+	ret = sigprocmask(SIG_SETMASK, &set, NULL);
+	if (ret == -1) {
+		return 11;
+	}
 
 	/* Set up a signal handler for RT_SIGNAL_LEASE. */
 	ZERO_STRUCT(act);
@@ -4878,8 +4890,8 @@ static int do_child_process(int pipefd, const char *name)
 	/* Ensure the pause doesn't hang forever. */
 	alarm(5);
 
-	/* Wait for RT_SIGNAL_LEASE. */
-	ret = pause();
+	/* Wait for RT_SIGNAL_LEASE or SIGALRM. */
+	ret = sigsuspend(&empty_set);
 	if (ret != -1 || errno != EINTR) {
 		return 6;
 	}
-- 
2.15.0.531.g2ccb3012c9-goog


From 59e63833a6f3bf758b9f1c4446102d5a1af8b3ce Mon Sep 17 00:00:00 2001
From: Jeremy Allison <jra at samba.org>
Date: Thu, 30 Nov 2017 12:28:03 -0800
Subject: [PATCH 2/2] Revert "selftest: mark samba3.smb2.kernel-oplocks as
 flapping"

The correct fix is in the previous commit, to block the
RT_SIGNAL_LEASE until we're ready to receive it.

This reverts commit 27bb8814a526adbd22452ce58754d18e1b00d426.

Signed-off-by: Jeremy Allison <jra at samba.org>
---
 selftest/flapping.d/kernel-oplocks | 4 ----
 1 file changed, 4 deletions(-)
 delete mode 100644 selftest/flapping.d/kernel-oplocks

diff --git a/selftest/flapping.d/kernel-oplocks b/selftest/flapping.d/kernel-oplocks
deleted file mode 100644
index 2102e2bf363..00000000000
--- a/selftest/flapping.d/kernel-oplocks
+++ /dev/null
@@ -1,4 +0,0 @@
-# The smb2.kernel-oplocks tests fails often on Ubuntu 14.04 on the
-# Catalyst Cloud but failures have been seen on normal workstations
-# and sn-devel.
-^samba3.smb2.kernel-oplocks.kernel_oplocks8
-- 
2.15.0.531.g2ccb3012c9-goog



More information about the samba-technical mailing list