[PATCH] Fix bug #13121 - Non-smbd processes using kernel oplocks can hang smbd

Jeremy Allison jra at samba.org
Wed Nov 29 16:43:59 UTC 2017


On Wed, Nov 29, 2017 at 05:37:57PM +1300, Andrew Bartlett wrote:
> On Wed, 2017-11-29 at 16:32 +1300, Andrew Bartlett wrote:
> > On Tue, 2017-11-28 at 19:21 -0800, Jeremy Allison wrote:
> > > 
> > > I can take a look, but right now everything is pointing to
> > > a problem in your cloud environment. Gary, you said you'd
> > > been able to reproduce it - was that on a local build ?
> > 
> > Gary thought he did reproduce it under load (while working), but
> > naturally like all load-induced things it went away again.
> 
> I've reproduced it locally with the attached patch.  I'm on Debian 9.2
> Stable
> 
> Linux ruth 4.9.0-4-amd64 #1 SMP Debian 4.9.51-1 (2017-09-28) x86_64
> GNU/Linux
> 
> It failed after an hour with:
> 
> [1040(8312)/9999 at 1h7m49s] samba3.smb2.kernel-oplocks(nt4_dc)
> Bad child exit code 10
> UNEXPECTED(error): samba3.smb2.kernel-oplocks.kernel_oplocks8(nt4_dc)
> REASON: Exception: Exception: Unknown error/failure. Missing torture_fail() or torture_assert_*() call?
> command: /data/samba/git/samba6/bin/smbtorture  $LOADLIST --configfile=$SMB_CONF_PATH --option='fss:sequence timeout=1' --maximum-runtime=$SELFTEST_MAXTIME --basedir=$SELFTEST_TMPDIR --format=subunit --option=torture:progress=no --option=torture:sharedelay=100000 --option=torture:writetimeupdatedelay=500000 --target=samba3 //$SERVER/kernel_oplocks -U$USERNAME%$PASSWORD --option=torture:localdir=$SELFTEST_PREFIX/nt4_dc/share smb2.kernel-oplocks 2>&1  | /data/samba/git/samba6/selftest/filter-subunit --fail-on-empty --prefix="samba3.smb2.kernel-oplocks." --suffix="(nt4_dc)"
> expanded command: /data/samba/git/samba6/bin/smbtorture  $LOADLIST --configfile=/data/samba/git/samba6/st/client/client.conf --option='fss:sequence timeout=1' --maximum-runtime=1200 --basedir=/data/samba/git/samba6/st/tmp --format=subunit --option=torture:progress=no --option=torture:sharedelay=100000 --option=torture:writetimeupdatedelay=500000 --target=samba3 //LOCALNT4DC2/kernel_oplocks -Uabartlet%localntdc2pass --option=torture:localdir=/data/samba/git/samba6/st/nt4_dc/share smb2.kernel-oplocks 2>&1  | /data/samba/git/samba6/selftest/filter-subunit --fail-on-empty --prefix="samba3.smb2.kernel-oplocks." --suffix="(nt4_dc)"
> ERROR: Testsuite[samba3.smb2.kernel-oplocks(nt4_dc)]
> REASON: Exit code was 1
> 
>  errors[1]
> 
> [1041(8320)/9999 at 1h7m57s, 1 errors] samba3.smb2.kernel-oplocks(nt4_dc)
> [1042(8328)/9999 at 1h8m6s, 1 errors] samba3.smb2.kernel-oplocks(nt4_dc)
> Bad child exit code 10
> UNEXPECTED(error): samba3.smb2.kernel-oplocks.kernel_oplocks8(nt4_dc)
> REASON: Exception: Exception: Unknown error/failure. Missing torture_fail() or torture_assert_*() call?
> command: /data/samba/git/samba6/bin/smbtorture  $LOADLIST --configfile=$SMB_CONF_PATH --option='fss:sequence timeout=1' --maximum-runtime=$SELFTEST_MAXTIME --basedir=$SELFTEST_TMPDIR --format=subunit --option=torture:progress=no --option=torture:sharedelay=100000 --option=torture:writetimeupdatedelay=500000 --target=samba3 //$SERVER/kernel_oplocks -U$USERNAME%$PASSWORD --option=torture:localdir=$SELFTEST_PREFIX/nt4_dc/share smb2.kernel-oplocks 2>&1  | /data/samba/git/samba6/selftest/filter-subunit --fail-on-empty --prefix="samba3.smb2.kernel-oplocks." --suffix="(nt4_dc)"
> expanded command: /data/samba/git/samba6/bin/smbtorture  $LOADLIST --configfile=/data/samba/git/samba6/st/client/client.conf --option='fss:sequence timeout=1' --maximum-runtime=1200 --basedir=/data/samba/git/samba6/st/tmp --format=subunit --option=torture:progress=no --option=torture:sharedelay=100000 --option=torture:writetimeupdatedelay=500000 --target=samba3 //LOCALNT4DC2/kernel_oplocks -Uabartlet%localntdc2pass --option=torture:localdir=/data/samba/git/samba6/st/nt4_dc/share smb2.kernel-oplocks 2>&1  | /data/samba/git/samba6/selftest/filter-subunit --fail-on-empty --prefix="samba3.smb2.kernel-oplocks." --suffix="(nt4_dc)"
> ERROR: Testsuite[samba3.smb2.kernel-oplocks(nt4_dc)]
> REASON: Exit code was 1
> 
>  errors[1]

Thanks for persevering with this. I'm OK with you
marking it flaky now that you can reproduce it locally.
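
In Samba's selftest, marking a test flaky roughly means adding a regular
expression for it to the selftest/flapping list, so the test keeps running
but its result no longer breaks the whole run. The sketch below only
illustrates how such an entry matches a full test name. It is modelled
loosely on the selftest's list reader rather than copied from it, and the
helpers parse_entries and is_flapping and the sample entry are hypothetical.

```python
import re

# Hypothetical entry in the style of selftest/flapping: one regular
# expression per line; '#' starts a comment (whole-line or trailing).
SAMPLE_FLAPPING = """
# kernel oplocks: intermittent failure on some VMs
^samba3.smb2.kernel-oplocks.kernel_oplocks8 # bug 13121
"""


def parse_entries(text):
    """Yield (regex, reason) pairs, skipping blank and comment-only lines."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        regex, _, reason = line.partition("#")
        yield regex.strip(), reason.strip() or None


def is_flapping(test_name, text=SAMPLE_FLAPPING):
    """True if any listed regex matches at the start of the full test name."""
    return any(re.match(regex, test_name)
               for regex, _ in parse_entries(text))
```

With this entry, kernel_oplocks8 would be treated as flapping while the
other kernel-oplocks subtests would still be reported normally.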

I'm planning to write a standalone (non-Samba)
test program to try to reproduce the problem
on your cloud VMs, which seem to hit it
much more reliably.

It still looks like a kernel bug (a lost signal) to me, but a
standalone program should be able to confirm or rule
that out.
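
A minimal sketch of the kind of standalone, non-Samba reproducer described
above, assuming Linux file leases (fcntl(2) with F_SETLEASE), which is what
kernel oplocks are built on under Linux. The helper probe_lease_break and its
return strings are hypothetical, and the numeric fallbacks for F_SETLEASE and
F_SETSIG are the generic Linux values, used only if this Python build does
not export the constants. The parent takes a read lease and blocks the
notification signal; a child then opens the file for writing, which starts a
lease break. If the kernel never delivers the signal, sigtimedwait() times
out and the probe reports "no-signal".

```python
import fcntl
import os
import signal
import tempfile

# Linux-specific fcntl commands; fall back to the generic Linux values
# if this Python build does not export them (assumption).
F_SETLEASE = getattr(fcntl, "F_SETLEASE", 1024)
F_SETSIG = getattr(fcntl, "F_SETSIG", 10)


def probe_lease_break(timeout=5.0, sig=signal.SIGIO):
    """Take a read lease, let a child open the file for writing, and report
    whether the lease-break signal reached us within `timeout` seconds.

    Returns "signal", "no-signal", or "unsupported" (leases unavailable).
    """
    tmp_fd, path = tempfile.mkstemp()
    os.close(tmp_fd)                  # no writers left, so a read lease is allowed
    fd = os.open(path, os.O_RDONLY)
    # SIGIO's default action kills the process: install a no-op handler and
    # block the signal so it can only be collected by sigtimedwait() below.
    old_handler = signal.signal(sig, lambda signum, frame: None)
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {sig})
    pid = 0
    try:
        try:
            fcntl.fcntl(fd, F_SETSIG, sig)             # signal to send on break
            fcntl.fcntl(fd, F_SETLEASE, fcntl.F_RDLCK)
        except OSError:
            return "unsupported"
        pid = os.fork()
        if pid == 0:
            # Child: opening for writing conflicts with the read lease, so the
            # kernel signals the lease holder and this open() blocks until the
            # lease is released (or /proc/sys/fs/lease-break-time expires).
            status = 0
            try:
                os.close(os.open(path, os.O_WRONLY))
            except OSError:
                status = 1
            os._exit(status)
        info = signal.sigtimedwait({sig}, timeout)
        return "signal" if info is not None else "no-signal"
    finally:
        try:
            fcntl.fcntl(fd, F_SETLEASE, fcntl.F_UNLCK)  # lets the child finish
        except OSError:
            pass                                        # no lease was taken
        if pid:
            os.waitpid(pid, 0)
        os.close(fd)
        os.unlink(path)
        signal.sigtimedwait({sig}, 0)                   # drain a late signal
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
        signal.signal(sig, old_handler)
```

Calling it repeatedly, for example
[probe_lease_break() for _ in range(1000)].count("no-signal"), loosely
mirrors the looped selftest in the attached patch; a non-zero count on the
affected VMs would suggest the signal is being lost in the kernel rather than
in smbd. Passing a real-time signal as `sig` is closer to how smbd asks to be
notified, but that is left as a parameter here.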

In the meantime we still need to run the test,
as it's the first test we've had that exercises the
interaction between smbd and non-smbd kernel oplock
users.

Cheers,

	Jeremy.

> -- 
> Andrew Bartlett
> https://samba.org/~abartlet/
> Authentication Developer, Samba Team         https://samba.org
> Samba Development and Support, Catalyst IT   
> https://catalyst.net.nz/services/samba
> 
> 
> 

> From 58b8aa7053b8a56c68ac51d72ca567575fa3d302 Mon Sep 17 00:00:00 2001
> From: Andrew Bartlett <abartlet at samba.org>
> Date: Wed, 29 Nov 2017 15:26:04 +1300
> Subject: [PATCH] HACK: put oplock test in a loop
> 
> Signed-off-by: Andrew Bartlett <abartlet at samba.org>
> ---
>  source3/selftest/tests.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/source3/selftest/tests.py b/source3/selftest/tests.py
> index 3e5cffdfc9b..ca0468d85af 100755
> --- a/source3/selftest/tests.py
> +++ b/source3/selftest/tests.py
> @@ -507,7 +507,7 @@ for t in tests:
>      elif t == "smb2.dosmode":
>          plansmbtorture4testsuite(t, "simpleserver", '//$SERVER/dosmode -U$USERNAME%$PASSWORD')
>      elif t == "smb2.kernel-oplocks":
> -        if have_linux_kernel_oplocks:
> +        for x in range(1, 10000):
>              plansmbtorture4testsuite(t, "nt4_dc", '//$SERVER/kernel_oplocks -U$USERNAME%$PASSWORD --option=torture:localdir=$SELFTEST_PREFIX/nt4_dc/share')
>      elif t == "smb2.notify-inotify":
>          if have_inotify:
> -- 
> 2.11.0
> 




More information about the samba-technical mailing list