[PATCH] Fix bug #13121 - Non-smbd processes using kernel oplocks can hang smbd

jim jim.brown at rsmas.miami.edu
Wed Nov 29 23:07:16 UTC 2017


Are you certain smbd is generating a lease break to the kernel?
Does this lease break appear in the logs?
Could there be a defect in smbd's lease break handling such that the break
never reaches the non-smbd process?

On 11/29/2017 6:01 PM, Jeremy Allison via samba-technical wrote:
> On Wed, Nov 29, 2017 at 11:53:31PM +0100, Ralph Böhme wrote:
>> On Wed, Nov 29, 2017 at 02:13:34PM -0800, Jeremy Allison via samba-technical wrote:
>>> On Thu, Nov 30, 2017 at 11:05:39AM +1300, Andrew Bartlett wrote:
>>>> On Thu, 2017-11-30 at 06:23 +1300, Andrew Bartlett wrote:
>>>>> On Wed, 2017-11-29 at 08:43 -0800, Jeremy Allison wrote:
>>>>>> Thanks for persevering with this. I'm OK with you
>>>>>> marking it flakey now you can reproduce locally.
>>>>> Good.
>>>> I've done the fixes required for the test, and I'll push it shortly.
>>>>
>>>> This is a 'real' flapping test, it also flaps on sn-devel if you run
>>>> the loop for long enough.
>>> Thanks a lot! I'm very puzzled by the error 10 though - it
>>> means a missing RT signal. I'll try and get some time to
>>> investigate with a standalone program.
>> If you need an additional pair of eyes, let me know. I've been carefully going
>> through the test looking for race conditions causing signal loss or similar, no
>> luck so far, test seems correct. I was specifically worried about the while loop
>> around tevent_loop_once, but with tevent there shouldn't be a race condition
>> between signal delivery and waiting for signal. *scratches head*
> Yeah, I simply can't see a place the signal loss can
> occur unless it's the kernel dropping the ball.
>
> Note that the signal loss occurs in the non-smbd/non-samba
> client test code (that's the forked child from smbtorture
> that opens the test file, gets the lease, and then waits
> for the kernel to signal a lease break from the smbd).
>
> That child is returning with an exit code of 10, meaning
> the alarm(5) fired when we were in the pause() call instead
> of getting the RT_SIGNAL_LEASE signal.
>
> This is the *simple* part of the test - all it does
> is get a lease and wait for a signal. The complex part we
> know works, that's the client driving the smbd to open the same
> file via the test share.
>




More information about the samba-technical mailing list