CTDB: simple/90_debug_hung_script.sh test fails on some machines

Tim Beale timbeale at catalyst.net.nz
Thu Nov 15 04:44:17 UTC 2018

I've reviewed it and pushed it to autobuild. I also raised bug 13684 so
we can backport the change (for running CI on maintenance branches).


On 14/11/18 6:01 PM, Martin Schwenke wrote:
> On Wed, 14 Nov 2018 15:04:26 +1100, Martin Schwenke via samba-technical
> <samba-technical at lists.samba.org> wrote:
>> On Wed, 14 Nov 2018 14:45:08 +1300, Andrew Bartlett
>> <abartlet at samba.org> wrote:
>>> On Wed, 2018-11-14 at 14:24 +1300, Tim Beale via samba-technical wrote:  
>>>> I noticed a problem trying to run the samba_build_ctdb test on the
>>>> rackspace machines. The tests/simple/90_debug_hung_script.sh test-case
>>>> seems to reliably fail.
>>>> I could reproduce the failure by running the ctdb autobuild on my PC.
>>>> Basically the test is failing because the 'cat "/proc/${pid}/stack"' in
>>>> debug-hung-script.sh fails (Operation not permitted). The reason for the
>>>> failure seems to be the Yama ptrace_scope setting on the host machine.
>>>> https://www.kernel.org/doc/Documentation/security/Yama.txt
>>>> My PC had kernel.yama.ptrace_scope set to 1. If I set it to zero, then
>>>> the CTDB test passes. It seems like the gitlab CI machines must use a
>>>> ptrace_scope=0 setting.    
>>> Specifically, while the 'shared' runners must have a more liberal
>>> setting (as they pass), runners started by the Samba Team at Rackspace
>>> use a different kernel-side configuration.  
>>> We are testing out using Rackspace runners for the whole CI to ensure
>>> we don't strictly rely on the shared resources (which are free - to us
>>> - small VMs only available because we are on gitlab.com).
>>> While we are also looking to change the ptrace limitations on
>>> 'our' machines, in the meantime it would be nice if the test was a
>>> little more accepting.   
>> I've attached a patch that attempts to work around this nonsense.  It
>> seems to work, but I think we need to do some more thinking before
>> pushing this fragile pile of toothpicks and snot.  :-D
>> We keep coming unstuck in this genre of tests, which attempt to check
>> whether our hung event script debugging works properly.   We've seen a
>> bug in pstree (that *usually* only manifests on very loaded test
>> systems) and now this, which is basically arbitrary "breakage by design"
>> in /proc/.
>> We might need a new approach.  I have some ideas... none of them
>> good...  yet...  :-(
>> Give me a little while... I'll also see if Amitay has any good idea...  :-)
> Let's not waste too much time on this.  Let's get something in that
> fixes the problem (for now).
> Please review and maybe push the patch I previously posted.
> If this stops working (due to some LSM becoming more selective about
> what it blocks, or similar) then we'll move the test to the complex/
> test suite.  In that case it will only ever be run as root and should
> survive for a while longer.  If problems persist then we'll drop the
> test.
> Thanks...
> peace & happiness,
> martin
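
For anyone hitting the same failure locally, the check described in the
quoted thread (is Yama's ptrace_scope restricting reads of
/proc/PID/stack?) can be sketched roughly as below. This is only an
illustration, not the patch that was pushed: the function names
ptrace_scope_value and try_proc_stack are hypothetical, and the sketch
assumes the setting is exposed at /proc/sys/kernel/yama/ptrace_scope.

```shell
#!/bin/sh

# Hypothetical helper: print the Yama ptrace_scope value, or "absent"
# if the file is not readable (for example, Yama is not enabled).
# An alternative path can be passed as $1, which is handy for testing.
ptrace_scope_value ()
{
    _f="${1:-/proc/sys/kernel/yama/ptrace_scope}"
    if [ -r "$_f" ] ; then
        cat "$_f"
    else
        echo "absent"
    fi
}

# Hypothetical helper: try to print the kernel stack of a process.
# If the read fails (for example, "Operation not permitted"), print a
# note including the current ptrace_scope value instead of failing.
try_proc_stack ()
{
    _pid="$1"
    if ! cat "/proc/${_pid}/stack" 2>/dev/null ; then
        echo "stack for ${_pid} unavailable (ptrace_scope=$(ptrace_scope_value))"
    fi
}
```

Calling try_proc_stack with the PID of the hung script then either
prints the stack or a one-line note, so the caller can decide whether a
missing stack should count as a test failure. On a machine where you
have root, "sysctl kernel.yama.ptrace_scope" shows the current value,
and setting it to 0 is what made the test pass in the report above.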
