CTDB: simple/90_debug_hung_script.sh test fails on some machines

Martin Schwenke martin at meltin.net
Wed Nov 14 05:01:40 UTC 2018


On Wed, 14 Nov 2018 15:04:26 +1100, Martin Schwenke via samba-technical
<samba-technical at lists.samba.org> wrote:

> On Wed, 14 Nov 2018 14:45:08 +1300, Andrew Bartlett
> <abartlet at samba.org> wrote:
> 
> > On Wed, 2018-11-14 at 14:24 +1300, Tim Beale via samba-technical wrote:  
> 
> > > I noticed a problem trying to run the samba_build_ctdb test on the
> > > rackspace machines. The tests/simple/90_debug_hung_script.sh test-case
> > > seems to reliably fail.
> > > 
> > > I could reproduce the failure by running the ctdb autobuild on my PC.
> > > Basically the test is failing because the 'cat "/proc/${pid}/stack"' in
> > > debug-hung-script.sh fails (Operation not permitted). The reason for the
> > > failure seems to be the Yama ptrace_scope setting on the host machine.
> > > https://www.kernel.org/doc/Documentation/security/Yama.txt
> > > 
> > > My PC had kernel.yama.ptrace_scope set to 1. If I set it to zero, then
> > > the CTDB test passes. It seems like the gitlab CI machines must use a
> > > ptrace_scope=0 setting.    
> 
> > Specifically, while the 'shared' runners must have a more liberal
> > setting (as they pass), runners started by the Samba Team at Rackspace
> > use a different kernel-side configuration.  
> > 
> > We are testing out using Rackspace runners for the whole CI to ensure
> > we don't strictly rely on the shared resources (which are free - to us
> > - small VMs only available because we are on gitlab.com).
> > 
> > > While we are also looking to change the ptrace limitations on
> > 'our' machines, in the meantime it would be nice if the test was a
> > little more accepting.   
> 
> I've attached a patch that attempts to work around this nonsense.  It
> seems to work, but I think we need to do some more thinking before
> pushing this fragile pile of toothpicks and snot.  :-D
> 
> We keep coming unstuck in this genre of tests, which attempt to check
> whether our hung event script debugging works properly.   We've seen a
> bug in pstree (that *usually* only manifests on very loaded test
> systems) and now this, which is basically arbitrary "breakage by design"
> in /proc/.
> 
> We might need a new approach.  I have some ideas... none of them
> good...  yet...  :-(
> 
> Give me a little while... I'll also see if Amitay has any good idea...  :-)

Let's not waste too much time on this.  Let's get something in that
fixes the problem (for now).

Please review and maybe push the patch I previously posted.

If this stops working (due to some LSM becoming more selective about
what it blocks, or similar) then we'll move the test to the complex/
test suite.  In that case it will only ever be run as root and should
survive for a while longer.  If problems persist then we'll drop the
test.
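
For illustration only, here is a minimal sketch of the kind of check a
test could make before expecting stack output.  This is not the patch
posted earlier, and the function names can_read_stack and
yama_ptrace_scope are hypothetical (they do not exist in CTDB).  It
assumes only that /proc/<pid>/stack and, where Yama is enabled,
/proc/sys/kernel/yama/ptrace_scope are present:

```shell
#!/bin/sh
# Hypothetical helpers, not part of CTDB: probe whether the kernel stack
# of a process can be read, so a test could relax its expectations.

# Succeed if /proc/<pid>/stack can actually be read.  Attempting the read
# is the simplest check, since the failure reported in this thread was
# on the read itself (Operation not permitted).
can_read_stack ()
{
	_pid="$1"
	cat "/proc/${_pid}/stack" >/dev/null 2>&1
}

# Print the Yama ptrace_scope value, or "unknown" if it is not available
yama_ptrace_scope ()
{
	_f="/proc/sys/kernel/yama/ptrace_scope"
	if [ -r "$_f" ] ; then
		cat "$_f"
	else
		echo "unknown"
	fi
}

# Example: probe the current shell
if can_read_stack "$$" ; then
	echo "stack readable"
else
	echo "stack not readable (ptrace_scope=$(yama_ptrace_scope))"
fi
```

A test could use the result to decide whether to expect a stack trace
in the debug output or just the "Operation not permitted" failure.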

Thanks...

peace & happiness,
martin


