CTDB Segfault in a container-based env - Looking for pointers

Martin Schwenke martin at meltin.net
Wed Aug 4 04:18:37 UTC 2021


On Tue, 20 Jul 2021 15:59:02 +1000, Amitay Isaacs via samba-technical
<samba-technical at lists.samba.org> wrote:

> On Fri, Jul 16, 2021 at 5:47 PM Michael Adam <obnox at samba.org> wrote:

> The issue is that CTDB makes assumptions about the orphan processes.
> On most unix systems, an orphan process gets re-parented to init which
> traditionally has pid = 1.  This assumption is built into the code to
> avoid runaway orphan processes in CTDB.

Yes, we explicitly check if the parent process is 1 in the lock helper
before continuing.  As discussed offline, we should use a file
descriptor event to determine whether the parent has gone away.
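
A minimal sketch of the file descriptor idea, assuming the parent
creates a pipe before forking, keeps the write end open and never
writes to it, while the helper keeps only the read end.  When the
parent exits, the kernel closes the write end and the helper sees
EOF/hangup, whatever pid the parent had.  parent_gone() is a
hypothetical name for illustration, not an existing CTDB function:

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Check the read end of a pipe whose write end is held only by the parent.
 * Returns 1 if the parent has gone away (EOF or hangup on the pipe),
 * 0 if it still appears to be alive, -1 on error.
 */
int parent_gone(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	/* Retry if poll() is interrupted by a signal */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret == -1 && errno == EINTR);

	if (ret == -1) {
		return -1;
	}
	if (ret == 0) {
		/* Timeout: nothing happened, write end is still open */
		return 0;
	}
	if (pfd.revents & POLLNVAL) {
		/* fd is not a valid open descriptor */
		return -1;
	}
	if (pfd.revents & POLLIN) {
		char c;
		ssize_t n = read(fd, &c, 1);

		if (n == 0) {
			/* EOF: every copy of the write end has been closed */
			return 1;
		}
		if (n == -1) {
			return (errno == EAGAIN || errno == EINTR) ? 0 : -1;
		}
		/* Unexpected data from the parent: it is still alive */
		return 0;
	}
	if (pfd.revents & (POLLHUP | POLLERR)) {
		return 1;
	}
	return 0;
}
```

In an event loop the same fd would be registered for read events
instead of being polled by hand.  This only works if no other process
inherits the write end, so it needs to be close-on-exec or closed
explicitly in every child.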

> In the container world, what happens to orphan processes?

Everything I can find says they are re-parented to process 1 in the
container.

> > Even if you don’t see a real benefit of this containerized layout
> > just yet, it might still be beneficial for the code to consider
> > some modifications to make ctdb more “container-ready”...  

> Provided it makes sense. ;-)

Yep!  If there is no sane re-parenting of orphan processes inside
containers then we should recommend that CTDB is always run via a
minimal init.  CTDB launches a lot of processes and if it goes away
then something needs to look after them.
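
To illustrate what "minimal init" means here, a rough sketch that
starts one command and then reaps every child it ends up with,
including orphans re-parented to it when it runs as pid 1.
run_as_init() is a hypothetical name, and a real init would also need
to forward signals, which is left out:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Start argv[0] as the main child, then reap children until none remain.
 * Returns the main child's exit code (128 + signal number if it was
 * killed by a signal), or -1 if fork() failed.
 */
int run_as_init(char *const argv[])
{
	pid_t main_pid, pid;
	int status;
	int main_status = -1;

	main_pid = fork();
	if (main_pid == -1) {
		return -1;
	}
	if (main_pid == 0) {
		execvp(argv[0], argv);
		/* Only reached if exec failed */
		_exit(127);
	}

	for (;;) {
		pid = wait(&status);
		if (pid == -1) {
			if (errno == EINTR) {
				continue;
			}
			/* ECHILD: no children left to reap */
			break;
		}
		if (pid == main_pid) {
			if (WIFEXITED(status)) {
				main_status = WEXITSTATUS(status);
			} else if (WIFSIGNALED(status)) {
				main_status = 128 + WTERMSIG(status);
			}
		}
	}

	return main_status;
}
```

Note that this keeps waiting after the main child exits, for as long
as any re-parented process is still running.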

As we discussed offline, the current crash reminds us that we have a
problem to solve, so we shouldn't just "fix" it to avoid the crash.
We should find a better way to detect that the parent has gone away,
use that, and then fix the crash if it can still occur.  We might also
be doing a similar thing elsewhere...

peace & happiness,
martin


