CTDB Segfault in a container-based env - Looking for pointers
Martin Schwenke
martin at meltin.net
Wed Aug 4 04:18:37 UTC 2021
On Tue, 20 Jul 2021 15:59:02 +1000, Amitay Isaacs via samba-technical
<samba-technical at lists.samba.org> wrote:
> On Fri, Jul 16, 2021 at 5:47 PM Michael Adam <obnox at samba.org> wrote:
> The issue is that CTDB makes assumptions about the orphan processes.
> On most unix systems, an orphan process gets re-parented to init which
> traditionally has pid = 1. This assumption is built into the code to
> avoid runaway orphan processes in CTDB.
Yes, we explicitly check if the parent process is 1 in the lock helper
before continuing. As discussed offline, we should try something with a
file descriptor event to try to determine whether the parent has gone
away.
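For illustration only, here is a minimal sketch of the file descriptor idea, written in Python rather than CTDB's C and not taken from the CTDB source. It assumes the parent creates a pipe before starting the helper, keeps the write end open for its whole lifetime and never writes to it, while the helper holds only the read end. The name parent_gone is hypothetical.

```python
import os
import select


def parent_gone(read_fd, timeout=0.0):
    """Return True once every write end of the pipe has been closed.

    Assumption: the parent holds the only write end and never writes
    to it, so the read end only becomes readable at end-of-file.
    """
    readable, _, _ = select.select([read_fd], [], [], timeout)
    if not readable:
        # Nothing to read: some process still holds the write end open.
        return False
    # Readable with zero bytes returned means EOF, so all writers are gone.
    return os.read(read_fd, 1) == b""
```

In an event-driven helper the read end would be registered with the event loop instead of being polled, and the helper would exit when the descriptor reports EOF. This does not depend on which pid an orphan is re-parented to, but it only works if no other process (including the helper itself) keeps a copy of the write end open.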
> In the container world, what happens to orphan processes?
Everything I can find says they are re-parented to process 1 in the
container.
> > Even if you don’t see a real benefit of this containerized layout
> > just yet, it might still be beneficial for the code to consider
> > some modifications to make ctdb more “container-ready”...
> Provided it makes sense. ;-)
Yep! If there is no sane re-parenting of orphan processes inside
containers then we should recommend that CTDB is always run via a
minimal init. CTDB launches a lot of processes and if it goes away
then something needs to look after them.
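As a rough sketch of what "looking after them" means for a minimal init, the core job is to collect exited children so that they do not stay around as zombies. The Python below is illustrative only and is not how any particular init is implemented; the name reap_children is hypothetical, and a real init would typically call something like this in response to SIGCHLD.

```python
import os


def reap_children():
    """Collect every child that has already exited, without blocking.

    Returns a list of (pid, raw_wait_status) tuples.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append((pid, status))
    return reaped
```

Because orphans are re-parented to the init process, they become its children and are picked up by the same waitpid(-1, ...) loop as the processes it started itself.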
As we discussed offline, at the moment the current crash reminds us we
have a problem to solve, so we shouldn't just "fix" it to avoid the
crash. We should find a better solution for detecting that the parent
has gone away, use that and then fix the crash that may still occur.
We might also be doing a similar thing elsewhere...
peace & happiness,
martin