CTDB Segfault in a container-based env - Looking for pointers

Fri Jul 16 06:02:31 UTC 2021

On Fri, 16 Jul 2021, Amitay Isaacs via samba-technical wrote:
> On Fri, Jul 16, 2021 at 2:13 AM Michael Adam via samba-technical
> <samba-technical at lists.samba.org> wrote:
> >
> >
> >
> > > On 15. Jul 2021, at 15:16, John Mulligan via samba-technical <samba-technical at lists.samba.org> wrote:
> > >
> > > On Wednesday, July 14, 2021 10:12:46 PM EDT Amitay Isaacs via samba-technical
> > > wrote:
> > >> Hi John,
> > >>
> > >> There are certain assumptions made in CTDB assuming it's running on a
> > >> normal system. When running CTDB in a container, those assumptions are
> > >> not valid any more and you might get unexpected behaviour.
> > >>
> > >
> > > First, thanks for replying!
> > >
> > > Sure, I fully expect that. It was similar for smbd/winbind but in those cases
> > > I was able to tune the environment sufficiently - for example they need to run
> > > within the same pid namespace to function properly. The issue I'm having now
> > > is that the segfault isn't mapping to anything obvious (yet) that I can change
> > > in the environment.
> > >
> > >> One such assumption is that init (in some form) has the pid of 1 and
> > >> CTDB daemon will never have PID of 1.  Obviously this is not true in
> > >> your case.  From the logs you can see that the CTDB daemon is started
> > >> as PID 1. In general, CTDB relies on init (in some form) to start/stop
> > >> various file services (smb, nfs, etc.) via the event scripts.  So,
> > >> working init is a requirement for normal operation of CTDB.
> > >
> > > Good point. I'll experiment with giving ctdb a parent process.
> >
> > Right, if we want to avoid systemd or other beefier systems that are not made for containers, we can consider “tini”; Rook, for example, uses it.
> >
> > >> What are you trying to do exactly?  You cannot put CTDB in a container
> > >> on its own without Samba daemons.
> >
> > Hmm, at least last I checked you can even run ctdb in a
> > “traditional” non-containerized cluster without any samba daemons.
> > :-)
> 
> Of course you can.  But that doesn't serve any useful purpose. :-)
> 
> > Maybe you are saying that if you want to run smbd/winbindd on top of
> > ctdb, then they must run in the same container? I don’t think this
> > is true either:
> >
> > We usually have multiple containers in one pod, and the containers
> > within the pod can communicate just as normal. At least that’s what
> > we did with the smbd and winbindd daemons: separate containers in
> > one pod.
> >
> 
> My understanding of containers is limited here, so I don't understand
> how you can run ctdb and smbd in different containers.  Does mutex
> locking on shared databases work across containers (or different
> namespaces)?  How about unix datagram messaging using pids?
> If mutex locking on shared databases works across containers, then
> obviously you can run ctdb and smbd in different containers.
> If unix datagram messaging works across containers, then obviously you
> can run smbd and winbindd in different containers.

A container is a collection of namespaces on top of the same Linux kernel.
Two containers may share some namespaces but not others. For a
collection of containers on the same physical system, it is possible to
define shared properties if they run in the same 'pod'. So you can have
UNIX domain sockets shared across different containers on the same host.
It needs good coordination, of course.
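
As a rough illustration of the shared-socket point, here is a minimal
sketch in Python. It runs in a single process and uses a temporary
directory as a stand-in for a directory that both containers would
mount (for example a shared volume in a pod). The socket name
"demo.sock" and the function name are made up for the example; this is
not how ctdb or smbd set up their sockets, and it does not cover
pid-based addressing, which additionally depends on a shared pid
namespace.

```python
import os
import socket
import tempfile


def demo_shared_socket(shared_dir):
    """Send one datagram through a UNIX socket that lives in shared_dir."""
    # Hypothetical socket name; any path visible to both sides would do.
    path = os.path.join(shared_dir, "demo.sock")
    server = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        server.settimeout(5)          # do not hang if nothing arrives
        server.bind(path)             # creates the socket file in shared_dir
        client.sendto(b"ping", path)  # the sender only needs the path
        data, _ = server.recvfrom(1024)
        return data
    finally:
        client.close()
        server.close()
        if os.path.exists(path):
            os.unlink(path)


if __name__ == "__main__":
    # The temporary directory stands in for a volume shared by two containers.
    with tempfile.TemporaryDirectory() as shared_dir:
        print(demo_shared_socket(shared_dir))
```

The only thing the two sides have to agree on is the filesystem path,
which is where the coordination comes in: both containers need the same
directory mounted and compatible permissions on it.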

> 
> > > I'm not clear on what you mean by that. My longer term goal is to investigate
> > > CTDB as part of the HA story for samba in containers (see our general effort
> > > here [1]). Short term, I just want to run ctdb on its own with very few (or
> > > none) event scripts just to get tdb replication working across multiple nodes
> > > in a container based environment. Based on my reading of the docs and a tiny
> > > bit of the code, bringing up smbd/etc is the responsibility of the event
> > > scripts
> >
> > This is not quite true:
> >
> > Ctdb logically consists of two layers:
> >
> > (1) the mandatory core is the distributed tdb database and messaging channel for smbd
> > (2) the optional upper layer is about resource management (public IPs, services like smbd, winbindd, etc)
> >
> > Ctdb and samba can run together perfectly without #2 as long as
> > someone takes care of the service management. E.g. it has been done
> > with pacemaker. In our case, Kubernetes / operators, etc, would
> > provide this role and we would run ctdb without
> > “CTDB_MANAGES_SAMBA=yes” etc...
> >
> > > so I think it should be possible to run ctdb on its own like that.
> > >
> > > Any thoughts on adding code to specifically handle the case where the callback
> > > has already been called, but tevent calls it again?
> >
> > Right, the crux here seems to be whether the tevent-using code in
> > ctdb is unprepared for the situation where EPOLLHUP is issued, and
> > whether it would be appropriate to just catch that condition (of
> > being called again).
> 
> Well that's not really the crux here.  I know what the real issue is
> (I did write that code), but I still don't understand the motivation
> behind running ctdb and smbd in different containers.

I would say it is an independent question. Containerized environments
have been developed with a focus on microservices, where most
container instances only run a handful of processes; the overall
architecture decomposes your solution into a set of independently
running 'microservices' that can be spawned according to scaling
needs.

This does not always fly with domain controller-like setups, where a
microservices architecture does not come naturally. However, for
serving files, for example, it would scale well if each individual
'microserviced' Samba file share were represented by a single container
instance serving a single consumer.

You absolutely can run a 'heavy' solution in a single container with a
traditional init system in it. We do so for FreeIPA in a container,
which relies on systemd services in more areas than Samba does (socket
activation, resource protection, ...). In most situations a request to
run in a microservice-like way is an organizational rather than a
technical requirement.

-- 
/ Alexander Bokovoy