[PATCHES] CTDB - fix PID file problem

Martin Schwenke martin at meltin.net
Wed Jul 2 21:22:54 MDT 2014


On Wed, 2 Jul 2014 14:58:06 +0200, Sumit Bose <sbose at redhat.com> wrote:

> On Wed, Jul 02, 2014 at 09:01:37PM +1000, Martin Schwenke wrote:
> > On Wed, 2 Jul 2014 12:36:33 +0200, Sumit Bose <sbose at redhat.com> wrote:
> > 
> > > On Wed, Jul 02, 2014 at 07:56:43PM +1000, Martin Schwenke wrote:
> > > > 2 patches:
> > > > 
> > > > * ctdb-daemon: Check PID in ctdb_remove_pidfile(), not unreliable flag
> > > > 
> > > >   If something unexpectedly uses fork() then an exiting child will
> > > >   remove the PID file while the main daemon is still running.  The real
> > > >   test is whether the current process has the PID of the main CTDB
> > > >   daemon.
> > > > 
> > > > * ctdb-daemon: Remove ctdb->ctdbd_pid, just use ctdbd_pid
> > > 
> > > Wouldn't it be nicer to remove the global variable?
> > > 
> > > > 
> > > >   The global variable is needed anyway so just use it.
> > > 
> > > Why? I have found 4 places where ctdbd_pid is used. One is the
> > > assignment, at two other place the ctdb_context is already available.
> > > The remaining one is in ctdb_tevent_trace() where the ctdb_context must
> > > be given as private data in tevent_set_trace_callback() to make
> > > ctdb->ctdbd_pid available.
> > 
> > ... and an extra use is introduced in the 1st patch.  We need a
> > reliable way of checking whether the main ctdbd is exiting so that we
> > can remove the PID file.  The current method is unreliable (e.g. if a
> > library call unexpectedly forks a child that then exits then the PID
> > file is mistakenly removed). So, we need to compare against the main
> > ctdbd PID.  We remove the PID file in an atexit() handler, which needs a
> > global variable.
> > 
> > If there's a better way of doing the 1st patch then I'm happy to drop
> > the 2nd patch.  :-)
> 
> Well, you can read the PID from the PID file :-) A more elegant solution
> might be getpgrp() which returns the process group ID aka the PID of the
> main ctdb daemon. So, getpgrp() == getpid() should be only true for the
> main ctdb daemon as long as you do not do weird stuff with setsid(),
> setpgid() and friends in the child processes.

Sold!

New patches attached.  I've used session ID instead of progress group
because we're more likely to do weird stuff with the latter.  There's
an extra patch to make sure that the call to setsid() succeeds at
startup, since that's not really optional.

Thanks for pointing out a better way!  :-)

peace & happiness,
martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ctdb.patches
Type: application/octet-stream
Size: 6202 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20140703/4f7ca35a/attachment.obj>


More information about the samba-technical mailing list