[PATCH] ctdb-common: Set close-on-exec when creating PID file (bug 12898)
Martin Schwenke
martin at meltin.net
Fri Jul 28 06:43:33 UTC 2017
On Thu, 27 Jul 2017 15:13:45 +0200, Volker Lendecke
<Volker.Lendecke at SerNet.DE> wrote:
> On Thu, Jul 13, 2017 at 08:37:57PM +1000, Martin Schwenke wrote:
> > * API
> >
> > We really just want a file pathname instead of directory and basename.
> > The main reason for this is that initscripts and systemd unit files
> > (and. consequently, ctdbd_wrapper) need to know the full pathname, so
> > it doesn't make sense to split it and re-join it.
>
> Fine with that. All callers of lib/util/pidfile.c use
> lp_pid_directory() or the source4 equivalent. That is easily
> wrappable.
Agreed. :-)
> > * Coverity hit
> >
> > lib/util/pidfile.c currently leaks the file descriptor and Coverity
> > notices. Amitay's implementation stores it in a context. When the
> > context is freed then the destructor does unlink/unlock/close.
>
> To be honest, I see this as a cosmetic issue. The pidfile is kept open
> by the parent process deliberately. It's a bit like the
> autofree_context discussion: Depending on proper rundown by a talloc
> destructor is important for more dynamic things. We should not depend
> on this for main process exit. Other opinions? I'd call this a
> "deliberate" in Coverity.
I don't think it matters. If the process exits without calling the
destructor then the lock is released, so there's a harmless stale
pidfile. However, as a mechanism for ensuring the file is unlinked in
most circumstances, the talloc destructor is "nice to have" - the
pidfile gets cleaned up without having to keep the filename around.
> > * Fear ;-)
> >
> > lib/util/pidfile.c has this at the top:
> >
> > /* this code is broken - there is a race condition with the unlink
> > (tridge) */
>
> Not sure this still is true. If the ctdb version does this better, I'm
> more than happy to port the fix.
Yeah, I've spent some time looking and there is still a problem...
The problem is that pidfile_pid() unlinks the file if the PID doesn't
exist. Say there are 2 calls to pidfile_pid() in flight with at least
one of them coming from pidfile_create(). Both calls find the PID
doesn't exist and branch to noproc: to do the unlink. The one called
from pidfile_create() returns first, creates the PID file and the second
caller to pidfile_pid() unlinks it.
I don't think pidfile_pid() has any business unlinking the pidfile. It
should be a read-only operation.
The only time the pidfile should be *replaced* is if pidfile_create()
finds that it is stale because it can take the lock, in which case it
overwrites it. That means dropping O_EXCL from the open() in
pidfile_create(). There's no need to call pidfile_pid() in
pidfile_create() because that itself is racy and doesn't tell you
anything. In that case you basically end up with the core of Amitay's
code in the CTDB version. :-)
If you (and perhaps others) agree with this analysis then I'm happy to
do the work to incorporate the core of the CTDB version of
pidfile_create() into a low-level function in lib/util/pidfile.* and
wrap both existing interfaces around that... along with the
simplification to pidfile_pid() to stop it unlinking supposedly stale
pidfiles. I can do this on Monday...
peace & happiness,
martin
More information about the samba-technical
mailing list