Possible bug in ctdb

Fri Feb 26 06:22:56 UTC 2021

On Fri, Feb 26, 2021 at 2:57 PM Martin Schwenke <martin at meltin.net> wrote:
>
> Hi Ralph,
>
> On Thu, 25 Feb 2021 13:05:52 +0100, Ralph Boehme <slow at samba.org> wrote:
>
> > I noticed the following if condition in check_static_boolean_change()
> > (defined twice):
> >
> >      if (mode == CONF_MODE_RELOAD || CONF_MODE_API)
> >
> > This will always evaluate to true. I guess that is not intended, and the
> > fix would be
> >
> >      if (mode == CONF_MODE_RELOAD || mode == CONF_MODE_API)
> >
> > (WIP attached, lacking bugnumber).
>
> Yes, obviously a bug.  My bug.  Fix looks sane.
>
> Note that this code is actually a no-op and it just logs a warning.
> CTDB doesn't currently support reloading the configuration at run
> time... but the config system does. When reloading is implemented it
> will flag that after a config reload we don't look at the new value of
> the variable that points to that config value, so there is no change to the
> daemon's behaviour even if that config setting is changed.  There are
> just some things that you can't (or don't want to ;-) change at
> run-time.
>
> For consistency I'd almost like to see that condition coded as:
>
>   if (conf_maybe_updating(mode)) {
>
> although perhaps we should just write it as:
>
>   if (mode != CONF_MODE_LOAD) {
>
> since that catches the other cases consistently.
>
> Let's see what Amitay says.  :-)
>

Let's do mode != CONF_MODE_LOAD...

> > I'm currently debugging a ctdb issue where I wondered whether tdb
> > mutexes are actually enabled (I was seeing "tdb_chain*un*lock() took
> > +-10 ms" many times in the logs).
> >
> > "tdb mutexes" defaults to true, so I wonder whether this bug can cause
> > the default to not become effective, still wrapping my head around the
> > ctdb config code...
>
> Since it is just a warning, that can't happen.  There are two simple
> checks to do:
>
> * Check the logs for the attach:
>
>   2021/01/30 17:40:32.096676 ctdbd[1440545]: Attached to database
>   'yip/node.0/db/volatile/foo.tdb.0' with flags 0x1c41
>
>   That's logged at NOTICE level.
>
>   #define TDB_MUTEX_LOCKING 4096 /** ... */  so that's the leading 1 in 0x1c41.
>
>   I see this consistently for volatile databases.
>
> * Check the logs for the warning "Ignoring update of ..."
>
>   This is just to see if the warning is triggered.
>
>   I don't see it when I try it out, so I started looking at the conf
>   code to try to figure out why we don't see this logged when the
>   config is initially loaded.
>
>   However, the config is initially loaded before logging is set up
>   (standard chicken/egg situation), so warnings/errors from initial
>   configuration load probably go to stderr?  Perhaps systemd has caught
>   them and put them in the journal?  We should do something about
>   that... not sure what.
>
> How's that?
>
> peace & happiness,
> martin

Amitay.