Possible bug in ctdb

Fri Feb 26 03:57:36 UTC 2021

Hi Ralph,

On Thu, 25 Feb 2021 13:05:52 +0100, Ralph Boehme <slow at samba.org> wrote:

> I noticed the following if condition in check_static_boolean_change() 
> (defined twice):
> 
>      if (mode == CONF_MODE_RELOAD || CONF_MODE_API)
> 
> This will always evaluate to true, I guess that is not intended and the 
> fix would be
> 
>      if (mode == CONF_MODE_RELOAD || mode == CONF_MODE_API)
> 
> (WIP attached, lacking bugnumber).

Yes, obviously a bug.  My bug.  Fix looks sane.
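
To make the problem concrete, here is a minimal sketch of the two forms of the condition. The numeric enum values are assumptions chosen for illustration (the real ones in the CTDB conf code may differ), and check_buggy()/check_fixed() are hypothetical names, not functions from the tree.

```c
#include <stdbool.h>

/* Hypothetical values for illustration only; the real enum lives in
 * the CTDB conf code and its numeric values may differ. */
enum conf_update_mode {
	CONF_MODE_LOAD = 1,
	CONF_MODE_RELOAD = 2,
	CONF_MODE_API = 3,
};

/* Original condition: the right-hand operand is a bare constant, so it
 * is tested for being non-zero instead of being compared with mode. */
static bool check_buggy(enum conf_update_mode mode)
{
	return (mode == CONF_MODE_RELOAD || CONF_MODE_API);
}

/* Proposed fix: compare mode against both constants. */
static bool check_fixed(enum conf_update_mode mode)
{
	return (mode == CONF_MODE_RELOAD || mode == CONF_MODE_API);
}
```

With these assumed values check_buggy() is true for every mode, including CONF_MODE_LOAD. If CONF_MODE_API happened to be 0, the buggy form would instead reduce to mode == CONF_MODE_RELOAD. Either way it is not the intended test.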

Note that this check doesn't change any behaviour: it just logs a
warning.  CTDB doesn't currently support reloading the configuration at
run time... but the config system does.  When reloading is implemented,
the warning will flag that after a config reload we don't look at the
new value of the variable that points to that config value, so there is
no change to the daemon's behaviour even if that config setting is
changed.  There are just some things that you can't (or don't want
to ;-) change at run-time.

For consistency I'd almost like to see that condition coded as:

  if (conf_maybe_updating(mode)) {

although perhaps we should just write it as:

  if (mode != CONF_MODE_LOAD) {

since that catches the other cases consistently.
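
As a rough sketch of how those two spellings relate: if the enum has exactly the three modes mentioned in this thread (an assumption), a helper like the one below makes them equivalent. The body of conf_maybe_updating() here is purely illustrative and is not taken from the CTDB source.

```c
#include <stdbool.h>

/* Assumed to be the complete set of modes; order and values are
 * illustrative only. */
enum conf_update_mode {
	CONF_MODE_LOAD,
	CONF_MODE_RELOAD,
	CONF_MODE_API,
};

/* Illustrative body: anything other than the initial load is treated
 * as a possible update of an already-loaded value. */
static bool conf_maybe_updating(enum conf_update_mode mode)
{
	return mode != CONF_MODE_LOAD;
}
```

Under that assumption the helper gives the same answer as the explicit "mode == CONF_MODE_RELOAD || mode == CONF_MODE_API" test, and it would also cover any mode added later without touching each caller.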

Let's see what Amitay says.  :-)

> I'm currently debugging a ctdb issue where I wondered whether tdb 
> mutexes are actually enabled (I was seeing "tdb_chain*un*lock() took 
> +-10 ms" many times in the logs).
> 
> "tdb mutexes" defaults to true, so I wonder whether this bug can cause 
> the default to not become effective, still wrapping my head around the 
> ctdb config code...

Since it is just a warning, that can't happen.  There are 2 simple
checks to do:

* Check the logs for the attach:

  2021/01/30 17:40:32.096676 ctdbd[1440545]: Attached to database
  'yip/node.0/db/volatile/foo.tdb.0' with flags 0x1c41

  That's logged at NOTICE level.

  #define TDB_MUTEX_LOCKING 4096 /** ... */  and 4096 is 0x1000, so
  that's the leading 1 in 0x1c41.

  I see this consistently for volatile databases.

* Check the logs for the warning "Ignoring update of ..."

  This is just to see if the warning is triggered.

  I don't see it when I try it out, so I started looking at the conf
  code to try to figure out why we don't see this logged when the
  config is initially loaded.

  However, the config is initially loaded before logging is set up
  (standard chicken/egg situation), so warnings/errors from the initial
  configuration load probably go to stderr?  Perhaps systemd has caught
  them and put them in the journal?  We should do something about
  that... not sure what.
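
For the first check, the flag arithmetic can be sketched as below. The #define repeats the value quoted above (normally it comes from the tdb headers), and has_mutex_locking() is a hypothetical helper name used only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value as quoted above; repeated here so the sketch stands alone. */
#define TDB_MUTEX_LOCKING 4096 /* 0x1000 */

/* Hypothetical helper: true if the mutex-locking bit is set in the
 * flags value printed in the "Attached to database" log line. */
static bool has_mutex_locking(uint32_t tdb_flags)
{
	return (tdb_flags & TDB_MUTEX_LOCKING) != 0;
}
```

For the logged value 0x1c41 this returns true, since 0x1c41 & 0x1000 is non-zero. A flags value without the leading 1, such as 0x0c41, would return false.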

How's that?

peace & happiness,
martin