[Samba] CTDB startup issue

Martin Schwenke martin at meltin.net
Thu Nov 5 10:50:29 UTC 2020


Hi Bob,

On Fri, 23 Oct 2020 16:40:10 -0400, Robert Buck via samba
<samba at lists.samba.org> wrote:

> We've only seen this error once, just occurred. We were having domain join
> issues.
> 
> Enter ******@*******.local's password:
> ../../source3/lib/dbwrap/dbwrap_ctdb.c:855 ERROR: new_seqnum[10] !=
> old_seqnum[8] + (0 or 1) after failed TRANS3_COMMIT - this should not
> happen!
> secrets_store_JoinCtx: dbwrap_transaction_commit() failed for SOMDEV
> libnet_join_joindomain_store_secrets: secrets_store_JoinCtx() failed
> NT_STATUS_INTERNAL_DB_ERROR
> Failed to join domain: This machine is not currently joined to a domain.
> 
> What does this mean? Is there a simple remedy?

A transaction attempting to commit changes to a persistent (and
replicated) database (in this case secrets.tdb) failed.  The most
likely reason is that a database recovery occurred.  The recovery
should either:

1. throw away the changes (because the attempt to commit occurred too
   late), resulting in no change to the sequence number; or

2. implicitly commit the changes (because they were already stored on
   1 or more nodes), resulting in the sequence number being
   incremented

This could happen due to a bug... but the code looks good.
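
For what it's worth, the check that fires here boils down to
something like the following (a simplified sketch with made-up names,
not the actual dbwrap_ctdb.c code): after a failed TRANS3_COMMIT the
sequence number may only have stayed the same (case 1) or advanced by
exactly one (case 2).

/* Simplified sketch of the post-recovery sanity check; names are
 * illustrative, not the real dbwrap_ctdb.c code. */
#include <stdbool.h>
#include <inttypes.h>
#include <stdio.h>

/* After a failed TRANS3_COMMIT, recovery must either have discarded
 * the transaction (seqnum unchanged) or implicitly committed it
 * (seqnum incremented by exactly 1).  Anything else is an error. */
static bool seqnum_is_sane(uint64_t old_seqnum, uint64_t new_seqnum)
{
        return (new_seqnum == old_seqnum) ||
               (new_seqnum == old_seqnum + 1);
}

int main(void)
{
        uint64_t old_seqnum = 8;        /* values from the log above */
        uint64_t new_seqnum = 10;

        if (!seqnum_is_sane(old_seqnum, new_seqnum)) {
                fprintf(stderr,
                        "ERROR: new_seqnum[%" PRIu64 "] != "
                        "old_seqnum[%" PRIu64 "] + (0 or 1)\n",
                        new_seqnum, old_seqnum);
                return 1;
        }
        return 0;
}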

The other possibility is what I'll call an "implicit split brain"
because I'm too tired to check whether there's a better term for this.
This can happen most easily with a 2-node cluster, but you'll be able
to extrapolate to see how it can occur with more nodes.  There's a
small simulation after the walkthrough that traces the sequence
numbers.

Nodes A & B are running.

1. secrets.tdb has been updated several times so that its sequence
   number is N

2. Node A is shut down

3. secrets.tdb is updated 5 times, so the sequence number on node B is
   N+5

4. Node B is shut down

   The persistent databases, which use sequence numbers, are now
   partitioned. There are 2 sets of databases that aren't connected via
   recovery.

5. Node A is brought up

6. secrets.tdb is updated 3 times, so the sequence number on node A is
   N+3

7. While attempting the next secrets.tdb update, node B comes up

8. Recovery, during the attempted commit, uses the database from node B,
   with sequence number N+5... so the sequence number increases by 2
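
Here's a toy simulation of that walkthrough (purely illustrative,
nothing here is CTDB code).  The starting value of N is chosen so the
numbers line up with the log snippet above:

/* Toy simulation of the 2-node walkthrough above; illustrative only. */
#include <stdio.h>

int main(void)
{
        unsigned int n = 5;             /* N, chosen to match the log */
        unsigned int seq_a, seq_b;

        seq_a = seq_b = n;              /* 1. both nodes at N */
                                        /* 2. node A shut down */
        seq_b += 5;                     /* 3. node B now at N+5 */
                                        /* 4. node B shut down */
        seq_a += 3;                     /* 5-6. node A alone: N+3 */

        /* 7-8. node B returns; recovery uses node B's copy (the one
         * with the higher sequence number, as in step 8), so node A's
         * in-flight commit sees the seqnum jump by 2, not 0 or 1. */
        unsigned int recovered = (seq_b > seq_a) ? seq_b : seq_a;

        printf("old_seqnum=%u new_seqnum=%u (jump of %u)\n",
               seq_a, recovered, recovered - seq_a);
        return 0;
}

With N=5 this prints old_seqnum=8 new_seqnum=10, the same values as
in your log.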

Checking the logs would obviously tell you which nodes were up/down
when and whether this is a reasonable explanation.

I would have to do some reading to properly remember options for how
to design replicated databases to avoid this.  I think that one option
mentioned in "Designing Data-Intensive Applications" is to use
timestamps instead of sequence numbers... but I guess then you might
have issues with timestamp granularity.  I think quorum can also help
here (need, say, 2 of 3 nodes active to make progress).  Raft might
also avoid this for similar reasons.  I need more time to re-read
things I've read before... but am happy to take advice... :-)
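
Just to illustrate the quorum idea (a sketch only, not anything CTDB
actually implements): with a strict-majority rule, a lone node never
has quorum, so the single-node update runs in steps 3 and 6 above
would be refused and the two divergent histories could never be
created.

/* Sketch of a strict-majority quorum check; illustrative only. */
#include <stdbool.h>
#include <stdio.h>

static bool have_quorum(unsigned int nodes_up, unsigned int cluster_size)
{
        return nodes_up > cluster_size / 2;     /* e.g. 2 of 3, 3 of 5 */
}

int main(void)
{
        /* In the 2-node walkthrough, a lone node never has quorum,
         * so the updates in steps 3 and 6 would both be refused. */
        printf("1 of 2 up: %s\n", have_quorum(1, 2) ? "commit" : "refuse");
        printf("2 of 3 up: %s\n", have_quorum(2, 3) ? "commit" : "refuse");
        return 0;
}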

The summary is that the persistent databases in CTDB use sequence
numbers, so you need to ensure that at least one node in the cluster
is up at all times.  If you flip-flop between nodes, with intervening
downtime, then you can get unexpected results.

peace & happiness,
martin
