[Samba] CTDB startup issue

Robert Buck robert.buck at som.com
Thu Nov 5 15:02:08 UTC 2020


Great helpful info as always, Martin, thanks!

On Thu, Nov 5, 2020 at 5:50 AM Martin Schwenke <martin at meltin.net> wrote:

> Hi Bob,
>
> On Fri, 23 Oct 2020 16:40:10 -0400, Robert Buck via samba
> <samba at lists.samba.org> wrote:
>
> > We've only seen this error once; it just occurred.  We were
> > having domain join issues.
> >
> > Enter ******@*******.local's password:
> > ../../source3/lib/dbwrap/dbwrap_ctdb.c:855 ERROR: new_seqnum[10] !=
> > old_seqnum[8] + (0 or 1) after failed TRANS3_COMMIT - this should not
> > happen!
> > secrets_store_JoinCtx: dbwrap_transaction_commit() failed for SOMDEV
> > libnet_join_joindomain_store_secrets: secrets_store_JoinCtx() failed
> > NT_STATUS_INTERNAL_DB_ERROR
> > Failed to join domain: This machine is not currently joined to a domain.
> >
> > What does this mean? Is there a simple remedy?
>
> An attempt to commit transaction changes to a persistent (and
> replicated) database (in this case secrets.tdb) failed.  The most
> likely reason is that there was a database recovery.  The recovery
> should either:
>
> 1. throw away the changes (because the attempt to commit occurred
>    too late), resulting in no change to the sequence number; or
>
> 2. implicitly commit the changes (because they were already stored
>    on 1 or more nodes), resulting in the sequence number being
>    incremented by 1
>
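> In pseudocode terms, the check that fired in your log amounts to
> roughly this (a Python sketch of the logic, not the actual C code
> in dbwrap_ctdb.c):
>
>     # After a failed TRANS3_COMMIT the seqnum must be unchanged
>     # (outcome 1) or incremented by exactly 1 (outcome 2).
>     def check_after_failed_commit(old_seqnum, new_seqnum):
>         if new_seqnum not in (old_seqnum, old_seqnum + 1):
>             raise RuntimeError(
>                 "new_seqnum[%d] != old_seqnum[%d] + (0 or 1)"
>                 % (new_seqnum, old_seqnum))
>
> In your log old_seqnum was 8 and new_seqnum was 10: a jump of 2,
> which matches neither outcome.
>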
> This could happen due to a bug... but the code looks good.
>
> The other possibility is what I'll call an "implicit split brain"
> because I'm too tired to check whether there's a better term for this.
> This can happen most easily with a 2 node cluster, but you'll be able
> to extrapolate to see how it can occur with more nodes.
>
> Nodes A & B are running.
>
> 1. secrets.tdb has been updated several times so that its sequence
>    number is N
>
> 2. Node A is shut down
>
> 3. secrets.tdb is updated 5 times, so the sequence number on node B is
>    N+5
>
> 4. Node B is shut down
>
>    The persistent databases, which use sequence numbers, are now
>    partitioned. There are 2 sets of databases that aren't connected via
>    recovery.
>
> 5. Node A is brought up
>
> 6. secrets.tdb is updated 3 times, so the sequence number on node A is
>    N+3
>
> 7. While attempting the next secrets.tdb update, node B comes up
>
> 8. Recovery, during the attempted commit, uses the database from node B,
>    with sequence number N+5... so the sequence number increases by 2
>
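> The jump of 2 in your log fits this scenario exactly: node A was at
> N+3 and recovery installed node B's copy at N+5.  A toy Python
> simulation of the timeline (made-up names, just to make the
> arithmetic concrete):
>
>     # Model each node's copy of secrets.tdb as just its seqnum,
>     # starting both at N = 0.
>     seq = {"A": 0, "B": 0}
>     seq["B"] += 5                 # steps 2-3: A down, B commits 5 times
>     seq["A"] += 3                 # steps 5-6: B down, A commits 3 times
>     # Steps 7-8: B returns; recovery keeps the copy with the
>     # highest sequence number.
>     recovered = max(seq.values())
>     print(recovered - seq["A"])   # -> 2, the unexpected jump
>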
> Checking the logs would obviously tell you which nodes were up/down
> when and whether this is a reasonable explanation.
>
> I would have to do some reading to properly remember options for how to
> design replicated databases to avoid this. I think that one option
> mentioned in "Designing Data Intensive Applications" is to use
> timestamps instead of sequence numbers... but I guess then you might
> have issues with timestamp granularity.  I think quorum can also help
> here (need, say, 2 of 3 nodes active to make progress).  Raft might
> also avoid this for similar reasons.  I need more time to re-read
> things I've read before... but am happy to take advice... :-)
>
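> For what it's worth, the quorum idea is just a strict-majority
> test: only accept commits while more than half of the configured
> nodes are up, so two disjoint halves of a cluster can never both
> make progress.  A minimal sketch (hypothetical helper, not an
> existing CTDB option):
>
>     def have_quorum(nodes_up, cluster_size):
>         # Strict majority: 2 of 3, 3 of 5, ...
>         return nodes_up > cluster_size // 2
>
>     have_quorum(2, 3)  # True:  commits allowed
>     have_quorum(1, 2)  # False: the flip-flop above gets blocked
>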
> The summary is that the persistent databases in CTDB use sequence
> numbers so you need to ensure that one node in the cluster is up at
> all times.  If you flip-flop between nodes, with intervening downtime,
> then you can get unexpected results.
>
> peace & happiness,
> martin
>
>

-- 

BOB BUCK
SENIOR PLATFORM SOFTWARE ENGINEER

SKIDMORE, OWINGS & MERRILL
7 WORLD TRADE CENTER
250 GREENWICH STREET
NEW YORK, NY 10007
T  (212) 298-9624
ROBERT.BUCK at SOM.COM
