[CTDB] progress

Sat Dec 16 22:34:02 GMT 2006

Alexander,

 > It means that we would have multiple fetches from tdb to process single
 > request if those operations would be decoupled.

no, it just means the API between ctdb_call.c and ctdb_ltdb.c needs to
cope with both the header and the body in one function. 

So we'd have a function ctdb_ltdb_fetch() that fetches both the header
and the body (internally using one tdb_fetch() call), and then looks
at the header to determine what to do. If from that it decides that
the call does need to be handled locally then the body data is already
available, and no more tdb_fetch calls are needed.

What I'm saying is that the ctdb_ltdb_fetch() call needs to return two
separate TDB_DATA results - one with the header information and the
other with the body.
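
As a rough sketch of that interface, assuming a fixed-size header stored
in front of the body inside a single tdb record: the header layout, the
field names and the function name sketch_ltdb_split() are all made up
for illustration, and TDB_DATA is redeclared locally so the fragment
compiles without tdb.h. The input is whatever the one tdb_fetch() call
returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for TDB_DATA from tdb.h, so the sketch is self-contained. */
typedef struct {
    unsigned char *dptr;
    size_t dsize;
} TDB_DATA;

/* Hypothetical record header; the real layout is not given in this mail. */
struct sketch_header {
    uint64_t rsn;      /* hypothetical record sequence number */
    uint32_t dmaster;  /* node that currently owns the data */
    uint32_t lmaster;  /* node that tracks who the dmaster is */
};

/*
 * Split one already-fetched record into two TDB_DATA views, one for the
 * header and one for the body. Both views point into 'rec', nothing is
 * copied, so no second fetch is needed to get at the body.
 * Returns 0 on success, -1 if the record is too short to hold a header.
 */
static int sketch_ltdb_split(TDB_DATA rec, TDB_DATA *header, TDB_DATA *body)
{
    assert(header != NULL && body != NULL);

    if (rec.dptr == NULL || rec.dsize < sizeof(struct sketch_header)) {
        return -1;
    }
    header->dptr  = rec.dptr;
    header->dsize = sizeof(struct sketch_header);
    body->dptr    = rec.dptr + sizeof(struct sketch_header);
    body->dsize   = rec.dsize - sizeof(struct sketch_header);
    return 0;
}

/* Decode the header view; memcpy avoids any alignment assumptions. */
static struct sketch_header sketch_header_decode(TDB_DATA header)
{
    struct sketch_header h;

    assert(header.dsize == sizeof(h));
    memcpy(&h, header.dptr, sizeof(h));
    return h;
}
```

The caller would look at the decoded header first and only touch the
body view if the call turns out to be handled locally.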

 > I mean that we have to do ltdb_fetch(ctdb, key) every time we make
 > decisions on LMASTER/DMASTER status instead of basing some of those
 > decisions on header of an incoming packet.

You can't base the decisions on the header of a packet, because the
packet doesn't contain the authoritative information on who the
dmaster is. The only place you can get that information is from the
ltdb.

 > tdb_fetch/tdb_store multiples per one ctdb operation.

no, it only takes one fetch and one store.

Please don't confuse the logic with code layout.

 > We have an incoming packet's header with destination/source of nodes,
 > reqid and other fields, in addition to the payload. Why can't we use
 > those?

because they don't contain the information we need! The fact that the
sending node sent the request to us means the sending node was _not_
the dmaster. That means the sending node does not know for sure who
the dmaster is. It thinks it might be us, but it might be wrong. It
might even be that we were the dmaster when the packet was sent but we
aren't any more.

We _must_ look in the ltdb to find out if we are the dmaster. In
nearly every case we will find that the sender was correct, and we are
in fact the dmaster. In that case we will not have wasted a tdb_fetch
as we now need to process the call, and thus we need the body of the
record, and hey presto, we've just fetched it.

The result is something highly efficient (no wasted calls), but also
_reliable_. If we start basing decisions on another node's idea of who
the dmaster is then we have a race condition. That is the inevitable
result of the lock-avoiding strategy that ctdb uses.
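
The decision described above could be sketched roughly like this. The
enum, the struct and the function name are hypothetical, and the packet
destination is passed in only to show that it is deliberately ignored:
the answer comes from the header that was just read out of the ltdb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What to do with an incoming call (names are illustrative only). */
enum sketch_action {
    SKETCH_HANDLE_LOCALLY,  /* we are the dmaster, body is already fetched */
    SKETCH_REDIRECT         /* the sender guessed wrong, pass the call on */
};

/* Hypothetical view of the header stored in the ltdb for this key. */
struct sketch_ltdb_header {
    uint32_t dmaster;       /* authoritative local record of the dmaster */
};

/*
 * Decide how to handle a call. 'pkt_dest' is the destination node from
 * the packet header. It only reflects the sender's guess, which may be
 * stale, so it plays no part in the decision.
 */
static enum sketch_action sketch_decide(uint32_t our_node,
                                        uint32_t pkt_dest,
                                        const struct sketch_ltdb_header *hdr)
{
    assert(hdr != NULL);
    (void)pkt_dest;

    if (hdr->dmaster == our_node) {
        return SKETCH_HANDLE_LOCALLY;
    }
    return SKETCH_REDIRECT;
}
```

If the record has moved to another node between the packet being sent
and being received, the packet still names us as the destination, but
the ltdb header no longer does, and the call gets redirected.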

 > My understanding was that if we are DMASTER for the record then we are
 > also LMASTER for it. Not sure how I made to this though ;-) but you're
 > right that this code isn't complete.

you thought that lmaster == dmaster? What is the point of having the
two concepts if they are the same??

The DMASTER will also be the LMASTER when the record is first
initialised and immediately after a recovery, but after that they
will be quite separate. The LMASTER keeps track of who the DMASTER
is. The DMASTER owns the data. The two ideas are quite separate.
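
A tiny sketch of how the two roles drift apart, using a made-up record
struct and made-up function names. How the lmaster gets chosen is not
covered in this mail, so here it is simply passed in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-record bookkeeping, for illustration only. */
struct sketch_record {
    uint32_t lmaster;  /* fixed: the node that tracks the dmaster */
    uint32_t dmaster;  /* moves: the node that currently owns the data */
};

/* On first initialisation the lmaster is also the dmaster. */
static void sketch_record_init(struct sketch_record *r, uint32_t lmaster)
{
    assert(r != NULL);
    r->lmaster = lmaster;
    r->dmaster = lmaster;
}

/* Moving the data to another node changes the dmaster only. */
static void sketch_record_migrate(struct sketch_record *r, uint32_t new_dmaster)
{
    assert(r != NULL);
    r->dmaster = new_dmaster;
}
```

After one migration the two fields differ, which is the normal state
once the record is in use.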

Cheers, Tridge