[Samba] [CTDB] how does LMASTER know where the record is stored?

Michael Adam obnox at samba.org
Wed Apr 13 07:07:43 MDT 2011


Hi David,

David Roid wrote:
> Greetings list,
> 
> I was looking at the wiki "samba and clustering" and a ctdb.pdf, admittedly
> both are quite old (2006 or 2007) and I don't know how things change over
> years, but I just have two questions about LMASTER:

First off, in 2009 I wrote a small paper on ctdb which is still
mostly correct today:

http://samba.org/~obnox/presentations/sambaXP-2009/samba-and-ctdb.pdf

It is also linked from http://ctdb.samba.org/documentation.html .
However, it omits the details about LMASTER. Maybe I should
write an updated version. :-)

> < this is from pdf >
> LMASTER fixed
> ● LMASTER is based on record key only
> ● LMASTER knows where the record is stored
> ● new records are stored on LMASTER
> 
> Q1. From the output of "ctdb status" I can see that LMASTER is basically
> configured as the node itself, then how does each node know where the record
> is stored? By broadcast to all nodes or any other way? And more importantly,
> when?
> 
> Q2. If new records are stored on LMASTER, do these records need to be synced
> within the cluster? And when?

Let me explain CTDB's view of tdb records in some detail.

The trick in ctdb (that enables Samba to scale well in the cluster)
is that it does _not_ propagate record updates to all nodes in the
cluster. There are two essential roles for a node with respect to a
record in ctdb:

1. a record's data master (aka DMASTER):
   This is the node that holds the current and authoritative copy
   of the record. It is the node that most recently announced its
   intention to change that record and was granted permission.
   The DMASTER role moves in the cluster as different nodes write
   to the record.
   Nodes that were DMASTER of the record previously may hold
   older copies of the record.
   The records contain a special header field "record sequence
   number" (aka RSN) which is incremented whenever the DMASTER
   role is moved from one node to another.

2. a record's location master (LMASTER):
   This is the node that knows the data master for the given
   record. The LMASTER for a record is a fixed node in the
   cluster (as long as the list of active nodes does not
   change). In the simplest case, it is calculated from the
   record's key like this:
   A 32-bit hash value is calculated from the record's key.
   This 32-bit value is taken modulo the number of nodes to
   yield the LMASTER's node number (if the nodes are numbered
   without gaps starting at 0).

   Hence it is always cheap to contact the LMASTER and the
   LMASTER knows how to find the DMASTER.
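
The simplest-case calculation can be sketched in a few lines of
Python. This is only an illustration: ctdb uses its own 32-bit hash
function, so zlib.crc32 below is a stand-in assumption, and
lmaster_for_key is a hypothetical helper that covers only gap-free
node numbers starting at 0.

```python
import zlib


def key_hash(key: bytes) -> int:
    """32-bit hash of a record key.

    zlib.crc32 is only a stand-in here; ctdb uses its own hash function.
    """
    return zlib.crc32(key) & 0xFFFFFFFF


def lmaster_for_key(key: bytes, num_nodes: int) -> int:
    """Simplest case: active nodes are numbered 0..num_nodes-1 without gaps."""
    return key_hash(key) % num_nodes


if __name__ == "__main__":
    key = b"hypothetical-record-key"
    # Every node computes the same answer locally, with no network traffic.
    print(lmaster_for_key(key, 4))
```

Because the result depends only on the key and the number of active
nodes, any node can find a record's LMASTER without asking anyone.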

When a node wants to write to a record, it requests the DMASTER
role for that record by sending an appropriate network request
to the record's LMASTER. The LMASTER knows whether the record
existed previously. If so, it asks the DMASTER to transfer the
DMASTER role, along with the record's contents, via the LMASTER
to the requesting node. If the record did not previously exist,
the LMASTER creates an empty initial record and transfers it to
the requesting node.
This way, the LMASTER always knows the current DMASTER and has
the previous copy of the record.
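
The migration described above can be modelled as a toy, in-process
Python sketch. It is a simplification under stated assumptions: there
is no networking, locking or recovery, the messages between requester,
LMASTER and old DMASTER are collapsed into plain function calls, the
hash is again the zlib.crc32 stand-in, and the class and method names
(Record, Node, Cluster, request_dmaster, write) are hypothetical.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Record:
    rsn: int      # record sequence number, bumped when the DMASTER role moves
    dmaster: int  # node number of the current data master
    data: bytes


@dataclass
class Node:
    pnn: int
    store: dict = field(default_factory=dict)  # key -> Record (this node's local copy)


class Cluster:
    """Toy in-process model: no networking, locking or recovery."""

    def __init__(self, num_nodes: int):
        self.nodes = [Node(i) for i in range(num_nodes)]

    def lmaster(self, key: bytes) -> Node:
        # Stand-in hash (zlib.crc32, not ctdb's) modulo the number of nodes.
        return self.nodes[zlib.crc32(key) % len(self.nodes)]

    def request_dmaster(self, requester: int, key: bytes) -> Record:
        lm = self.lmaster(key)
        known = lm.store.get(key)
        if known is None:
            # Record never existed: the LMASTER creates an empty initial record.
            rec = Record(rsn=0, dmaster=requester, data=b"")
        elif known.dmaster == requester:
            # The requester already holds the DMASTER role.
            return self.nodes[requester].store[key]
        else:
            # Fetch the authoritative copy from the current DMASTER and move
            # the role to the requester, incrementing the RSN.
            current = self.nodes[known.dmaster].store[key]
            rec = Record(rsn=current.rsn + 1, dmaster=requester, data=current.data)
        # The record travels via the LMASTER, which keeps its own copy and so
        # always knows which node is the DMASTER.
        lm.store[key] = Record(rec.rsn, rec.dmaster, rec.data)
        self.nodes[requester].store[key] = rec
        return rec

    def write(self, pnn: int, key: bytes, data: bytes) -> None:
        rec = self.request_dmaster(pnn, key)
        rec.data = data  # a purely local write; nothing is propagated
```

For example, after c = Cluster(4) and c.write(1, b"k", b"v1"), calling
c.request_dmaster(2, b"k") moves the DMASTER role to node 2 with
rsn == 1 and data b"v1". Node 1 keeps its now stale copy, and no
other node receives the update.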


Regarding the output of "ctdb status", e.g.:

Number of nodes:4
pnn:0 10.0.0.21        OK (THIS NODE)
pnn:1 10.0.0.22        OK
pnn:2 10.0.0.23        OK
pnn:3 10.0.0.20        OK
Generation:1987363808
Size:4
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
hash:3 lmaster:3
Recovery mode:NORMAL (0)
Recovery master:1

Here you see a 4-node cluster with node numbers 0, 1, 2, 3.
You also see which node is LMASTER for a given value of the
key's hash modulo 4 (== Size). In general, each line has the
form "hash:X lmaster:Y".
For example, stop ctdb on node 0 and look again at ctdb status.
You will see:

Size:3
hash:0 lmaster:1
hash:1 lmaster:2
hash:2 lmaster:3
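
The "hash:X lmaster:Y" lines can be read as a lookup table indexed by
the key's hash modulo Size. The Python sketch below reproduces both
outputs from the list of active nodes. The helper names are
hypothetical, and filling the table in ascending node order is an
assumption that happens to match these two outputs, not a description
of how ctdb actually builds the table.

```python
def lmaster_table(active_pnns) -> list:
    """Hypothetical helper: slot i holds the node number for hash slot i.

    Assumes slots are filled in ascending node-number order, which matches
    the example ctdb status output but is not taken from ctdb itself.
    """
    return sorted(active_pnns)


def lmaster(key_hash: int, table: list) -> int:
    # Look up the LMASTER via the key's hash modulo the table size ("Size").
    return table[key_hash % len(table)]


def status_lines(active_pnns) -> list:
    table = lmaster_table(active_pnns)
    lines = [f"Size:{len(table)}"]
    lines += [f"hash:{i} lmaster:{pnn}" for i, pnn in enumerate(table)]
    return lines


if __name__ == "__main__":
    print("\n".join(status_lines([0, 1, 2, 3])))  # all four nodes OK
    print("\n".join(status_lines([1, 2, 3])))     # ctdb stopped on node 0
    # The same key hash can get a different LMASTER once Size changes:
    print(lmaster(6, lmaster_table([0, 1, 2, 3])))  # 6 % 4 == 2 -> node 2
    print(lmaster(6, lmaster_table([1, 2, 3])))     # 6 % 3 == 0 -> node 1
```

This is why the LMASTER is only fixed as long as the list of active
nodes does not change: removing a node changes Size, and with it the
modulo result for many keys.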


> Excuse me if this comes off sort of rude, it's just there are not enough
> docs of CTDB on samba site.

No problem. There is also some (potentially) outdated info on
wiki.samba.org. But the LMASTER bit might be worth explaining
in more detail anyway.

Do these explanations make things more clear for you?

Cheers - Michael

> Faithfully
> -David