CTDB - the 'Mainz' plan for clustered Samba

Sun Oct 1 20:39:16 GMT 2006

Simo,

 > What happens if an unresponsive server comes up before the recovery is
 > finished (same question if one node goes down during recovery)?

Recovery will start again. If nodes keep going up and down at a great
rate then recovery will last forever. If that happens, then stop
flicking the power switch on and off :)

 > We need to start a new recovery phase of course, but is it ok to do so
 > at any moment? Even during another recovery?

We could either wait until the end of the first recovery or we can
interrupt and start again. I haven't written a full spec for the
recovery protocol yet, mostly because I haven't worked out all the
details yet :)

 > > Also remember that the total amount of records in these databases
 > > tends to be small. It might be a few thousand on a moderately busy
 > > box, but it won't be billions. That means that redistributing data on
 > > a fast network is trivial.
 > 
 > Do we have any real statistics to back this?
 > What will happen in the worst case scenario?

Well, with the current SMB protocol we are limited to 64k open files
per node. Let's assume 100 nodes, so that's 6.4 million open files
max. Each record is maybe 100 bytes, so that's 640 MByte. On a gigabit
link that's maybe 10 seconds of data (not taking into account the
natural parallelism and the fact that bisection bandwidth is much
larger than node to node bandwidth).
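
The estimate above can be checked with a few lines of Python. This is
only a back-of-envelope sketch: the constants are the round numbers
from the text (64k is taken as 64,000), the link is assumed to run at
its raw 1 Gbit/s line rate, and the names are made up for illustration.

```python
# Back-of-envelope worst case, using the rough figures from the text.
OPEN_FILES_PER_NODE = 64_000         # "64k" SMB limit per node, rounded (assumption)
NODES = 100                          # assumed cluster size
RECORD_BYTES = 100                   # "maybe 100 bytes" per record
LINK_BITS_PER_SEC = 1_000_000_000    # gigabit link at raw line rate (assumption)


def worst_case_transfer():
    """Return (record count, total bytes, seconds to send over one link)."""
    records = OPEN_FILES_PER_NODE * NODES
    total_bytes = records * RECORD_BYTES
    seconds = total_bytes * 8 / LINK_BITS_PER_SEC
    return records, total_bytes, seconds


records, total_bytes, seconds = worst_case_transfer()
print(f"{records:,} records, {total_bytes / 1e6:.0f} MB, "
      f"{seconds:.1f} s at line rate")
```

At raw line rate this comes out at roughly 5 seconds, so "maybe 10
seconds" leaves about a factor of two for protocol overhead.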

Even if we were very silly and sent each record as a separate message,
then at about 100usec of latency per record that would take roughly 10
minutes. Of course, we won't do that, we'll batch them up so we don't
pay a latency per record, but even a 10 minute recovery is good for
that many open files. I expect in practice it will be more like 10 to
20 seconds.
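
To show why batching matters, here is a small sketch of the latency
cost alone. It assumes one 100usec latency per message and ignores
transfer time entirely. The batch size of 1000 and the function name
are hypothetical, picked only for illustration and not part of any
spec.

```python
def latency_cost_seconds(records, latency_s, batch_size):
    """Total latency paid if every message carries batch_size records."""
    messages = -(-records // batch_size)  # ceiling division
    return messages * latency_s


RECORDS = 6_400_000    # worst case from the estimate above
LATENCY_S = 100e-6     # 100usec per message

# One record per message: the "very silly" case.
unbatched = latency_cost_seconds(RECORDS, LATENCY_S, 1)

# 1000 records per message (hypothetical batch size).
batched = latency_cost_seconds(RECORDS, LATENCY_S, 1000)

print(f"unbatched: {unbatched / 60:.1f} minutes of latency")
print(f"batched:   {batched:.2f} seconds of latency")
```

With batching the latency term drops well below a second, so the
recovery time is dominated by how fast the data itself can be moved.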

The brlock.tdb could be larger in theory, but in practice it is very
small.

Cheers, Tridge