ctdb shortcut locking

Mon Apr 16 07:20:43 GMT 2007

Volker,

Ronnie and I have now fleshed out a more complete picture of how we
will do scalable ctdb_fetch_lock(). I'll try to explain it below, and
we'd appreciate your comments.

The nomenclature will be:

  - 'client' is a ctdb client, which for Samba will be an instance of
    the smbd daemon

  - 'ctdbd' is the ctdb daemon

pseudo-code for ctdb_fetch_lock in the client:

 ctdb_fetch_lock() {
	/* do a blocking chainlock on the record */
	tdb_chainlock(); 

	/* fetch the header and data for the record */
	ctdb_ltdb_fetch();

	/* if we are not the dmaster, then ask ctdbd to make us the
	   dmaster */
	if (header.dmaster != ctdb->vnn) {
		ask_ctdb_daemon_to_make_us_master();
		wait_for_reply_from_daemon();

		/* the dmaster reply has come in, but the daemon has
		   _not_ updated the record header to mark us as the
		   dmaster - do that now in the client */
		locally_update_record_header();
	}

	/* we are now the dmaster, and we have the record locked */
	return record;
 }
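
As a rough illustration (a sketch, not ctdb code), the client path
above can be modelled in C with a single in-memory record. struct
record, ask_daemon_for_migration() and unlock_record() are
hypothetical stand-ins; a bool models the chainlock, and the sketch
assumes the daemon's reply carries the record data back to the client.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for an ltdb record; real ctdb keeps the
 * dmaster in the record header and locks with tdb_chainlock(). */
struct record {
	unsigned dmaster;	/* vnn of the node that is dmaster */
	bool locked;		/* models the chainlock on this record */
	char data[64];
};

/* Hypothetical stand-in for the round trip to ctdbd. The daemon migrates
 * the record and hands the data back, but does NOT touch the header. */
static void ask_daemon_for_migration(char *out, size_t len)
{
	strncpy(out, "migrated data", len - 1);
	out[len - 1] = '\0';
}

/* Client-side fetch_lock: returns with the record locked and us as dmaster. */
static struct record *fetch_lock(struct record *r, unsigned my_vnn)
{
	/* single-threaded model: the real tdb_chainlock() would block here */
	assert(!r->locked);
	r->locked = true;

	if (r->dmaster != my_vnn) {
		char buf[sizeof(r->data)];
		ask_daemon_for_migration(buf, sizeof(buf));
		/* we still hold the lock, so no one sees a half-updated record */
		memcpy(r->data, buf, sizeof(r->data));
		r->dmaster = my_vnn;	/* the client updates the header itself */
	}
	return r;
}

static void unlock_record(struct record *r)
{
	assert(r->locked);
	r->locked = false;	/* tdb_chainunlock() in real code */
}
```

Calling fetch_lock() on a record whose dmaster is another vnn pulls
the data over and rewrites the header while the lock is held; on a
record we already own it is just a lock and a local read, which is the
uncontended fast path.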

here is the pseudo-code in the daemon:

 ctdbd_make_us_dmaster() {
	/* use a CTDB_CALL to force migration of the record */
	call->call_id = CTDB_FETCH_FUNC;
	call->flags = CTDB_IMMEDIATE_MIGRATION;
	ctdb_call_send();

	/* when the reply arrives, don't update the header in the
	   daemon; instead, send the reply up to the client and let it
	   do the work. This is race free because the client holds the
	   chainlock the whole time */
	onreply:
		send_record_to_client();
 }

Notice that the daemon code above does no IO on the database and
takes no locks. The client does all of the database IO and locking.
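
To make the "no IO, no locking" point concrete, here is a small sketch
of the only state the daemon needs for this: a table of pending
migration requests, keyed by request id, so a reply can be routed back
to the waiting client. struct pending_call, struct daemon_state,
MAX_PENDING, make_us_dmaster() and on_reply() are all hypothetical
names for illustration; the real send/forward calls are left as
comments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical daemon-side bookkeeping: each migration request from a
 * client is remembered until the reply from the old dmaster comes back. */
#define MAX_PENDING 8

struct pending_call {
	unsigned reqid;		/* id carried in the outgoing CTDB_CALL */
	int client_fd;		/* which client to forward the reply to */
	int in_use;
};

struct daemon_state {
	struct pending_call pending[MAX_PENDING];
	unsigned next_reqid;
};

/* Remember that client_fd wants to become dmaster. Returns the reqid that
 * would go in the outgoing call, or 0 if the table is full. */
static unsigned make_us_dmaster(struct daemon_state *d, int client_fd)
{
	assert(client_fd >= 0);
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (!d->pending[i].in_use) {
			d->pending[i].reqid = ++d->next_reqid;
			d->pending[i].client_fd = client_fd;
			d->pending[i].in_use = 1;
			/* ctdb_call_send() would go here */
			return d->pending[i].reqid;
		}
	}
	return 0;
}

/* Reply from the old dmaster: find the waiting client and forward it.
 * Note that nothing here reads or writes the ltdb or takes a lock.
 * Returns the client fd, or -1 if the reqid is unknown. */
static int on_reply(struct daemon_state *d, unsigned reqid)
{
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (d->pending[i].in_use && d->pending[i].reqid == reqid) {
			d->pending[i].in_use = 0;
			/* send_record_to_client() would go here */
			return d->pending[i].client_fd;
		}
	}
	return -1;
}
```

Replies can arrive in any order; each one is matched by reqid and the
slot is freed, so a duplicate reply is simply ignored.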

Additionally, when the ctdb daemon receives a request from a remote
node that requires it to read or write the ltdb, it uses the
event-driven tdb code I showed in my last message to do the IO.

 ctdbd_process_remote_request() {
	/* try to get the lock in a non-blocking fashion */
	got_lock = (tdb_chainlock_nonblock() == 0);

	/* if we can't get the lock immediately, then use the event
	driven tdb locking to defer the request until we get the lock
	*/
	if (!got_lock) {
		/* pass in the request as the private data in the callback,
		   so we can retry the request when the event driven
		   locking triggers */
		tdb_chainlockwait(....., ctdbd_process_remote_request, request);
		return;
	}

	/* we have the lock, process the request normally */
	....
 }
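
A self-contained model of the defer-and-retry pattern above may help
when reviewing it. This is a sketch under assumptions: struct chain,
chainlock_nonblock(), chainlockwait() and chainunlock() are
hypothetical in-memory stand-ins for the tdb calls (the real
tdb_chainlockwait() is the event-driven code from the last message),
and the retry is run directly from chainunlock() rather than from an
event loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 8

struct request;
typedef void (*request_fn)(struct request *req);

/* Hypothetical model of one hash chain with a queue of deferred requests. */
struct chain {
	bool locked;
	request_fn waiter_fn[MAX_WAITERS];
	struct request *waiter_req[MAX_WAITERS];
	size_t nwaiters;
};

struct request {
	struct chain *chain;
	bool done;
};

static bool chainlock_nonblock(struct chain *c)
{
	if (c->locked)
		return false;
	c->locked = true;
	return true;
}

/* Queue fn(req) to be retried when the chain is unlocked. */
static void chainlockwait(struct chain *c, request_fn fn, struct request *req)
{
	assert(c->nwaiters < MAX_WAITERS);
	c->waiter_fn[c->nwaiters] = fn;
	c->waiter_req[c->nwaiters] = req;
	c->nwaiters++;
}

/* Release the lock, then let the oldest waiter retry. A real daemon would
 * trigger this retry from its event loop instead of calling it inline. */
static void chainunlock(struct chain *c)
{
	assert(c->locked);
	c->locked = false;
	if (c->nwaiters > 0) {
		request_fn fn = c->waiter_fn[0];
		struct request *req = c->waiter_req[0];
		for (size_t i = 1; i < c->nwaiters; i++) {
			c->waiter_fn[i - 1] = c->waiter_fn[i];
			c->waiter_req[i - 1] = c->waiter_req[i];
		}
		c->nwaiters--;
		fn(req);
	}
}

static void process_remote_request(struct request *req)
{
	if (!chainlock_nonblock(req->chain)) {
		/* defer: re-run this same function once the lock is free */
		chainlockwait(req->chain, process_remote_request, req);
		return;
	}
	req->done = true;	/* real code would read/write the ltdb here */
	chainunlock(req->chain);
}
```

The key property is that the daemon never blocks: while an smbd holds
the chainlock, the remote request just sits in the queue, and it runs
as soon as the lock is released.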

The risk with all this is that the client might die before it updates
the header on the record, leaving the database with an inconsistent
dmaster. We can handle that in the daemon's destructor for the client,
by checking that the header was updated on the last record the client
operated on. We only have to check at most one record per client
(assuming clients are single threaded).

Does that make sense? We should be able to make this race free, and in
the uncontended case it should run as fast as normal Samba on a local tdb.

Cheers, Tridge