CTDB API

Alexander Bokovoy ab at samba.org
Tue Oct 31 14:48:10 GMT 2006


Tridge,

tridge at samba.org wrote:
> Alexander,
> 
>> When speaking of "conditional append" where do the condition
>> function has to be executed?
> 
> They are executed on the DMASTER. In many cases the DMASTER will be 
> the local node (as the record will have migrated), but it could just 
> as easily be any other node.
> 
> A 'conditional append' is very much like a RPC call. Every node in
> the cluster registers the same set of functions which are available
> via one of these calls. The CTDB infrastructure sends the parameters
> of this function to the node that has the record data. It runs the 
> function on behalf of the caller, possibly updating the record.
> 
> I will probably rename 'conditional append' to something like 
> CTDB_REQ_CALL to make this clearer. I also think that CTDB_REQ_FETCH 
> can just be an instance of a CTDB_REQ_CALL (so we can remove it).
> 
> It might make it easier to follow if I reword the CTDB proposal to
> use more RPC style language. It is really just a 'record oriented RPC
>  system' and maybe thinking of it that way will be easier.
Understood.

Below are some notes on the CTDB API, which Aleksey and I are still
discussing and dissecting.

int ctdb_set_conditional(struct ctdb_context *ctdb, ctdb_conditional_fn
fn, uint32_t condition_id, void *private);

  - this function has to be called on every dispatcher with the same set
of conditional functions and IDs: the conditions must be identical on
all DMASTERs, and every node is a potential DMASTER.
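
To illustrate why the registration has to be identical everywhere, here
is a minimal sketch of a per-node table that maps condition_id to a
function. The table layout, MAX_CONDITIONS, the signature of
ctdb_conditional_fn, the stand-in struct ctdb_context, the helper
ctdb_find_conditional() and the sample condition are all assumptions made
for this example, not actual CTDB definitions. The last parameter is
named private_data here only to keep the sketch neutral.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins: the real CTDB types are not defined in this thread. */
#define MAX_CONDITIONS 16

typedef struct { unsigned char *dptr; size_t dsize; } TDB_DATA;

/* Hypothetical signature: returns non-zero if the condition holds. */
typedef int (*ctdb_conditional_fn)(TDB_DATA record, TDB_DATA arg,
                                   void *private_data);

struct ctdb_context {
    ctdb_conditional_fn fns[MAX_CONDITIONS];
    void *private_data[MAX_CONDITIONS];
};

/* Register fn under condition_id; every node must make the same calls. */
int ctdb_set_conditional(struct ctdb_context *ctdb, ctdb_conditional_fn fn,
                         uint32_t condition_id, void *private_data)
{
    if (ctdb == NULL || fn == NULL || condition_id >= MAX_CONDITIONS) {
        return -1;
    }
    if (ctdb->fns[condition_id] != NULL) {
        return -1; /* this ID is already taken */
    }
    ctdb->fns[condition_id] = fn;
    ctdb->private_data[condition_id] = private_data;
    return 0;
}

/* Look up the function a DMASTER would run for an incoming request. */
ctdb_conditional_fn ctdb_find_conditional(const struct ctdb_context *ctdb,
                                          uint32_t condition_id)
{
    if (ctdb == NULL || condition_id >= MAX_CONDITIONS) {
        return NULL;
    }
    return ctdb->fns[condition_id];
}

/* Sample condition (hypothetical): true only if the record is empty. */
int cond_record_is_empty(TDB_DATA record, TDB_DATA arg, void *private_data)
{
    (void)arg;
    (void)private_data;
    return record.dsize == 0;
}
```

If one node skipped a registration, a request carrying that condition_id
would find no function there once the record migrated to it, which is
why the set has to match on all nodes.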

struct ctdb_record  *ctdb_fetch_locked(struct ctdb_context *ctdb,
TALLOC_CTX *mem_ctx, TDB_DATA key);
int ctdb_store_unlock(struct ctdb_context *ctdb, struct ctdb_record *rec);

  - these two functions work with the record as a whole; CTDB does not
interpret its content (subrecords, etc.), so they are not an
application-specific part of CTDB.
  - these functions allow a fast implementation of locking and similar
functionality without much modification of higher-level code
(locking.c, ...)

TDB_DATA ctdb_fetch(struct ctdb_context *ctdb, TALLOC_CTX *mem_ctx,
TDB_DATA key);

  - this function also fetches the record as a whole, again with no
interpretation on the CTDB side, because it is the application (client)
that dissects and processes the data.
  - In the current Samba code the fetch function is sometimes used to
prepare conditional calls, but in most cases it is used to print the
content of a record.

As far as the CTDB layer is concerned, the function itself is quite simple:
1) extract the data from storage by key
2) return it to the caller (client)

int ctdb_delete(struct ctdb_context *tdb, TDB_DATA key);

  - A separate delete function isn't needed, because the existing
implementation always deletes a record as a whole, which means we never
have null records in the database. Consequently, we can make whole-record
deletion implicit -- once a record's components are freed, the record
itself can be removed from the database. So this functionality can be
implemented implicitly in remove_and_trigger() or in store_unlock() with
null data.

int ctdb_conditional_append(struct ctdb_context *ctdb, uint32_t
condition_id, TDB_DATA key, TDB_DATA data);

  - We think that 'data' here could be split into two parts internally:
	- a subkey for sub-indexing within the record indexed by 'key'
	- and the 'data' itself.
  The idea is that we store a record in TDB under the key 'key', but the
record isn't atomic: it is constructed from a set of 'sub-elements', each
with its own subkey and data. For example, in locking.tdb the subkey is
the process_id and the data is the actual flags, share modes, etc. for a
file opened by that process_id.

  - A record corresponding to the key could contain not just a set of
elements (subkey+data) but also additional information. For example, in
locking.tdb this is the filename and other information shared between
all process_ids. It is not clear how to store this information. Maybe it
could be added by the very first conditional_append(), but then what
subkey should it have, and how could we separate it from (or isolate it
in) the CTDB layer so that the application layer stays unchanged? The
alternative is to modify the application layer, in which case we would
need to change the smb.h structures, for example.

  - the condition is executed on DMASTER
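
The record structure described above could be sketched as follows. This
is only one possible in-memory form: struct sub_element,
struct parsed_record, MAX_ELEMENTS and both helper functions are
hypothetical names, and keeping the shared information in a separate
'shared' field is just one answer to the open storage question, chosen
here for simplicity.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the TDB blob type so the sketch compiles on its own. */
typedef struct { unsigned char *dptr; size_t dsize; } TDB_DATA;

#define MAX_ELEMENTS 8 /* arbitrary limit for the sketch */

/* One sub-element: e.g. subkey = process_id, data = flags, share modes. */
struct sub_element {
    TDB_DATA subkey;
    TDB_DATA data;
};

/* Hypothetical parsed form of one record; blobs are borrowed, not copied. */
struct parsed_record {
    TDB_DATA shared; /* assumption: shared info (e.g. filename) kept apart */
    size_t count;
    struct sub_element elems[MAX_ELEMENTS];
};

/* Byte-wise comparison of two blobs. */
static int blob_equal(TDB_DATA a, TDB_DATA b)
{
    return a.dsize == b.dsize &&
           (a.dsize == 0 || memcmp(a.dptr, b.dptr, a.dsize) == 0);
}

/* Return the element with the given subkey, or NULL if there is none. */
struct sub_element *record_find(struct parsed_record *rec, TDB_DATA subkey)
{
    for (size_t i = 0; i < rec->count; i++) {
        if (blob_equal(rec->elems[i].subkey, subkey)) {
            return &rec->elems[i];
        }
    }
    return NULL;
}

/* Add (subkey, data); returns -1 on a duplicate subkey or a full record. */
int record_add(struct parsed_record *rec, TDB_DATA subkey, TDB_DATA data)
{
    if (rec->count == MAX_ELEMENTS || record_find(rec, subkey) != NULL) {
        return -1;
    }
    rec->elems[rec->count].subkey = subkey;
    rec->elems[rec->count].data = data;
    rec->count++;
    return 0;
}
```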

Approximate code flow:
1) retrieve data by key
2) parse the data (transform the internal representation, which might
include some CTDB-specific data per subkey) (*)
3) execute the condition function over the parsed data (*)
3.5) the condition function might attach a trigger to a specific subkey
as a side effect. This trigger is then executed upon the subkey's
removal. The question is how to store the trigger's information: in the
subkey data or in a separate structure (common to the whole key's data).
4) if the condition returned TRUE, add the subkey and data to the record
5) return condition's result to the caller (true, false)
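
Steps 3 to 5 of this flow could look roughly like the sketch below. It
assumes that steps 1 and 2 (fetch and parse) have already produced a
parsed record. The types, the condition_fn signature, the sample
condition cond_subkey_absent() and the return convention (1 = appended,
0 = condition false, -1 = error) are assumptions for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in types so the sketch compiles on its own. */
typedef struct { unsigned char *dptr; size_t dsize; } TDB_DATA;

#define MAX_ELEMENTS 8

struct sub_element { TDB_DATA subkey; TDB_DATA data; };
struct parsed_record { size_t count; struct sub_element elems[MAX_ELEMENTS]; };

/* Hypothetical condition signature: non-zero means "append is allowed". */
typedef int (*condition_fn)(const struct parsed_record *rec,
                            TDB_DATA subkey, TDB_DATA data);

static int blob_equal(TDB_DATA a, TDB_DATA b)
{
    return a.dsize == b.dsize &&
           (a.dsize == 0 || memcmp(a.dptr, b.dptr, a.dsize) == 0);
}

/* Sample condition: allow the append only if the subkey is not present. */
int cond_subkey_absent(const struct parsed_record *rec,
                       TDB_DATA subkey, TDB_DATA data)
{
    (void)data;
    for (size_t i = 0; i < rec->count; i++) {
        if (blob_equal(rec->elems[i].subkey, subkey)) {
            return 0;
        }
    }
    return 1;
}

/* Steps 3-5: run the condition, append on TRUE, report the result. */
int conditional_append(struct parsed_record *rec, condition_fn cond,
                       TDB_DATA subkey, TDB_DATA data)
{
    if (rec == NULL || cond == NULL) {
        return -1;
    }
    if (!cond(rec, subkey, data)) {
        return 0;                      /* step 5: condition was false */
    }
    if (rec->count == MAX_ELEMENTS) {
        return -1;                     /* no room left in this sketch */
    }
    rec->elems[rec->count].subkey = subkey;   /* step 4: append */
    rec->elems[rec->count].data = data;
    rec->count++;
    return 1;                          /* step 5: condition was true */
}
```

In a real dispatcher the modified record would still have to be
serialized and stored back before the result is sent to the caller.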

int ctdb_remove_and_trigger(struct ctdb_context *ctdb, TDB_DATA key,
TDB_DATA data);

  - removes a previously added subrecord from the key
  - we think it may make sense to rename 'TDB_DATA data' to 'TDB_DATA
subkey', as it is in fact not data for the key.

Possible code flow:
1) retrieve record by key
2) parse record (*)
3) extract subkey and call trigger if it is attached to the subrecord
4) store modified record
5) return result to the caller (true, false)
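
A minimal sketch of steps 3 to 5 of this flow, again on an already parsed
record. Here the trigger is stored next to each sub-element, which is
only one of the two storage options raised above. All names (trigger_fn,
remove_and_trigger(), record_is_empty(), count_trigger()) are
hypothetical. record_is_empty() shows where the implicit whole-record
delete could hook in.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in types so the sketch compiles on its own. */
typedef struct { unsigned char *dptr; size_t dsize; } TDB_DATA;

#define MAX_ELEMENTS 8

/* Hypothetical trigger signature. */
typedef void (*trigger_fn)(TDB_DATA subkey, void *private_data);

struct sub_element {
    TDB_DATA subkey;
    TDB_DATA data;
    trigger_fn trigger;      /* assumption: trigger kept per sub-element */
    void *trigger_private;
};

struct parsed_record { size_t count; struct sub_element elems[MAX_ELEMENTS]; };

static int blob_equal(TDB_DATA a, TDB_DATA b)
{
    return a.dsize == b.dsize &&
           (a.dsize == 0 || memcmp(a.dptr, b.dptr, a.dsize) == 0);
}

/* Remove the sub-element with this subkey, firing its trigger if set.
 * Returns 1 if something was removed, 0 if the subkey was not found. */
int remove_and_trigger(struct parsed_record *rec, TDB_DATA subkey)
{
    for (size_t i = 0; i < rec->count; i++) {
        struct sub_element *e = &rec->elems[i];
        if (!blob_equal(e->subkey, subkey)) {
            continue;
        }
        if (e->trigger != NULL) {
            e->trigger(e->subkey, e->trigger_private);
        }
        /* Close the gap left by the removed element. */
        memmove(e, e + 1, (rec->count - i - 1) * sizeof(*e));
        rec->count--;
        return 1;
    }
    return 0;
}

/* An empty record is a candidate for implicit deletion from the database. */
int record_is_empty(const struct parsed_record *rec)
{
    return rec->count == 0;
}

/* Sample trigger: counts how often it fired via an int in private_data. */
void count_trigger(TDB_DATA subkey, void *private_data)
{
    (void)subkey;
    (*(int *)private_data)++;
}
```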

Important note (*):

It appears that the CTDB layer has to call some application-specific
code. This code consists of the conditional functions, triggers and
parsers, because only the application knows the precise structure of the
data blobs (size and structure of sub-keys, what needs to be done with
them, etc.); CTDB doesn't know how sub-records are organized. The
question is how to organize this layer split.

We thought that it probably makes sense to split the dispatcher code
into a CTDB layer (tdb, protocol, events, main()) and an
application-specific part (Samba-specific code in this case), so that the
actual dispatcher daemon would be produced by combining the otherwise
common CTDB 'library' with application-specific hooks (conditions,
triggers, parsers).
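
One way to picture this split is a hook table that the application hands
to the generic code. The sketch below is purely illustrative:
struct ctdb_app_hooks, ctdb_run_condition() and the toy_* functions are
invented names, and the 'toy' blob format (sub-records of exactly 4
bytes) stands in for whatever the real application would parse.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the TDB blob type so the sketch compiles on its own. */
typedef struct { unsigned char *dptr; size_t dsize; } TDB_DATA;

/* Hypothetical hook table supplied by the application (e.g. Samba). */
struct ctdb_app_hooks {
    /* Parse a raw blob; returns the number of sub-records, or -1. */
    int (*parse)(TDB_DATA raw);
    /* Condition selected by ID; non-zero means the call may proceed. */
    int (*condition)(uint32_t condition_id, TDB_DATA raw, TDB_DATA arg);
    /* Called when a sub-record is removed. */
    void (*trigger)(TDB_DATA subkey);
};

/* Generic CTDB-side code: it knows nothing about the blob format.
 * Returns 1 (condition true), 0 (condition false) or -1 (error). */
int ctdb_run_condition(const struct ctdb_app_hooks *hooks,
                       uint32_t condition_id, TDB_DATA raw, TDB_DATA arg)
{
    if (hooks == NULL || hooks->parse == NULL || hooks->condition == NULL) {
        return -1;
    }
    if (hooks->parse(raw) < 0) {
        return -1; /* the application rejected the blob */
    }
    return hooks->condition(condition_id, raw, arg) ? 1 : 0;
}

/* Application side: toy format where each sub-record is exactly 4 bytes. */
static int toy_parse(TDB_DATA raw)
{
    return (raw.dsize % 4 == 0) ? (int)(raw.dsize / 4) : -1;
}

/* Toy condition 1: true while the record holds fewer than 4 sub-records. */
static int toy_condition(uint32_t condition_id, TDB_DATA raw, TDB_DATA arg)
{
    (void)arg;
    return condition_id == 1 && raw.dsize < 16;
}

static void toy_trigger(TDB_DATA subkey)
{
    (void)subkey; /* nothing to do in the toy application */
}

/* The daemon would be built by linking the generic code with this table. */
const struct ctdb_app_hooks toy_hooks = {
    toy_parse, toy_condition, toy_trigger
};
```

With such a table the generic part never inspects the blob itself, so a
different application could reuse it by supplying its own hooks.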

-- 
/ Alexander Bokovoy
Samba Team                      http://www.samba.org/
ALT Linux Team                  http://www.altlinux.org/
Midgard Project Ry              http://www.midgard-project.org/


More information about the samba-technical mailing list