Using TDB for glib/gobject applications

Christian Hergert chris at dronelabs.com
Tue Jun 23 16:42:22 MDT 2009


Hi,

On Wed, 2009-06-24 at 08:06 +1000, tridge at samba.org wrote:
> Hi Simo,
> 
>  > Concurrent transactions are basically impossible with TDB; how would you
>  > handle 2 separate transactions creating an object with the same key?
> 
> I wouldn't go as far as saying that they are impossible, but they are
> quite tricky. What you need is essentially what Ronnie and I added for
> the cluster-wide transactions in ctdb (see ctdb_replay_transaction()).
> 
> I've CCd Rusty as he is currently working on a possible new format and
> transaction system for tdb, so he may have some comments.
> 
> The way it works is this:
> 
>   - during a transaction, record in some data structure (eg. linked
>     list) all of the operations of the transaction. The key thing to
>     record is all the keys of the records that are read and the state
>     of the record. If the records had a sequence number that would be
>     ideal, otherwise a moderately strong hash of the record would do.
> 
>   - during the main transaction phase the locking would prevent
>     non-transaction writes from happening, but would not prevent other
>     transactions from starting.
> 
>   - during the commit phase an exclusive lock would be taken (so only
>     one commit happens at a time), and the commit logic would re-read
>     all the records that were read during the transaction. If any of
>     them have changed then we would have to fail the transaction. (for
>     ctdb we had to do something a bit more complex, where we had the
>     possibility of replaying the whole transaction. That was needed to
>     cope with network disconnects, which we don't have to worry about
>     for local tdb).

This is pretty similar to what I emulated in my layer.  Of course, I
don't yet have the ability to replay, but since the actual changes don't
happen until the commit gets processed, I think it's less likely to fail
to begin with.  (Probably not optimal for speed, though, since more work
is done inside the critical section.)
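
Roughly, the commit side of what I have in mind looks something like
this (the ReadSetEntry/QueuedWrite/ConcurrentTxn names and helpers are
made up for illustration; only tdb_fetch(), tdb_store(), tdb_lockall()
and tdb_unlockall() are real tdb API):

#include <tdb.h>
#include <glib.h>
#include <stdlib.h>

typedef struct {
        TDB_DATA key;    /* key read during the transaction          */
        guint    hash;   /* hash of the record contents at read time */
} ReadSetEntry;

typedef struct {
        TDB_DATA key;
        TDB_DATA data;
} QueuedWrite;

typedef struct {
        GList *reads;    /* ReadSetEntry*, recorded as records are read */
        GList *writes;   /* QueuedWrite*, applied only at commit time   */
} ConcurrentTxn;

/* Any moderately strong hash of the record body will do here; a
 * per-record sequence number would obviously be better if tdb had one. */
static guint
record_hash (TDB_DATA data)
{
        guint h = 5381;
        size_t i;

        for (i = 0; i < data.dsize; i++)
                h = h * 33 + data.dptr[i];

        return h;
}

/* Commit: take the exclusive lock, re-read every record that was read
 * during the transaction, and fail if any of them changed underneath
 * us.  Only then apply the queued writes. */
static gboolean
concurrent_txn_commit (struct tdb_context *tdb, ConcurrentTxn *txn)
{
        GList *l;

        if (tdb_lockall (tdb) != 0)
                return FALSE;

        for (l = txn->reads; l != NULL; l = l->next) {
                ReadSetEntry *entry = l->data;
                TDB_DATA now = tdb_fetch (tdb, entry->key);
                gboolean changed = (record_hash (now) != entry->hash);

                free (now.dptr);
                if (changed) {
                        tdb_unlockall (tdb);
                        return FALSE;   /* conflict: the caller retries */
                }
        }

        for (l = txn->writes; l != NULL; l = l->next) {
                QueuedWrite *w = l->data;
                tdb_store (tdb, w->key, w->data, TDB_REPLACE);
        }

        tdb_unlockall (tdb);
        return TRUE;
}

In the real thing the apply loop would probably want to go through tdb's
own transaction machinery rather than bare tdb_store() calls, so a crash
half-way through can't leave the file inconsistent, and the caller would
wrap begin/commit in a retry loop as you describe below.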

> The big change would be that code that uses transaction would need to
> cope with the possibility of a transaction commit failing due to a
> conflict with another transaction. The usual way to cope with that is
> a retry loop. That was why I didn't put concurrent transactions in tdb
> - I wanted to keep the use of transactions simple from the programmer's
> point of view.
> 
> One thing we could consider is having a tdb flag
> TDB_CONCURRENT_TRANSACTIONS to say that we are happy to cope with
> concurrent transactions. So existing code would get reliable
> transactions (reliable in the sense that you don't need to retry
> them), whereas places that need concurrent transactions could ask for
> them.
> 
> The main place I could see us using concurrent transactions in Samba
> would be ldb, where it might solve a problem we have at the moment,
> where the single process Samba4 process model can have multiple ldap
> 'transactions' at the ldb level mapped to a single tdb transaction. To
> solve that, I think we'd also have to introduce the concept of a
> "transaction handle" to identify which transaction we're talking
> about, or find some way to have the same tdb open twice in the same
> process (perhaps similar to what Howard did for the split of tdb into
> per-thread and global parts of the tdb structure).

When starting a "concurrent transaction", I return a gulong as the
transaction id and use it as the key for the hashtable.  I assume an
opaque handle would yield the same result.  A list tracking the order of
the transactions is used to make sure I play them back in the order the
commits were requested.
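
The bookkeeping for that is nothing fancy.  Roughly (everything below is
my layer and plain GLib, nothing from tdb; the names are made up):

#include <glib.h>

typedef struct {
        gulong  id;
        GList  *reads;    /* read set, as in the earlier sketch */
        GList  *writes;   /* queued writes, applied at commit   */
} ConcurrentTxn;

static GHashTable *txn_table;                    /* id -> ConcurrentTxn*        */
static GQueue      commit_order = G_QUEUE_INIT;  /* ids in commit-request order */
static gulong      next_txn_id = 1;

/* Hand out a new transaction id and remember its state. */
static gulong
concurrent_txn_begin (void)
{
        ConcurrentTxn *txn = g_new0 (ConcurrentTxn, 1);

        if (txn_table == NULL)
                txn_table = g_hash_table_new (g_direct_hash, g_direct_equal);

        txn->id = next_txn_id++;
        g_hash_table_insert (txn_table,
                             GSIZE_TO_POINTER ((gsize) txn->id), txn);

        return txn->id;
}

/* Remember the order in which commits were requested so the queued
 * writes get played back in that order, not in begin order. */
static void
concurrent_txn_request_commit (gulong id)
{
        g_queue_push_tail (&commit_order, GSIZE_TO_POINTER ((gsize) id));
}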

> Christian, do you want concurrent transactions within a process or
> between processes?

Holy cow, between processes would be cool.  However, it's not a
requirement for me yet.

I'm getting ready to start attacking indexes as well.  Has anyone
thought about that, or does something exist already?

For example, if I'm serializing a GObject (essentially key/value pairs
in storage) I'd like to index on a given property.  Right now, the only
option that comes to mind is a secondary TDB instance that uses the
index value as the key and stores one or more values as the data (which
are really pointers back to keys in the primary TDB instance).  This,
unfortunately, complicates the transaction system further, since it
would require transactions to span multiple physical files.
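
For what it's worth, the write side of the index I'm picturing is
roughly this (index_add() and the length-prefixed layout are just my
own invention; only tdb_fetch() and tdb_store() are real tdb API):

#include <tdb.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Append one primary key to the list of keys stored under an index
 * value in the secondary (index) tdb.  Each entry is length-prefixed
 * so the blob can be split apart again at lookup time. */
static int
index_add (struct tdb_context *index_tdb,
           TDB_DATA property_value,   /* becomes the index key            */
           TDB_DATA primary_key)      /* points back into the primary tdb */
{
        TDB_DATA existing = tdb_fetch (index_tdb, property_value);
        uint32_t len = primary_key.dsize;
        size_t new_size = existing.dsize + sizeof (len) + primary_key.dsize;
        unsigned char *buf = malloc (new_size);
        TDB_DATA new_data;
        int ret;

        if (buf == NULL) {
                free (existing.dptr);
                return -1;
        }

        /* Keep whatever keys were already indexed under this value... */
        if (existing.dptr != NULL)
                memcpy (buf, existing.dptr, existing.dsize);

        /* ...and append the new primary key, length-prefixed. */
        memcpy (buf + existing.dsize, &len, sizeof (len));
        memcpy (buf + existing.dsize + sizeof (len), primary_key.dptr, len);

        new_data.dptr = buf;
        new_data.dsize = new_size;

        ret = tdb_store (index_tdb, property_value, new_data, TDB_REPLACE);

        free (buf);
        free (existing.dptr);

        return ret;
}

Keeping that store and the store into the primary tdb atomic with
respect to each other is exactly where the multi-file transaction
problem shows up, of course.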

Thanks!

-- Christian


