Latest TDB2 design and code...

Sun Sep 12 16:24:17 MDT 2010

On Mon, Sep 13, 2010 at 8:15 AM, ronnie sahlberg
<ronniesahlberg at gmail.com> wrote:
> On Sun, Sep 12, 2010 at 9:24 PM,  <tridge at samba.org> wrote:
>> Hi Ronnie,
>>
>>  > Snapshots have many more nice properties than just ctdb and ctdb recoveries.
>>  >
>>  > They would allow things like
>>  > * rewind to content from previous snapshot
>>  > * (if cheap) compute delta between snapshot x and snapshot y
>>  > * compute delta between snapshot n and snapshot n-1 to allow backup or
>>  > replication of deltas.
>>  > * a series of deltas between n and n-1 allow for very compact
>>  > representation of a series of point in time backups.
>>  > * they provide an internally consistent point in time representation
>>  > of the data, which could be used for backup
>>  > or traversal purposes: traversing and/or backing the data up online
>>  > without locking the database.
>>
>> yes, these are all basic properties of snapshots, but the question
>> still stands - what will Samba use them for?
>>
>> How will users benefit from us having snapshots?
>>
>>  > Cheap snapshots have almost infinite number of use cases.
>>  > I think snapshots are useful.
>>
>> I love them in filesystems, but I also know just how much complexity
>> they add and just how much they can affect performance and prevent
>> optimisations.
>>
>> I can see how they could be used in ctdb when you have wide area
>> clusters. That is a pretty esoteric use case for something that will
>> have a major impact on the code.
>>
>> So what is another use case that would make it worthwhile to add this
>> to tdb?
>
> * Recover from corruption
> When a database becomes corrupted, a snapshot could provide an
> automatic mechanism to restore it to the last known good state
> instead of hoping the user knows he/she really needs to make
> backups of the important databases, such as idmap.
> Users that do not know they must back these databases up are in a
> world of pain when they discover they should have.
> Losing just the last xxx hours/days of entries in idmap.tdb is much
> preferable to losing the entire database.
> Think production sites where you cannot afford outages and have tens
> of thousands of windows clients.
>
> * Traversals.
> Traversals are very expensive on large databases since they lock the
> entire database.
> This currently means that during the traversal you cannot do anything
> complex or time consuming, since if you block during the traversal
> you just make the pain even worse.
> If you had a snapshot, you could traverse the snapshot instead and do
> any kind of complex computation on the elements, or make any sort
> of blocking call, without holding up the live database.
> I think for a traversal you want semantics where the traversal will
> present a consistent point in time view of the database,
> which without snapshots means that a traversal really has to stop any
> changes from occurring to the database while the traversal is
> in progress. This will be very painful for multi-gigabyte databases.
> It is already painful enough for multi-megabyte databases.
>
> * DB Consistency checker. (theoretical future feature)
> Many databases, such as idmap.tdb, contain internal relations between records.
> Today, you cannot create a tool to walk the entire database and
> verify that all internal relations are consistent. That would require
> you to get a lock on the entire database so that it does not change
> while performing consistency checking. This will be very painful
> without snapshots.
>
> Think of this last one like an online fsck. Yes it is hard, but in
> many cases an offline fsck is just not practical.
>

One more

* looping over records
There are several places in samba (and other tools) where you need to
loop over a largish number of records.
For example, locking.tdb is accessed for every single file in a
directory when you perform a basic directory scan of the parent
directory.
Each of these accesses requires two fcntl() calls per file/record at non-zero
cost, especially when locking.tdb is very busy and there is already
a large number of locks on this file.

If snapshots existed and were cheap to use, loops such as the one
above, which need to access a large number
of records in sequence, could do so while holding a single lock for the entire
database (snapshot) instead of locking each chain/record individually,
or maybe even without locking at all.
This would provide measurable performance improvements for these kinds of loops.
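
To make the fcntl() arithmetic concrete, here is a minimal sketch in C.
It is not tdb code and it does not implement snapshots: it only counts
the fcntl() calls needed to scan N records with one byte-range lock per
record versus one lock over the whole file. The helper names
(range_op, scan_per_record, scan_whole_db, open_scratch_file) and the
"one byte per record" lock layout are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Apply one byte-range lock operation (F_RDLCK or F_UNLCK).
 * Returns 0 on success, -1 on error. */
static int range_op(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;              /* 0 means "to end of file" */
    return fcntl(fd, F_SETLKW, &fl);
}

/* Scan with one lock/unlock pair per record (hypothetical layout:
 * record i is guarded by the byte at offset i).
 * Returns the number of fcntl() calls made, or -1 on error. */
static long scan_per_record(int fd, long nrecords)
{
    long calls = 0;
    for (long i = 0; i < nrecords; i++) {
        if (range_op(fd, F_RDLCK, (off_t)i, 1) != 0) return -1;
        calls++;
        /* ... read record i here ... */
        if (range_op(fd, F_UNLCK, (off_t)i, 1) != 0) return -1;
        calls++;
    }
    return calls;
}

/* Scan while holding a single read lock over the whole file, which is
 * what a cheap snapshot might allow without stalling writers.
 * Returns the number of fcntl() calls made, or -1 on error. */
static long scan_whole_db(int fd, long nrecords)
{
    long calls = 0;
    if (range_op(fd, F_RDLCK, 0, 0) != 0) return -1;
    calls++;
    for (long i = 0; i < nrecords; i++) {
        /* ... read record i here, no per-record locking ... */
    }
    if (range_op(fd, F_UNLCK, 0, 0) != 0) return -1;
    calls++;
    return calls;
}

/* Create an already-unlinked temporary file standing in for a database. */
static int open_scratch_file(void)
{
    char path[] = "/tmp/lockdemoXXXXXX";
    int fd = mkstemp(path);
    assert(fd >= 0);
    unlink(path);
    return fd;
}
```

With a scratch file from open_scratch_file(), scan_per_record(fd, 1000)
should report 2000 fcntl() calls and scan_whole_db(fd, 1000) should
report 2. The sketch says nothing about how expensive each call is on a
busy locking.tdb, only how many of them there are.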

regards
ronnie sahlberg