Latest TDB2 design and code...

Sun Sep 12 16:15:12 MDT 2010

On Sun, Sep 12, 2010 at 9:24 PM,  <tridge at samba.org> wrote:
> Hi Ronnie,
>
>  > Snapshots have many more nice properties than just ctdb and ctdb recoveries.
>  >
>  > They would allow things like
>  > * rewind to content from previous snapshot
>  > * (if cheap) compute delta between snapshot x and snapshot y
>  > * compute delta between snapshot n and snapshot n-1 to allow backup or
>  > replication of deltas.
>  > * a series of deltas between n and n-1 allow for very compact
>  > representation of a series of point in time backups.
>  > * they provide an internally consistent point in time representation
>  > of the data, which could be used for backup
>  > or traversal purposes. traversing and/or backing the data up online
>  > without locking database.
>
> yes, these are all basic properties of snapshots, but the question
> still stands - what will Samba use them for?
>
> How will users benefit from us having snapshots?
>
>  > Cheap snapshots have almost infinite number of use cases.
>  > I think snapshots are useful.
>
> I love them in filesystems, but I also know just how much complexity
> they add and just how much they can affect performance and prevent
> optimisations.
>
> I can see how they could be used in ctdb when you have wide area
> clusters. That is a pretty esoteric use case for something that will
> have a major impact on the code.
>
> So what is another use case that would make it worthwhile to add this
> to tdb?

* Recover from corruption
When a database becomes corrupted, a snapshot could provide an
automatic mechanism to restore it to the last known good state,
instead of hoping the user knows he/she really, really needs to make
backups of the important databases, such as idmap.
Users who do not know they must back these databases up are in a
world of pain when they discover they should have.
Losing just the last xxx hours/days of entries in idmap.tdb is much
preferable to losing the entire database.
Think production sites where you cannot afford outages and have tens
of thousands of Windows clients.

* Traversals.
Traversals are very expensive on large databases since they lock the
entire database.
This currently means that during the traversal you cannot do anything
complex or time consuming, since if you block during the traversal
you just make the pain even worse.
If you had a snapshot, you could traverse the snapshot instead and do
any kind of complex computation on the elements, or make any sort
of blocking call, without being in a world of pain.
I think for a traversal you want semantics where the traversal will
present a consistent point in time view of the database,
which without snapshots means that a traversal really has to stop any
changes from occurring to the database while the traversal is
in progress. This will be very painful for multi-gigabyte databases.
It is already painful enough for multi-megabyte databases.
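
To make the traversal point concrete, here is a minimal sketch of
what "traverse the snapshot instead" could look like. It is a toy
in-memory store, not tdb and not a proposed tdb API: ToyStore,
snapshot() and traverse() are hypothetical names, and the
copy-on-write is done by copying a whole dict per write, which a
real design would replace with something far cheaper.

```python
import threading
from types import MappingProxyType


class ToyStore:
    """Hypothetical in-memory stand-in for a tdb-like store (not the tdb API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._records = {}

    def store(self, key, value):
        # Copy-on-write: build a new dict and swap it in, so any dict
        # already handed out as a snapshot is never mutated.
        with self._lock:
            updated = dict(self._records)
            updated[key] = value
            self._records = updated

    def snapshot(self):
        # The lock is held only long enough to grab the current dict.
        with self._lock:
            return MappingProxyType(self._records)


def traverse(snap, callback):
    # No store lock is held here, so the callback may block or write.
    for key, value in snap.items():
        callback(key, value)


db = ToyStore()
db.store(b"UID 1000", b"S-1-5-21-1-2-3-1000")
db.store(b"UID 1001", b"S-1-5-21-1-2-3-1001")

snap = db.snapshot()
seen = []


def slow_callback(key, value):
    # A write in the middle of the traversal neither blocks nor
    # becomes visible in the snapshot being walked.
    db.store(b"UID 2000", b"S-1-5-21-1-2-3-2000")
    seen.append(key)


traverse(snap, slow_callback)
```

The traversal sees exactly the two records that existed when the
snapshot was taken, while the live store picks up the new record.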

* DB Consistency checker. (theoretical future feature)
Many databases, such as idmap.tdb, contain internal relations between records.
Today, you cannot create a tool to walk the entire database and
verify that all internal relations are consistent. That would require
you to get a lock on the entire database so that it does not change
while performing consistency checking. This will be very painful
without snapshots.

Think of this last one like an online fsck. Yes it is hard, but in
many cases an offline fsck is just not practical.
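
As a rough illustration of the checker idea, the sketch below walks a
frozen copy of the records and reports every record whose reverse
record is missing or points somewhere else. The record layout (an
ID record pointing at a SID and a SID record pointing back) is an
assumption made for the example, not a description of the exact
idmap.tdb format, and check_bidirectional() is a hypothetical helper.

```python
def check_bidirectional(snapshot):
    """Return the keys whose reverse record is missing or inconsistent.

    `snapshot` is assumed to be a frozen, point-in-time mapping, so the
    check can take as long as it likes without locking the live database.
    """
    broken = []
    for key, value in snapshot.items():
        # Each record is expected to have a partner mapping value -> key.
        if snapshot.get(value) != key:
            broken.append(key)
    return sorted(broken)


# Illustrative records only; the third one has no reverse record.
snapshot = {
    "UID 1000": "S-1-5-21-1-2-3-1000",
    "S-1-5-21-1-2-3-1000": "UID 1000",
    "GID 1001": "S-1-5-21-1-2-3-1001",
}

broken = check_bidirectional(snapshot)
```

Without a snapshot, the same loop would need the whole database
locked for its full duration to give a trustworthy answer.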

regards
ronnie sahlberg