[RFC] tdb_traverse_read_lite()

Rusty Russell rusty at ozlabs.org
Wed Mar 6 22:50:04 MST 2013


Volker Lendecke <Volker.Lendecke at SerNet.DE> writes:
> On Wed, Mar 06, 2013 at 05:58:41PM +1100, Rusty Russell wrote:
>> Volker Lendecke <Volker.Lendecke at SerNet.DE> writes:
>> > On Tue, Mar 05, 2013 at 01:33:18PM +0100, Stefan (metze) Metzmacher wrote:
>> >> I'd also prefer a chain traverse function, that could be used
>> >> in a lot of places to reduce the cleanup costs.
>> 
>> I'm not so sure... what would we want to wait for?  Our dbs these days
>> have huge hashsizes, so this kind of traversal doesn't block much
>> activity.  But benchmarks will show...
>
> For vacuuming it might be nice, under high stress, to only
> spend a few percent of time on it. Under certain situations
> we can afford to just leave it for a while to wait for the
> storm to end. It might be a lot easier to figure out that
> the "time slice" for vacuuming has ended while we are doing
> it instead of doing a precalculation. Sure, it is possible
> to prematurely stop the traverse_read_lite, but it is not
> possible to pick up where we left it. If you are concerned
> about exposing the hash chain number (we do already expose
> tdb_hash_size()), would it be okay to add an API to
> "traverse beginning with the hash chain for this key"?

Another option, which Amitay suggested, was to do non-blocking
chainlocks and skip over chains which we can't lock.  This risks never
vacuuming high-traffic chains, so we'd need to do blocking locks
sometimes.

So how about combining the approaches, like this.  Instead of
tdb_traverse_read_lite():

/* Returns -1 on error, or chain number it reached. */
int tdb_traverse_read_nonblock(struct tdb_context *tdb, int start, int end,
                               tdb_traverse_func fn, void *private_data);

int tdb_chainlock_read_bynum(struct tdb_context *tdb, int chain);
int tdb_chainunlock_read_bynum(struct tdb_context *tdb, int chain);

This leaves the heuristic up to ctdb: it can just skip over chains
returned by tdb_traverse_read_nonblock(), or it can use
tdb_chainlock_read_bynum() to block on them (maybe if the same chain
gets skipped multiple times?).
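
To make the heuristic concrete, here is a rough sketch of what a caller's
loop might look like.  None of this is real tdb code: the mock_* functions
are hypothetical stand-ins for the proposed calls (with the fn and
private_data arguments dropped for brevity), and NCHAINS, SKIP_LIMIT and
vacuum_pass() are invented for illustration.  It also assumes that the
nonblocking traverse stops at the first chain it cannot lock and returns
that chain's number, returns end if it got all the way through, and that
re-traversing a single chain while holding its read lock succeeds.

```c
#include <assert.h>
#include <stddef.h>

#define NCHAINS    8   /* hypothetical hash size for the mock */
#define SKIP_LIMIT 3   /* hypothetical: block once a chain was skipped this often */

/* Mock database: busy[i] != 0 means a nonblocking lock on chain i fails. */
struct mock_tdb {
	int busy[NCHAINS];
	int visited[NCHAINS];
};

/* Stand-in for the proposed tdb_traverse_read_nonblock(): visit chains in
 * [start, end), stopping at the first one we cannot lock without blocking.
 * Returns that chain number, end if every chain was visited, -1 on error. */
static int mock_traverse_read_nonblock(struct mock_tdb *t, int start, int end)
{
	int i;

	if (start < 0 || end > NCHAINS || start > end)
		return -1;
	for (i = start; i < end; i++) {
		if (t->busy[i])
			return i;
		t->visited[i]++;
	}
	return end;
}

/* Stand-in for the proposed blocking lock: pretend we waited until the
 * other lock holder went away. */
static int mock_chainlock_read_bynum(struct mock_tdb *t, int chain)
{
	assert(chain >= 0 && chain < NCHAINS);
	t->busy[chain] = 0;
	return 0;
}

static int mock_chainunlock_read_bynum(struct mock_tdb *t, int chain)
{
	(void)t;
	assert(chain >= 0 && chain < NCHAINS);
	return 0;
}

/* One pass over all chains.  Contended chains are skipped, unless they have
 * now been skipped SKIP_LIMIT times, in which case we block on them.
 * Returns the number of chains skipped in this pass, or -1 on error. */
static int vacuum_pass(struct mock_tdb *t, int skips[NCHAINS])
{
	int chain = 0, skipped = 0;

	while (chain < NCHAINS) {
		int r = mock_traverse_read_nonblock(t, chain, NCHAINS);

		if (r < 0)
			return -1;
		if (r == NCHAINS)
			break;			/* reached the end */

		if (++skips[r] >= SKIP_LIMIT) {
			int rr;

			/* Skipped too often: wait for it this time. */
			if (mock_chainlock_read_bynum(t, r) != 0)
				return -1;
			rr = mock_traverse_read_nonblock(t, r, r + 1);
			mock_chainunlock_read_bynum(t, r);
			if (rr != r + 1)
				return -1;
			skips[r] = 0;
		} else {
			skipped++;		/* leave it for a later pass */
		}
		chain = r + 1;
	}
	return skipped;
}
```

The skip counter is only one possible policy; a time budget per pass would
slot into the same loop, since the caller always knows which chain it
stopped at.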

Cheers,
Rusty.

