[RFC] tdb_traverse_read_lite()

Rusty Russell rusty at ozlabs.org
Thu Mar 7 16:57:52 MST 2013


Volker Lendecke <Volker.Lendecke at SerNet.DE> writes:
> On Thu, Mar 07, 2013 at 04:50:04PM +1100, Rusty Russell wrote:
>> Volker Lendecke <Volker.Lendecke at SerNet.DE> writes:
>> > On Wed, Mar 06, 2013 at 05:58:41PM +1100, Rusty Russell wrote:
>> >> Volker Lendecke <Volker.Lendecke at SerNet.DE> writes:
>> >> > On Tue, Mar 05, 2013 at 01:33:18PM +0100, Stefan (metze) Metzmacher wrote:
>> >> >> I'd also prefer a chain traverse function, that could be used
>> >> >> in a lot of places to reduce the cleanup costs.
>> >> 
>> >> I'm not so sure... what would we want to wait for?  Our dbs these days
>> >> have huge hashsizes, so this kind of traversal doesn't block much
>> >> activity.  But benchmarks will show...
>> >
>> > For vacuuming it might be nice to spend only a few percent
>> > of the time on it under high stress. In certain situations
>> > we can afford to just leave it for a while and wait for the
>> > storm to end. It might be a lot easier to figure out that
>> > the "time slice" for vacuuming has ended while we are doing
>> > it than to precalculate it. Sure, it is possible
>> > to stop the traverse_read_lite prematurely, but it is not
>> > possible to pick up where we left off. If you are concerned
>> > about exposing the hash chain number (we do already expose
>> > tdb_hash_size()), would it be okay to add an API to
>> > "traverse beginning with the hash chain for this key"?
>> 
>> Another option, which Amitay suggested, was to do non-blocking
>> chainlocks and skip over chains which we can't lock.  This risks never
>> vacuuming high-traffic chains, so we'd need to do blocking locks
>> sometimes.
>> 
>> So how about combining the approaches, like this.  Instead of
>> tdb_traverse_read_lite():
>> 
>> /* Returns -1 on error, or chain number it reached. */
>> int tdb_traverse_read_nonblock(struct tdb_context *tdb, int start, int end,
>>                                tdb_traverse_func fn, void *private_data);
>> 
>> int tdb_chainlock_read_bynum(struct tdb_context *tdb, int chain);
>> int tdb_chainunlock_read_bynum(struct tdb_context *tdb, int chain);
>> 
>> This leaves the heuristic up to ctdb: it can just skip over chains
>> returned by tdb_traverse_read_nonblock(), or it can use
>> tdb_chainlock_read_bynum() to block on them (maybe if the same chain
>> gets skipped multiple times?).
>
> Yep, that would certainly also work fine.

OK, I'll see if Amitay has any luck with the existing patch first, then
I'll spin something which does this.
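
To make the "skip, then block if skipped too often" heuristic concrete, here
is a rough sketch of what a caller might do. None of this is real tdb code:
the fake_* functions are hypothetical stand-ins that model the proposed calls
on an in-memory array of chains, and the traverse callback is left out. The
sketch assumes that "chain number it reached" means the first chain that could
not be locked without blocking (or end if the whole range was covered), and
that a nonblocking traverse of a single chain succeeds while the caller
already holds that chain's read lock.

```c
#include <assert.h>
#include <stdbool.h>

#define NCHAINS 8   /* tiny hash size, for illustration only */

/* Hypothetical stand-in for struct tdb_context. */
struct fake_tdb {
    bool busy[NCHAINS];    /* chain is locked by some other process */
    int visited[NCHAINS];  /* how often each chain has been traversed */
};

/* Per-caller bookkeeping: how often each chain was skipped in a row. */
struct vacuum_state {
    int skips[NCHAINS];
};

/* Models the proposed tdb_traverse_read_nonblock(): walks chains
 * [start, end) and stops at the first chain it cannot lock without
 * blocking. Returns that chain number, or end if it covered the range.
 * (Assumed semantics; the proposal only says "chain number it reached".) */
static int fake_traverse_read_nonblock(struct fake_tdb *db, int start, int end)
{
    assert(start >= 0 && end <= NCHAINS);
    for (int c = start; c < end; c++) {
        if (db->busy[c])
            return c;
        db->visited[c]++;
    }
    return end;
}

/* Models a blocking tdb_chainlock_read_bynum(): in this toy model the
 * wait is instantaneous and the other lock holder simply goes away. */
static int fake_chainlock_read_bynum(struct fake_tdb *db, int chain)
{
    db->busy[chain] = false;
    return 0;
}

static int fake_chainunlock_read_bynum(struct fake_tdb *db, int chain)
{
    (void)db;
    (void)chain;
    return 0;
}

/* One pass over all chains. A contended chain is skipped unless it has
 * already been skipped max_skips times in a row, in which case we block
 * on it. Returns the number of chains skipped, or -1 on error. */
static int vacuum_pass(struct fake_tdb *db, struct vacuum_state *st,
                       int max_skips)
{
    int skipped = 0;
    int chain = 0;

    while (chain < NCHAINS) {
        int reached = fake_traverse_read_nonblock(db, chain, NCHAINS);
        if (reached < 0)
            return -1;
        if (reached >= NCHAINS)
            break;          /* covered everything that was left */

        if (st->skips[reached] >= max_skips) {
            /* Skipped too often: take the blocking lock, then
             * traverse just this one chain while holding it. */
            if (fake_chainlock_read_bynum(db, reached) != 0)
                return -1;
            int r = fake_traverse_read_nonblock(db, reached, reached + 1);
            fake_chainunlock_read_bynum(db, reached);
            if (r != reached + 1)
                return -1;
            st->skips[reached] = 0;
        } else {
            st->skips[reached]++;
            skipped++;
        }
        chain = reached + 1;  /* resume after the contended chain */
    }
    return skipped;
}
```

With max_skips set to 2 and one permanently contended chain, the first two
calls to vacuum_pass() skip that chain and the third blocks on it. Where the
threshold should sit is exactly the policy question that would be left to
ctdb.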

Thanks!
Rusty.

