[RFC] tdb_traverse_read_lite()

Thu Mar 7 00:40:52 MST 2013

On Thu, Mar 07, 2013 at 04:50:04PM +1100, Rusty Russell wrote:
> Volker Lendecke <Volker.Lendecke at SerNet.DE> writes:
> > On Wed, Mar 06, 2013 at 05:58:41PM +1100, Rusty Russell wrote:
> >> Volker Lendecke <Volker.Lendecke at SerNet.DE> writes:
> >> > On Tue, Mar 05, 2013 at 01:33:18PM +0100, Stefan (metze) Metzmacher wrote:
> >> >> I'd also prefer a chain traverse function, that could be used
> >> >> in a lot of places to reduce the cleanup costs.
> >> 
> >> I'm not so sure... what would we want to wait for?  Our dbs these days
> >> have huge hashsizes, so this kind of traversal doesn't block much
> >> activity.  But benchmarks will show...
> >
> > For vacuuming it might be nice to spend only a few percent
> > of the time on it under high stress. In certain situations
> > we can afford to just leave it for a while and wait for the
> > storm to end. It might be a lot easier to figure out that
> > the "time slice" for vacuuming has ended while we are doing
> > it than to precalculate it. Sure, it is possible
> > to stop the traverse_read_lite prematurely, but it is not
> > possible to pick up where we left off. If you are concerned
> > about exposing the hash chain number (we do already expose
> > tdb_hash_size()), would it be okay to add an API to
> > "traverse beginning with the hash chain for this key"?
> 
> Another option, which Amitay suggested, was to do non-blocking
> chainlocks and skip over chains which we can't lock.  This risks never
> vacuuming high-traffic chains, so we'd need to do blocking locks
> sometimes.
> 
> So how about combining the approaches, like this.  Instead of
> tdb_traverse_read_lite():
> 
> /* Returns -1 on error, or chain number it reached. */
> int tdb_traverse_read_nonblock(struct tdb_context *tdb, int start, int end,
>                                tdb_traverse_func fn, void *private_data);
> 
> int tdb_chainlock_read_bynum(struct tdb_context *tdb, int chain);
> int tdb_chainunlock_read_bynum(struct tdb_context *tdb, int chain);
> 
> This leaves the heuristic up to ctdb: it can just skip over chains
> returned by tdb_traverse_read_nonblock(), or it can use
> tdb_chainlock_read_bynum() to block on them (maybe if the same chain
> gets skipped multiple times?).

Yep, that would certainly also work fine.
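
A rough sketch of how the ctdb side might apply that heuristic, purely as
illustration. None of this is real tdb code: the proposed calls do not exist
yet, so they are replaced by mock_* stand-ins operating on a toy struct, and
NUM_CHAINS, MAX_SKIPS and vacuum_pass() are made-up names. It also assumes
that the non-blocking traverse stops at the first chain it cannot lock and
returns that chain's number (or "end" when it got all the way through).

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_CHAINS 8   /* hypothetical hash size */
#define MAX_SKIPS  2   /* hypothetical: block once a chain was skipped more than this */

/* Toy stand-in for a tdb: which chains are contended, and visit counts. */
struct mock_tdb {
	bool busy[NUM_CHAINS];
	int visited[NUM_CHAINS];
};

/*
 * Mock of the proposed tdb_traverse_read_nonblock(): visit chains in
 * [start, end), stop at the first chain that cannot be locked and return
 * its number. Returns end if every chain was visited.
 */
static int mock_traverse_read_nonblock(struct mock_tdb *tdb, int start, int end)
{
	for (int c = start; c < end; c++) {
		if (tdb->busy[c]) {
			return c;
		}
		tdb->visited[c]++;
	}
	return end;
}

/*
 * Mock of tdb_chainlock_read_bynum() + work + tdb_chainunlock_read_bynum():
 * pretend we waited for the lock and then processed the chain.
 */
static void mock_blocking_visit(struct mock_tdb *tdb, int chain)
{
	tdb->visited[chain]++;
}

/*
 * One vacuuming pass over all chains. skips[] must persist between passes.
 * Returns the number of chains skipped in this pass.
 */
static int vacuum_pass(struct mock_tdb *tdb, int skips[NUM_CHAINS])
{
	int skipped = 0;
	int chain = 0;

	while (chain < NUM_CHAINS) {
		int reached = mock_traverse_read_nonblock(tdb, chain, NUM_CHAINS);

		assert(reached >= chain);
		if (reached >= NUM_CHAINS) {
			break;	/* got through to the end */
		}
		if (++skips[reached] > MAX_SKIPS) {
			/* Skipped too often: take the blocking lock this time. */
			mock_blocking_visit(tdb, reached);
			skips[reached] = 0;
		} else {
			skipped++;
		}
		chain = reached + 1;	/* resume after the contended chain */
	}
	return skipped;
}
```

With MAX_SKIPS at 2, a chain that stays contended is skipped on two passes
and waited for on the third, so high-traffic chains still get vacuumed
eventually. The threshold is exactly the kind of policy the proposal leaves
to ctdb.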

Volker

-- 
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:kontakt at sernet.de