[RFC] tdb_traverse_read_lite()
Volker Lendecke
Volker.Lendecke at SerNet.DE
Tue Mar 5 05:52:02 MST 2013
On Tue, Mar 05, 2013 at 01:33:18PM +0100, Stefan (metze) Metzmacher wrote:
> Am 05.03.2013 13:04, schrieb Volker Lendecke:
> > On Mon, Mar 04, 2013 at 05:57:15PM +1030, Rusty Russell wrote:
> >> Volker Lendecke <Volker.Lendecke at SerNet.DE> writes:
> >>> On Thu, Feb 28, 2013 at 02:11:18PM +1030, Rusty Russell wrote:
> >>>> Amitay asked me to look at a lightweight traverse for tdb. This is my
> >>>> first attempt (against his ctdb tree, but it's pretty simple).
> >>>
> >>> What about doing a traverse by chain? Sure, not atomic
> >>> whatsoever, but potentially even less intrusive.
> >>>
> >>> Volker
> >>
> >> I'm not sure what you mean... this traverses one chain at a time, but it
> >> holds the lock for that chain across the traverse function.
> >
> > Attached find a version of your code that actually applies
> > against master and builds there.
> >
> > In particular for the vacuuming code in ctdb I think it
> > might be valuable to have a traverse function that is even
> > lighter. The traverse_read_chain might even be optionally
> > non-blocking. This way ctdb could much better control to
> > spend x % of its time in vacuuming or could be more generous
> > to other daemons on the system in a high-load situation. It
> > could walk all chains that are not locked and only if a
> > chain was not taken care of for n seconds it could try the
> > blocking variant.
>
> I'd also prefer a chain traverse function, that could be used
> in a lot of places to reduce the cleanup costs.
While looking closer: The traverse_read_lite code seems to
not do the fancy locking that the next_hash_chain callback
assumes. See the big comment regarding empty hash chains in
the tdb_next_lock routine. Don't we also have to lock at
least once, like traverse_read_lite does? Also,
traverse_lite does not do the "Detect tight infinite loops"
thing, right?
Next one: Attached find an untested patch that makes
traverse_lite avoid allocating the whole thing. Instead, it
uses tdb_parse_data. malloc/free are not entirely free, they
can easily show up in profiles.
With best regards,
Volker Lendecke
--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:kontakt at sernet.de
**********************************************************
visit us at CeBIT: March 5th - 9th 2013, hall 6, booth E15
all about SAMBA and verinice, firewalls, Linux and Windows
free tickets available via email here : cebit at sernet.com !
**********************************************************
-------------- next part --------------
From 883764a84d1237d7375b55ad17ed5c1b3d6a40b3 Mon Sep 17 00:00:00 2001
From: Volker Lendecke <vl at samba.org>
Date: Tue, 5 Mar 2013 13:49:01 +0100
Subject: [PATCH] tdb: Avoid malloc/free in traverse_read_lite
---
lib/tdb/common/traverse.c | 51 ++++++++++++++++++++++++++++++++------------
1 files changed, 37 insertions(+), 14 deletions(-)
diff --git a/lib/tdb/common/traverse.c b/lib/tdb/common/traverse.c
index 2bac689..e75993a 100644
--- a/lib/tdb/common/traverse.c
+++ b/lib/tdb/common/traverse.c
@@ -208,18 +208,43 @@ out:
return count;
}
+struct traverse_read_lite_state {
+ struct tdb_context *tdb;
+ tdb_traverse_func fn;
+ void *private_data;
+ size_t key_length;
+};
+
+static int traverse_read_lite_parser(TDB_DATA key, TDB_DATA data,
+ void *private_data)
+{
+ struct traverse_read_lite_state *state =
+ (struct traverse_read_lite_state *)private_data;
+
+ key.dptr = data.dptr;
+ key.dsize = state->key_length;
+ data.dptr += state->key_length;
+ data.dsize -= state->key_length;
+
+ return state->fn(state->tdb, key, data, state->private_data);
+}
+
/* We could do write, but we'd have to stash rec.next and fixup on delete */
static int do_traverse_read_lite(struct tdb_context *tdb,
tdb_traverse_func fn, void *private_data)
{
+ struct traverse_read_lite_state state;
int count = 0;
uint32_t i;
+ state.tdb = tdb;
+ state.fn = fn;
+ state.private_data = private_data;
+
for (i = 0, tdb->methods->next_hash_chain(tdb, &i);
i < tdb->hash_size;
i++, tdb->methods->next_hash_chain(tdb, &i)) {
tdb_off_t rec_ptr;
- TDB_DATA key, dbuf;
struct tdb_record rec;
if (tdb_lock(tdb, i, F_RDLCK) != 0) {
@@ -237,25 +262,23 @@ static int do_traverse_read_lite(struct tdb_context *tdb,
rec_ptr = rec.next;
continue;
}
- /* now read the full record */
- key.dptr = tdb_alloc_read(tdb,
- rec_ptr + sizeof(rec),
- rec.key_len + rec.data_len);
- if (!key.dptr) {
- goto unlock_fail;
+
+ count++;
+
+ if (fn == NULL) {
+ rec_ptr = rec.next;
+ continue;
}
- key.dsize = rec.key_len;
- dbuf.dptr = key.dptr + rec.key_len;
- dbuf.dsize = rec.data_len;
+ state.key_length = rec.key_len;
- count++;
- if (fn && fn(tdb, key, dbuf, private_data)) {
- SAFE_FREE(key.dptr);
+ if (tdb_parse_data(
+ tdb, tdb_null, rec_ptr + sizeof(rec),
+ rec.key_len + rec.data_len,
+ traverse_read_lite_parser, &state)) {
tdb_unlock(tdb, i, F_RDLCK);
goto out;
}
- SAFE_FREE(key.dptr);
rec_ptr = rec.next;
}
if (tdb_unlock(tdb, i, F_RDLCK) != 0) {
--
1.7.3.4
More information about the samba-technical
mailing list