[TDB] Patches for file and memory usage growth issues
Rusty Russell
rusty at samba.org
Tue Apr 19 05:34:13 MDT 2011
On Mon, 18 Apr 2011 09:32:35 -0400, simo <idra at samba.org> wrote:
> On Mon, 2011-04-18 at 22:20 +0930, Rusty Russell wrote:
> > FYI, here are the times & file sizes for ./growtdb-bench 100000 1 on my
> > laptop (I cut the test down so it would run in reasonable time):
> >
> > Baseline: 13m53 472M
> > Intelligent repack: 11m6 451M
> > Limited expand: 7m48 117M
> > Repack in place: 7m44 113M
>
> Can you check what is the maximum memory footprint while the test runs
> on each of these tests ?
I meant to ask you: how did you measure that? I had a ps running every
second, but it usually missed the peak (which happens right at the end
of the repack).
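One way to avoid missing the peak is to ask the kernel for the high-water mark instead of polling. This is a minimal sketch, assuming Linux, where getrusage() reports ru_maxrss in kilobytes. The helper name peak_rss is hypothetical, and growtdb-bench would have to be changed to call it just before exiting. It covers resident memory only, not the virtual size.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: peak resident set size of the calling process.
 * Assumes Linux, where ru_maxrss is in kilobytes (other systems may
 * use different units). Returns -1 if getrusage() fails.
 */
long peak_rss(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_maxrss;
}
```

Because the kernel keeps this as a running maximum, a single call at the end of the run still sees a peak that a once-a-second ps would miss.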
> The bug (found elsewhere in the end :) that made me initially dive into
> these size issues was a large increase in memory footprint.
> With the old method we were basically doing 2 huge mmaps that would
> cause the process to use up to 4g of virtual memory (3.5g RES, 2.2g
> SHR). I would hope that by not using an auxiliary, in memory database,
> there is a way to substantially reduce that.
>
> > As a whim, I put TDB2 to the same test:
> > TDB2 0m9 297M
>
> Now, this is awesome! Very fast.
> What is it that is making such a big difference in time used ?
Several things. The good first:
(1) The hash scales as we get bigger, so 100000 records is pretty easy.
I usually test with 1 to 5 million records.
(2) We only overallocate records once they actually grow, but then we
overallocate by 50%, meaning fewer reallocs for the index.
The bad:
(3) TDB2 doesn't repack. We'll probably need to eventually, since
fragmentation can be an issue in any allocator.
(4) I use tdb_append in my benchmark. Inside tdb1 that's a simple "read,
copy, write", but tdb2 just writes in place if the record has room. LDB
doesn't use tdb_append anyway, so it's cheating :)
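Points (2) and (4) can be sketched together. This is an illustration only, not TDB2's code or on-disk layout: struct record and record_append are hypothetical in-memory stand-ins. A record is stored at its exact size at first, gets 50% slack the first time it has to grow, and later appends are written in place while the slack lasts.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a record; not TDB's real layout. */
struct record {
    unsigned char *data;
    size_t len;         /* bytes in use */
    size_t cap;         /* bytes allocated */
    unsigned reallocs;  /* how many times the record had to move */
};

/* Append n bytes from buf. Returns 0 on success, -1 on failure. */
int record_append(struct record *r, const void *buf, size_t n)
{
    size_t need;

    if (n == 0)
        return 0;
    if (n > SIZE_MAX - r->len)
        return -1;                      /* length would overflow */
    need = r->len + n;

    if (need > r->cap) {
        /* The first store is an exact fit; only growth gets 50% slack. */
        size_t cap = (r->cap == 0) ? need : need + need / 2;
        unsigned char *p;

        if (cap < need)
            return -1;                  /* slack computation overflowed */
        p = realloc(r->data, cap);
        if (p == NULL)
            return -1;
        r->data = p;
        r->cap = cap;
        r->reallocs++;
    }
    /* Enough room (possibly slack from earlier growth): write in place. */
    memcpy(r->data + r->len, buf, n);
    r->len = need;
    return 0;
}
```

With a zero-initialized record, appending 4 bytes and then 2 bytes allocates twice (capacity 4, then 9), and a third append of 3 bytes fits in the slack without another allocation. The read, copy, write approach would move the record on every append.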
> > Simo, this is what I'm thinking of pushing, once it's tested. OK?
>
> Sounds ok repack wise.
> Do you mind if I also push the patch to fix tdbbackup to not use a
> transaction on the copied db ?
> That one also made quite a difference for the performance and final size
> of the tdb generated. And because the final tdb is a backup file it
> means lower times to copy it wherever it needs to be copied so really
> worthwhile imo.
Yes, please do. I wanted to make sure we'd nailed the growth problems
first before you made them go away :)
I've pushed it to the autobuilder now.
Thanks!
Rusty.