[TDB] Patches for file and memory usage growth issues
simo
idra at samba.org
Tue Apr 19 05:41:38 MDT 2011
On Tue, 2011-04-19 at 21:04 +0930, Rusty Russell wrote:
> On Mon, 18 Apr 2011 09:32:35 -0400, simo <idra at samba.org> wrote:
> > On Mon, 2011-04-18 at 22:20 +0930, Rusty Russell wrote:
> > > FYI, here are the times & file sizes for ./growtdb-bench 100000 1 on my
> > > laptop (I cut the test down so it would run in reasonable time):
> > >
> > > Baseline: 13m53 472M
> > > Intelligent repack: 11m6 451M
> > > Limited expand: 7m48 117M
> > > Repack in place: 7m44 113M
> >
> > Can you check the maximum memory footprint while each of these
> > tests runs?
>
> I meant to ask you: how did you measure that? I had a ps running every
> second, but it usually missed the peak (which happens right at the end
> of the repack).
I use top | grep <program name>.
I may also have missed some peaks, but I was interested in the overall
memory behavior, and even if I missed some peaks I got a decent idea in
general. Plus I did a new transaction for every group I was adding, so I
have at least 250 peaks, can't miss them all :)
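Sampling with ps or top can miss a short-lived peak, as discussed above. One alternative, offered only as a sketch, is to ask the kernel for the high-water mark once the process has exited. The example below assumes a Unix system and uses Python's resource module. The helper name peak_rss_of is made up for illustration, and it reports peak resident memory (ru_maxrss), not virtual size.

```python
import resource
import subprocess
import sys


def peak_rss_of(cmd):
    """Run cmd to completion and return the kernel's peak-RSS figure.

    RUSAGE_CHILDREN covers every child this process has waited for, so
    the value is the largest peak among them. Use one measuring process
    per benchmark run to keep results separate.
    Units: kilobytes on Linux, bytes on macOS.
    """
    subprocess.run(cmd, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


# Example with a stand-in child process (an assumption, not the real benchmark):
# print(peak_rss_of([sys.executable, "-c", "x = bytearray(b'x') * 20000000"]))
```

To apply this to the benchmark in this thread, the command list would be something like ["./growtdb-bench", "100000", "1"], assuming the binary is in the current directory.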
> > The bug (found elsewhere in the end :) that made me initially dive into
> > these size issues was a large increase in memory footprint.
> > With the old method we were basically doing 2 huge mmaps that would
> > cause the process to use up to 4g of virtual memory (3.5g RES, 2.2g
> > SHR). I would hope that by not using an auxiliary, in-memory database,
> > there is a way to substantially reduce that.
> >
> > > As a whim, I put TDB2 to the same test:
> > > TDB2 0m9 297M
> >
> > Now, this is awesome! Very fast.
> > What is making such a big difference in the time used?
>
> Several things. The good first:
>
> (1) The hash scales as we get bigger, so 100000 records is pretty easy.
> I usually test with 1 to 5 million records.
> (2) We only overallocate records once they actually grow, but then we
> overallocate by 50%, meaning fewer reallocs for the index.
Ok, this makes sense. I guess the growing hash really helps.
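To illustrate point (2), here is a toy model of why 50% overallocation on growth means fewer reallocations. It is not TDB2 code: the function name, the step size, and the way capacity is tracked are all assumptions made for the sketch. The first store is exact-fit, and extra room is only added once the record actually grows.

```python
def count_allocations(final_size, step, overallocate):
    """Grow one record from 0 to final_size in increments of step.

    Returns how many times the record had to be (re)allocated.
    """
    capacity = 0
    length = 0
    allocations = 0
    while length < final_size:
        length += step
        if length > capacity:
            if capacity == 0 or not overallocate:
                # First store, or exact-fit policy: allocate just enough.
                capacity = length
            else:
                # Record grew: allocate 50% more than currently needed.
                capacity = length + length // 2
            allocations += 1
    return allocations


exact = count_allocations(1000, 10, overallocate=False)
padded = count_allocations(1000, 10, overallocate=True)
print(exact, padded)
```

With exact-fit allocation every append forces a new allocation, while the padded variant reallocates only when the slack runs out, so the count grows roughly with the logarithm of the final size.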
> The bad:
>
> (3) TDB2 doesn't repack. We'll probably need to eventually, since
> fragmentation can be an issue in any allocator.
Yes.
> (4) I use tdb_append in my benchmark, and that's a simple "read, copy,
> write" inside tdb1, but tdb2 just writes if it has room. But LDB
> doesn't use tdb_append anyway, so it's cheating :)
Maybe we should try to use it :)
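The difference described in point (4) can be sketched with a toy model that counts bytes written per append. None of this is the real tdb_append implementation: append_by_rewrite and RecordWithSlack are hypothetical names, and the growth policy in the fallback path is an assumption.

```python
def append_by_rewrite(stored: bytes, extra: bytes):
    """Read the old value, copy it with extra appended, write it all back."""
    new_value = stored + extra
    # The whole record is rewritten, so the cost grows with record size.
    return new_value, len(new_value)


class RecordWithSlack:
    """Toy record that keeps spare room after its data (not a TDB structure)."""

    def __init__(self, capacity: int):
        self.buf = bytearray(capacity)
        self.length = 0

    def append(self, extra: bytes) -> int:
        """Append extra and return the number of bytes written."""
        end = self.length + len(extra)
        if end <= len(self.buf):
            # Room available: write only the new bytes in place.
            self.buf[self.length:end] = extra
            self.length = end
            return len(extra)
        # No room: copy the whole record into a larger buffer.
        grown = bytearray(end + end // 2)
        grown[:self.length] = self.buf[:self.length]
        grown[self.length:end] = extra
        self.buf, self.length = grown, end
        return end

    def value(self) -> bytes:
        return bytes(self.buf[:self.length])
```

In this model an append that fits in the slack costs only the new bytes, whereas the rewrite path pays for the full record every time.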
> > > Simo, this is what I'm thinking of pushing, once it's tested. OK?
> >
> > Sounds OK repack-wise.
> > Do you mind if I also push the patch to fix tdbbackup to not use a
> > transaction on the copied db?
> >
> > That one also made quite a difference for the performance and final size
> > of the tdb generated. And because the final tdb is a backup file it
> > means lower times to copy it wherever it needs to be copied so really
> > worthwhile imo.
>
> Yes, please do. I wanted to make sure we'd nailed the growth problems
> first before you made them go away :)
>
> I've pushed it to the autobuilder now.
Thank you.
Simo.
--
Simo Sorce
Samba Team GPL Compliance Officer <simo at samba.org>
Principal Software Engineer at Red Hat, Inc. <simo at redhat.com>