[TDB] Patches for file and memory usage growth issues

Mon Apr 11 09:39:54 MDT 2011

On Mon, 11 Apr 2011 08:02:11 -0400, simo <idra at samba.org> wrote:
> On Mon, 2011-04-11 at 20:29 +0930, Rusty Russell wrote:
> > On Sat, 09 Apr 2011 23:01:15 -0400, simo <idra at samba.org> wrote:
> > > Hi all,
> > > during this week I have been working on a memory growth issue affecting
> > > sssd when storing large group of users in the LDB based cache.
> > > 
> > > It turns out that a lot of memory is being used by TDB.
> > > 
> > > Please take a look at the last 3 patches in this tree:
> > > http://git.samba.org/?p=idra/samba.git;a=shortlog;h=refs/heads/tdb-memsize
> > > 
> > > The first one should be uncontroversial, the original intent makes sense
> > > in a tdb database when you use small records, but when a rare huge
> > > record is saved in, growing the db by 100 times its size really is
> > > overkill. I also reduced the minimum overhead from 25% to 10% as with
> > > large TDBs (in the order of 500MB/1GB, growing by 25% is quite a lot).
> > 
> > Yep, that makes sense.  Perhaps just drop the heuristic to ignore record
> > size altogether?  Double until the database is > 1M, then add 10%?
> 
> Can't. You may be unlucky enough that the record you are trying to store
> is much bigger than the entire database :)
> Although perhaps I can make the DB grow faster when it is smaller, say
> min 25% up to 100MiB and then switch to 10% (or maybe even less).

Right, so something like:

        if (tdb->map_size < 100 * 1024 * 1024)
                additional = tdb->map_size / 4;
        else
                additional = tdb->map_size / 10;

        if (additional < size)
                additional = size;

> > Note that we repack on expand, so limiting expansions will end up
> > slowing down some bechmarks.  I don't think it's a problem IRL, though.
> 
> Yeah most of the issues may need much smarter heuristics or speed/size
> compromises. TDB was originally used for very small datasets and so it
> was a good idea to gain anything you can in speed even if it meant
> wasting some space relative to the size of the set. But now that TDB is
> used also for large datasets this is hurting.

Now, I just figured out your backup problem.  Sorry for being slow.

If you take and empty database and start a transaction then dump records
into it, you will do a repack when the transaction finishes, which is
probably not a good idea.

Far better to only repack if we've expanded the DB *once* in the
transaction: this reflects the fragmentation case better than the
current heuristic.

Something like this? (Completely untested, it's 1:30 am here)

diff --git a/lib/tdb/common/transaction.c b/lib/tdb/common/transaction.c
index b6cf771..164b0d7 100644
--- a/lib/tdb/common/transaction.c
+++ b/lib/tdb/common/transaction.c
@@ -403,8 +403,6 @@ static int transaction_expand_file(struct tdb_context *tdb, tdb_off_t size,
 		return -1;
 	}
 
-	tdb->transaction->need_repack = true;
-
 	return 0;
 }
 
@@ -979,6 +977,12 @@ static int _tdb_transaction_prepare_commit(struct tdb_context *tdb)
 			_tdb_transaction_cancel(tdb);
 			return -1;
 		}
+
+		/* We repack if we're slowly growing, eg. not for backup! */
+		if (tdb->map_size < tdb->transaction->old_map_size * 2) {
+			tdb->transaction->need_repack = true;
+		}
+
 		tdb->map_size = tdb->transaction->old_map_size;
 		methods->tdb_oob(tdb, tdb->map_size + 1, 1);
 	}

> 
> > > The second one needs discussion. I noticed that a transaction can
> > > substantially increase the size of a tdb, in my tests I had a 1.2G size
> > > TDB actually grow to a ~1.7G after a backup ... by not using a
> > > transaction to store the backup copy the same 1.2G TDB shrunk down to
> > > ~900MB. I know that the reason we use a transaction is to avoid rsyncs
> > > of partial DBs, so maybe we want to add an option on whether to use a
> > > transaction for the 2nd db ? Comments welcome.
> > 
> > Hmm, that's odd.  Compressing the recovery area makes a great deal of
> > sense for tdb2 though.
> > 
> > Note that we overallocate all TDB entries by 25%.  (TDB2 only does that
> > when you expand an existing record).
> 
> The overallocation is useful when you save many entries at the same
> time, which is actually a frequent use case for sssd, how would tdb2
> behave on burst of growth (many new records) ?

I think you misunderstand?  I'm saying we allocate 25% extra room in
each record, to reduce fragmentation in case it grows.

I've got some regressions with TDB2 in S3, but once I've fixed those
I'd really like to see what your stress tests gives with tdb2...

> > > The third one I think is less controversial but still not something I
> > > would put in without discussion. This feature has the very nice property
> > > of being completely optional and needs an explicit flags to be passed at
> > > tdb_open(). This makes it relatively harmless and not very intrusive.
> > 
> > But it makes an incompatible database :(
> > 
> > OTOH it's fairly easy for the user to do this themselves and they know
> > what tradeoffs they want to make.
> 
> This is why you have to explicitly enable it. Old tools will report
> errors indeed, but that seems like a very desirable outcome if you do
> not have a clue on how to handle new data types :)

I like the idea, but I think it's best to use that compression for the
recovery area, not normal data.  That might actually speed things up
(less data to sync)...

Thanks!
Rusty.