[TDB] Patches for file and memory usage growth issues

tridge at samba.org tridge at samba.org
Mon Apr 11 04:50:33 MDT 2011


Hi Simo,

 > during this week I have been working on a memory growth issue affecting
 > sssd when storing large group of users in the LDB based cache.

Can you give us an idea of the number of records involved, the number
of indexed attributes and how big the records are?

 > The first one should be uncontroversial, the original intent makes sense
 > in a tdb database when you use small records, but when a rare huge
 > record is saved in, growing the db by 100 times its size really is
 > overkill. I also reduced the minimum overhead from 25% to 10% as with
 > large TDBs (in the order of 500MB/1GB, growing by 25% is quite a lot).

I think the change from 25% to 10% will have the bigger impact of the
two changes. The 25% number was always fairly arbitrary, but changing
to 10% means roughly 2.5 times as many tdb_expand calls while a
database grows. Have you measured what impact this has when initially
creating a large database?
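
To put a rough number on the extra expand calls, here is a small
Python sketch. It assumes a simplified model in which every expansion
grows the file by a fixed percentage of its current size, and it
ignores everything else the real tdb_expand does. The function name
and the start and target sizes are made up for illustration.

```python
import math


def expansions_needed(start_bytes, target_bytes, growth_pct):
    """Count fixed-percentage expansions needed to grow from start to target.

    Simplified model (an assumption, not the real tdb_expand logic): each
    call grows the file by growth_pct percent of its current size, with
    at least 1 byte of growth so the loop always terminates.
    """
    size = start_bytes
    calls = 0
    while size < target_bytes:
        size += max(1, size * growth_pct // 100)
        calls += 1
    return calls


start = 1024 * 1024           # 1MB, hypothetical starting size
target = 1024 * 1024 * 1024   # 1GB, hypothetical final size

calls_25 = expansions_needed(start, target, 25)
calls_10 = expansions_needed(start, target, 10)
ratio = calls_10 / calls_25
print(calls_25, calls_10, ratio)

# Closed-form ratio for pure compound growth: ln(1.25) / ln(1.10)
closed_form = math.log(1.25) / math.log(1.10)
print(closed_form)
```

In this model the ratio comes out near 2.3, which is in the same
ballpark as the 2.5 figure. The cost of each individual call is not
modelled here, so this says nothing about the measured impact.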

 > The second one needs discussion. I noticed that a transaction can
 > substantially increase the size of a tdb, in my tests I had a 1.2G size
 > TDB actually grow to a ~1.7G after a backup ... by not using a
 > transaction to store the backup copy the same 1.2G TDB shrunk down to
 > ~900MB. I know that the reason we use a transaction is to avoid rsyncs
 > of partial DBs, so maybe we want to add an option on whether to use a
 > transaction for the 2nd db ? Comments welcome.

I initially thought you were removing the transaction on the old db,
but I see that you're only removing it on the new db. That should be
OK, but we will need to at least get a write lock across the whole of
tdb_new, and we should probably also do an fsync on tdb_fd() at the
end, before the close and rename.
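
The lock, fsync, close, rename ordering can be sketched with plain
POSIX file calls. This is a generic illustration in Python, not tdb
code: the helper name, the ".tmp" suffix and the payload are all
hypothetical, and with tdb itself the descriptor passed to fsync would
be the one returned by tdb_fd() on the new database.

```python
import fcntl
import os
import tempfile


def write_then_rename(final_path, payload):
    """Write payload to a temp file, fsync it, then rename it into place."""
    tmp_path = final_path + ".tmp"   # hypothetical temp-file naming
    fd = os.open(tmp_path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # Exclusive lock over the whole file while it is being filled.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        view = memoryview(payload)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Flush the file contents to stable storage before the rename.
        os.fsync(fd)
    finally:
        os.close(fd)   # closing the descriptor also drops the lock
    # rename() replaces the target atomically on POSIX filesystems.
    os.rename(tmp_path, final_path)


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "backup.tdb")
write_then_rename(target, b"example contents")
```

Without the fsync, a crash shortly after the rename could leave the
final name pointing at a file whose data never reached the disk.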

Rusty, what do you think?

 > For example an ldb index gets easily compressed down to less than 10%
 > the original size, for obvious reasons, although compressing an index is
 > not necessarily a too bright idea, yet ...

I think the real problem is the inefficient index format in ldb. We
really need to fix that. Compression will make the file smaller, but
much slower. Then we'll need a compression cache to make it fast
again, and we'll quickly end up with something that is very hard to
maintain.

 > Finally I also am going to try to come to a solution for the huge memory
 > usage caused by tdb_repack(). This function actually needs a lot of
 > improvement. Firstly because it uses a transaction to copy the database
 > twice to do the repack.

tdb_repack() was really added for ctdb, and it suited the sizes of
database and available memory for that application. As you have
noticed, it is terrible for repacking a large persistent database.

Given that tdb 2.0 is just around the corner, perhaps you should work
with Rusty to test whether tdb_repack() is actually needed for sssd
with the new tdb 2.0 format?

Cheers, Tridge

