[TDB] Patches for file and memory usage growth issues

simo idra at samba.org
Mon Apr 11 06:20:04 MDT 2011


On Mon, 2011-04-11 at 20:50 +1000, tridge at samba.org wrote:
> Hi Simo,
> 
>  > during this week I have been working on a memory growth issue affecting
>  > sssd when storing large groups of users in the LDB based cache.
> 
> Can you give us an idea of the number of records involved, the number
> of indexed attributes and how big the records are?

In this (synthetic) test we were creating 25000 sssd users (posix
attributes + memberof attribute generated by our memberof plugin + some
timestamps). Once the 25k users are created, we create 250 groups, each
containing the 25k users as members, and as a final step we modify a
group, removing some memberships.

It is clearly a stress test, but we are doing it because we had
reports of OOM kills in sssd caches against directories that had a few
groups with > 20k users as members.

We have indexes on objectclass and other attributes containing DNs.

>  > The first one should be uncontroversial, the original intent makes sense
>  > in a tdb database when you use small records, but when a rare huge
>  > record is saved in, growing the db by 100 times its size really is
>  > overkill. I also reduced the minimum overhead from 25% to 10%, as with
>  > large TDBs (on the order of 500MB/1GB), growing by 25% is quite a lot.
> 
> The change from 25% to 10% will have the bigger impact of the two
> changes I think. The 25% number was always fairly arbitrary, but
> changing to 10% means 2.5 times as many tdb_expand calls while a
> database grows. Have you measured any impact of this in initially
> creating a large database?

The only impact I measured was the size, and it was quite an impressive
gain. But I haven't tested whether it made a difference in speed.
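
Roughly, the policy change looks like this (a minimal sketch of the
growth heuristic being discussed; the function name and exact
arithmetic are illustrative, not the actual tdb_expand() code):

/* Illustrative sketch, not lib/tdb code: grow by at least the space
 * the triggering record needs, plus a fixed percentage of the current
 * file size, instead of multiplying a single huge record or always
 * adding 25%. */
#include <stddef.h>

static size_t expand_increment(size_t map_size, size_t record_size)
{
        /* minimum overhead: 10% of the current file (was 25%) */
        size_t increment = map_size / 10;

        /* always leave room for the record that triggered the
         * expansion, without multiplying it by a large factor the
         * way the old batching heuristic did */
        if (increment < record_size)
                increment = record_size;

        return increment;
}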

>  > The second one needs discussion. I noticed that a transaction can
>  > substantially increase the size of a tdb; in my tests a 1.2GB
>  > TDB actually grew to ~1.7GB after a backup ... by not using a
>  > transaction to store the backup copy the same 1.2GB TDB shrunk down to
>  > ~900MB. I know that the reason we use a transaction is to avoid rsyncs
>  > of partial DBs, so maybe we want to add an option on whether to use a
>  > transaction for the 2nd db? Comments welcome.
> 
> I initially thought you were removing the transaction on the old db,
> but I see that you're only removing it on the new db. That should be
> OK, but we will need to at least get a write lock across the whole of
> tdb_new, and we should probably also do a fsync at the end on tdb_fd()
> before the close and rename.

Right, I thought about adding the lock (and forgot about the fsync),
but then decided to ask whether it was OK to remove the transaction
before doing any real work. Thanks for pointing it out; if Rusty
agrees, I'll proceed to add those features.
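
For reference, the adjusted flow would look something like this (a
rough sketch with error handling trimmed and illustrative names; not
the final patch):

/* Copy without a transaction on the new db: hold a write lock across
 * the whole of tdb_new so nobody can see a partially written copy,
 * and fsync() the new file before the close and rename. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <tdb.h>

static int copy_record(struct tdb_context *src, TDB_DATA key,
                       TDB_DATA data, void *state)
{
        return tdb_store((struct tdb_context *)state, key, data,
                         TDB_INSERT);
}

int backup_without_transaction(struct tdb_context *tdb_old,
                               const char *tmpname, const char *name,
                               int hash_size)
{
        struct tdb_context *tdb_new;

        tdb_new = tdb_open(tmpname, hash_size, TDB_DEFAULT,
                           O_RDWR | O_CREAT | O_EXCL, 0600);
        if (tdb_new == NULL)
                return -1;

        /* write lock across the whole of tdb_new */
        if (tdb_lockall(tdb_new) != 0)
                goto fail;

        if (tdb_traverse_read(tdb_old, copy_record, tdb_new) == -1)
                goto fail;

        tdb_unlockall(tdb_new);

        /* make sure the data is on disk before the rename makes the
         * copy visible under its final name */
        if (fsync(tdb_fd(tdb_new)) != 0)
                goto fail;

        tdb_close(tdb_new);
        return rename(tmpname, name);

fail:
        tdb_close(tdb_new);
        unlink(tmpname);
        return -1;
}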

> Rusty, what do you think?
> 
>  > For example an ldb index gets easily compressed down to less than 10%
>  > of the original size, for obvious reasons, although compressing an
>  > index is not necessarily a bright idea, yet ...
> 
> I think the real problem is the inefficient index format in ldb.

Oh, that's totally a huge issue, but I am trying to work on two fronts
here. I need something to cut down on memory usage quickly, in order to
solve problems for current users, without making incompatible changes.
Then I need to address efficiency. Of course if both can be achieved
quickly, that's even better.

> We really need to fix that. The compression will make the file smaller,
> but much slower. Then we'll need a compression cache to make it fast
> again, and we'll quickly end up with something that is very hard to
> maintain.

A cache would mean keeping huge records in memory, which I think would
cause memory to grow too much again, unless we use some LRU scheme and
keep the cache size tightly controlled, but that would indeed be
expensive.

One thing I would do is to store a key/DN pair, with the key no bigger
than a long integer, and then use this integer in indexes instead of the
full DN. Whether we can make this transparent and efficient I don't know
yet, but it would certainly reduce the size of large indexes by more
than an order of magnitude.
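
To make that concrete, here is a minimal sketch of such a mapping on
top of plain tdb calls (the "DN:", "ID:" and "@NEXTID" record layouts
are made up for illustration; the hard part, making this transparent
inside the ldb indexing code, is not addressed here):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <tdb.h>

/* fetch-or-allocate a small integer id for a DN; returns 0 on error */
static uint64_t dn_to_id(struct tdb_context *tdb, const char *dn)
{
        char keybuf[1024], idbuf[32];
        TDB_DATA val;
        uint64_t id = 0;

        snprintf(keybuf, sizeof(keybuf), "DN:%s", dn);
        TDB_DATA key = { (unsigned char *)keybuf, strlen(keybuf) };

        /* already mapped? */
        val = tdb_fetch(tdb, key);
        if (val.dptr != NULL && val.dsize == sizeof(id)) {
                memcpy(&id, val.dptr, sizeof(id));
                free(val.dptr);
                return id;
        }
        free(val.dptr);

        /* allocate the next id from a counter record */
        TDB_DATA ckey = { (unsigned char *)"@NEXTID", 7 };
        val = tdb_fetch(tdb, ckey);
        if (val.dptr != NULL && val.dsize == sizeof(id))
                memcpy(&id, val.dptr, sizeof(id));
        free(val.dptr);
        id++;

        TDB_DATA idval = { (unsigned char *)&id, sizeof(id) };
        if (tdb_store(tdb, ckey, idval, TDB_REPLACE) != 0 ||
            tdb_store(tdb, key, idval, TDB_INSERT) != 0)
                return 0;

        /* reverse mapping so searches can resolve id -> DN */
        snprintf(idbuf, sizeof(idbuf), "ID:%llu", (unsigned long long)id);
        TDB_DATA rkey = { (unsigned char *)idbuf, strlen(idbuf) };
        TDB_DATA rval = { (unsigned char *)dn, strlen(dn) };
        if (tdb_store(tdb, rkey, rval, TDB_INSERT) != 0)
                return 0;

        return id;
}

Index records would then hold fixed-size 8-byte ids instead of
repeating DN strings that are commonly 50-100 bytes each, which is
where the order-of-magnitude estimate comes from.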

>  > Finally I also am going to try to come to a solution for the huge memory
>  > usage caused by tdb_repack(). This function actually needs a lot of
>  > improvement, firstly because it uses a transaction to copy the database
>  > twice to do the repack.
> 
> tdb_repack() was really added for ctdb, and it suited the sizes of
> database and available memory for that application. As you have
> noticed, it is terrible for repacking a large persistent database.

Unfortunately I noticed that, while the DB is relatively small, I get
much better memory usage without it; but once the DB is large and
fragmented, removing tdb_repack() has the effect of making memory usage
even larger.
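
For context, this is roughly the shape of the current tdb_repack() (as
I remember it from ctdb; error handling trimmed), which shows where
the memory goes: the whole db is duplicated into an in-memory
TDB_INTERNAL copy and then written back under a transaction, so at
peak you pay for the mapped file, the full in-memory copy and the
transaction's buffered writes all at once:

#include <fcntl.h>
#include <tdb.h>

static int repack_copy(struct tdb_context *src, TDB_DATA key,
                       TDB_DATA data, void *state)
{
        return tdb_store((struct tdb_context *)state, key, data,
                         TDB_INSERT);
}

int repack_sketch(struct tdb_context *tdb)
{
        struct tdb_context *tmp = NULL;

        if (tdb_transaction_start(tdb) != 0)
                return -1;

        /* first copy: the entire db duplicated in memory */
        tmp = tdb_open("tmpdb", tdb_hash_size(tdb), TDB_INTERNAL,
                       O_RDWR | O_CREAT, 0);
        if (tmp == NULL)
                goto fail;

        if (tdb_traverse_read(tdb, repack_copy, tmp) == -1)
                goto fail;

        if (tdb_wipe_all(tdb) != 0)
                goto fail;

        /* second copy: back into the now-empty file, all of it
         * buffered again by the open transaction */
        if (tdb_traverse_read(tmp, repack_copy, tdb) == -1)
                goto fail;

        tdb_close(tmp);
        return tdb_transaction_commit(tdb);

fail:
        if (tmp != NULL)
                tdb_close(tmp);
        tdb_transaction_cancel(tdb);
        return -1;
}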
 
> Given that tdb 2.0 is just around the corner, perhaps you should work
> with Rusty to test to see if tdb_repack() is actually needed for sssd
> with the new tdb 2.0 format?

I am trying to get "something" fast, even if it is inefficient. I may
have failed to mention that my current test ends up using almost 4GiB of
memory without any of these corrective measures. With all the patches
applied, peak memory usage came down to 2.2GB, which is still a lot, but
much better than before. Certainly part of the waste is due to our
memberof plugin, which is something I am going to deal with next, but
going from 4GB to 2GB just with tdb tweaks was impressive, and it made
me realize how inefficient tdb is for large DBs with some big records;
that use case will certainly impact the Samba AD database too.

Simo.

-- 
Simo Sorce
Samba Team GPL Compliance Officer <simo at samba.org>
Principal Software Engineer at Red Hat, Inc. <simo at redhat.com>


