[TDB] Patches for file and memory usage growth issues
Stefan (metze) Metzmacher
metze at samba.org
Mon Apr 11 06:46:44 MDT 2011
>> > The first one should be uncontroversial, the original intent makes sense
>> > in a tdb database when you use small records, but when a rare huge
>> > record is saved in, growing the db by 100 times its size really is
>> > overkill. I also reduced the minimum overhead from 25% to 10%, as with
>> > large TDBs (on the order of 500MB/1GB) growing by 25% is quite a lot.
>> The change from 25% to 10% will have the bigger impact of the two
>> changes I think. The 25% number was always fairly arbitrary, but
>> changing to 10% means 2.5 times as many tdb_expand calls while a
>> database grows. Have you measured any impact of this in initially
>> creating a large database?
> The only impact I measured was the size, and it was a quite impressive
> gain. But I haven't tested if it made a difference in speed.
Maybe we can use something like this:

#define MEGA_BYTE (1024 * 1024)

uint64_t calc_new_size(uint64_t old_size, uint64_t add_size)
{
	uint64_t needed_size = old_size + add_size;
	uint64_t new_size1, new_size2, new_size;

	/* expand by 100MB */
	new_size1 = needed_size + (100 * MEGA_BYTE);

	/* expand by 25% */
	new_size2 = needed_size * 125 / 100;

	/* use the minimum */
	new_size = MIN(new_size1, new_size2);

	/* align (round up) to a multiple of 1MB */
	new_size = ((new_size + (MEGA_BYTE - 1)) / MEGA_BYTE) * MEGA_BYTE;

	return new_size;
}
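To get a rough feeling for the question about the number of tdb_expand
calls, the policy can be simulated outside of tdb. The following is only a
sketch: grow_size() and count_expansions() are hypothetical helpers, not
tdb functions, and the model assumes the database is expanded only when it
is completely full and one record of rec bytes has to be added.

```c
#include <stdint.h>

#define MEGA_BYTE ((uint64_t)1024 * 1024)

/*
 * Hypothetical growth policy (not a tdb function): grow by "percent"
 * percent, but by at most cap_mb megabytes, then round up to 1MB.
 */
static uint64_t grow_size(uint64_t needed, unsigned percent, uint64_t cap_mb)
{
	uint64_t by_cap = needed + cap_mb * MEGA_BYTE;
	uint64_t by_pct = needed + needed * percent / 100;
	uint64_t size = (by_cap < by_pct) ? by_cap : by_pct;

	return (size + MEGA_BYTE - 1) / MEGA_BYTE * MEGA_BYTE;
}

/*
 * Count how many expansions are needed to grow from "start" to at
 * least "target" bytes, assuming each expansion is triggered by a
 * record of "rec" bytes that no longer fits.
 */
static unsigned count_expansions(uint64_t start, uint64_t target,
				 uint64_t rec, unsigned percent,
				 uint64_t cap_mb)
{
	uint64_t size = start;
	unsigned n = 0;

	while (size < target) {
		size = grow_size(size + rec, percent, cap_mb);
		n++;
	}
	return n;
}
```

Calling count_expansions() with percent set to 10 and then to 25 for the
same start and target sizes shows how many more expansions the smaller
percentage costs under this simplified model. It says nothing about the
real cost of each tdb_expand call, which would still need to be measured.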
>> I think the real problem is the inefficient index format in ldb.
> Oh, that's totally a huge issue, but I am trying to work on 2 fronts
> here. I need something to cut down on memory usage quickly, in order to
> solve problems for current users, w/o making incompatible changes. Then
> I need to account for efficiency. Of course if both can be achieved
> quickly that's even better.
>> We really need to fix that. The compression will make the file smaller,
>> but much slower. Then we'll need a compression cache to make it fast
>> again, and we'll quickly end up with something that is very hard to
> A cache would mean keeping huge records in memory, which would cause
> memory to grow again too much I think, unless we use some LRU and keep
> the cache size tightly controlled, but that would indeed be expensive.
> One thing I would do is to save a key/dn pair with key not bigger than a
> long integer, and then use this integer in indexes instead of the full
> DNs. Whether we can make this transparent and efficient I don't know
> yet, but it would certainly reduce the size of large indexes by more
> than an order of magnitude.
Tridge and I discussed something like that at the AD plugfest last year.
If I remember correctly we discussed using the objectGUID as primary fixed
A maybe even better solution would be to use uint64_t values, which can be
used as a direct offset into the tdb mmap area.
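To illustrate why fixed-size keys shrink the indexes, here is a small
sketch that compares the space for an index stored as a list of DN strings
with the same index stored as uint64_t values. Both helpers are
hypothetical and only count bytes; they do not reflect the actual ldb
index format, and the mapping from integer back to DN is not included.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes needed to store an index as NUL-terminated DN strings. */
static size_t index_bytes_as_dns(const char *const *dns, size_t count)
{
	size_t total = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		total += strlen(dns[i]) + 1;
	}
	return total;
}

/* Bytes needed for the same index as fixed-size integer ids. */
static size_t index_bytes_as_ids(size_t count)
{
	return count * sizeof(uint64_t);
}
```

The saving depends on the DN lengths: each entry costs 8 bytes instead of
the full DN string, so the longer the DNs in an index, the bigger the gain.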