[TDB] Patches for file and memory usage growth issues

Stefan (metze) Metzmacher metze at samba.org
Mon Apr 11 06:46:44 MDT 2011


Hi Simo,

>>  > The first one should be uncontroversial, the original intent makes sense
>>  > in a tdb database when you use small records, but when a rare huge
>>  > record is saved in, growing the db by 100 times its size really is
>>  > overkill. I also reduced the minimum overhead from 25% to 10% as with
>>  > large TDBs (in the order of 500MB/1GB, growing by 25% is quite a lot).
>>
>> The change from 25% to 10% will have the bigger impact of the two
>> changes I think. The 25% number was always fairly arbitrary, but
>> changing to 10% means 2.5 times as many tdb_expand calls while a
>> database grows. Have you measured any impact of this in initially
>> creating a large database?
> 
> The only impact I measured was the size, and it was a quite impressive
> gain. But I haven't tested if it made a difference in speed.

Maybe we can use something like this:

#include <stdint.h>

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

#define MEGA_BYTE (1024 * 1024)

uint64_t calc_new_size(uint64_t old_size, uint64_t add_size)
{
	uint64_t needed_size = old_size + add_size;
	uint64_t new_size1, new_size2, new_size;

	/* expand by 100MB */
	new_size1 = needed_size + (100 * MEGA_BYTE);
	/* expand by 25% */
	new_size2 = needed_size * 125 / 100;

	/* use the minimum */
	new_size = MIN(new_size1, new_size2);

	/* round up to a multiple of 1MB */
	new_size = ((new_size + (MEGA_BYTE - 1)) / MEGA_BYTE) * MEGA_BYTE;

	return new_size;
}

>> I think the real problem is the inefficient index format in ldb.
> 
> Oh, that's totally a huge issue, but I am trying to work on 2 fronts
> here. I need something to cut down on memory usage quickly, in order to
> solve problems for current users, w/o making incompatible changes. Then
> I need to account for efficiency. Of course if both can be achieved
> quickly that's even better.
> 
>> We really need to fix that. Compression will make the file smaller,
>> but much slower. Then we'll need a compression cache to make it fast
>> again, and we'll quickly end up with something that is very hard to
>> maintain. 
> 
> A cache would mean keeping huge records in memory, which would cause
> memory to grow again too much I think, unless we use some LRU and keep
> the cache size tightly controlled, but that would indeed be expensive.
> 
> One thing I would do is to save a key/dn pair with key not bigger than a
> long integer, and then use this integer in indexes instead of the full
> DNs. Whether we can make this transparent and efficient I don't know
> yet, but it would certainly reduce the size of large indexes by more
> than an order of magnitude.

Tridge and I discussed something like that at the AD plugfest last year.

If I remember correctly, we discussed using the objectGUID as a fixed-size
primary key.

We also discussed a possibly even better solution: using uint64_t values
that can be used as direct offsets into the tdb mmap area.

metze


More information about the samba-technical mailing list