[TDB] Patches for file and memory usage growth issues

Rusty Russell rusty at samba.org
Mon Apr 18 01:23:00 MDT 2011


On Thu, 14 Apr 2011 14:14:32 -0400, simo <idra at samba.org> wrote:
> On Wed, 2011-04-13 at 14:46 +0930, Rusty Russell wrote:
> Hi Rusty,
> unfortunately this still doesn't seem to really help.
> 
> I've slightly modified my tests so I've rerun 3 tests:
> 1. plain
> 2. with your first 3 patches
> 3. with all 4 patches
> 
> plain is the baseline and it includes my patches to use better
> heuristics as published in my git tree.
> 
> With your patches I actually see both an increase in time spent and size
> of the memory footprint as well as final size of the tdb
> unfortunately.
>
>
> The strict repack certainly looks like overkill with these tests. The
> basic repack patches do not hit too hard on time spent, although the
> final tdb is still 300MiB larger than without the patch.

Yes, let's ignore that overaggressive repack as a completely bad idea.

And this clearly reveals yet another bug in my repack code:

> PLAIN tdb:
...
>         Size of file/data: 1048993792/665083314
>         Number of records: 76628
>         Smallest/average/largest keys: 12/48/65
>         Smallest/average/largest data: 43/8631/1289140
>         Smallest/average/largest padding: 20/1283/322353

Vs:

> Rusty's TDB REPACK:
...
>         Size of file/data: 1378013184/820552100
>         Number of records: 89944
>         Smallest/average/largest keys: 12/50/65
>         Smallest/average/largest data: 43/9072/1289140
>         Smallest/average/largest padding: 9/1847/322353

I have reproduced it, and am now tracking it down...

My test program is below for reference; I believe it very roughly
approximates your test at the TDB level.

Cheers,
Rusty.
PS. Oh, and dropping _PUBLIC_ was not deliberate.  Will fix!

#include <ccan/tdb/tdb.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <err.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc, char *argv[])
{
	unsigned int i, j, users, groups;
	TDB_DATA idxkey, idxdata;
	TDB_DATA k, d, gk;
	char cmd[100];
	struct tdb_context *tdb;

	if (argc != 3) {
		printf("Usage: growtdb-bench <users> <groups>\n");
		exit(1);
	}
	users = atoi(argv[1]);
	groups = atoi(argv[2]);

	snprintf(cmd, sizeof(cmd), "cat /proc/%i/statm", getpid());

	tdb = tdb_open("/tmp/growtdb.tdb", 10000, TDB_DEFAULT,
		       O_RDWR|O_CREAT|O_TRUNC, 0600);
	if (!tdb)
		err(1, "opening /tmp/growtdb.tdb");

	idxkey.dptr = (unsigned char *)"User index";
	idxkey.dsize = strlen("User index");
	idxdata.dsize = 51;
	idxdata.dptr = calloc(idxdata.dsize, 1);

	/* Create users. */
	k.dsize = 48;
	k.dptr = calloc(k.dsize, 1);
	d.dsize = 64;
	d.dptr = calloc(d.dsize, 1);

	tdb_transaction_start(tdb);
	for (i = 0; i < users; i++) {
		memcpy(k.dptr, &i, sizeof(i));
		if (tdb_store(tdb, k, d, TDB_INSERT) != 0)
			errx(1, "tdb insert failed: %s", tdb_errorstr(tdb));

		/* This simulates a growing index record. */
		if (tdb_append(tdb, idxkey, idxdata) != 0)
			errx(1, "tdb append failed: %s", tdb_errorstr(tdb));
	}
	if (tdb_transaction_commit(tdb) != 0)
		errx(1, "tdb commit1 failed: %s", tdb_errorstr(tdb));

	system(cmd);

	/* Now put them all in groups: add 32 bytes to each record for
	 * a group. */
	gk.dsize = 48;
	gk.dptr = calloc(gk.dsize, 1);
	gk.dptr[gk.dsize-1] = 1;

	d.dsize = 32;
	for (i = 0; i < groups; i++) {
		tdb_transaction_start(tdb);
		/* Create the "group". */
		memcpy(gk.dptr, &i, sizeof(i));
		if (tdb_store(tdb, gk, d, TDB_INSERT) != 0)
			errx(1, "tdb insert failed: %s", tdb_errorstr(tdb));

		/* Now populate it. */
		for (j = 0; j < users; j++) {
			/* Append to the user. */
			memcpy(k.dptr, &j, sizeof(j));
			if (tdb_append(tdb, k, d) != 0)
				errx(1, "tdb append failed: %s",
				     tdb_errorstr(tdb));
			
			/* Append to the group. */
			if (tdb_append(tdb, gk, d) != 0)
				errx(1, "tdb append failed: %s",
				     tdb_errorstr(tdb));
		}
		if (tdb_transaction_commit(tdb) != 0)
			errx(1, "tdb commit2 failed: %s", tdb_errorstr(tdb));
		system(cmd);
	}

	tdb_close(tdb);
	return 0;
}

