ctdb: Adding memory pool for queue callback

Swen Schillig swen at vnet.ibm.com
Wed Nov 7 16:28:04 UTC 2018


On Wed, 2018-11-07 at 17:05 +0100, Volker Lendecke wrote:
> On Wed, Nov 07, 2018 at 05:03:42PM +0100, Swen Schillig wrote:
> > On Wed, 2018-11-07 at 16:48 +0100, Volker Lendecke wrote:
> > > On Wed, Nov 07, 2018 at 04:30:58PM +0100, Swen Schillig wrote:
> > > > Oh, didn't answer your last question....
> > > > yes, it does match the CTDB use pattern where we fetch a pool
> > > > and
> > > > then
> > > > re-use that memory for as long as we use that queue/connection.
> > > 
> > > Right, but the devil might be in the details: Do we have a
> > > hierarchy
> > > hanging off "data" allocated from the pool or not? And does this
> > > make
> > > a difference for overall performance?
> > > 
> > 
> > No.
> > The memory is used as the chunk as it was received and if anything
> > special needs to be done, the memory is getting copied into a new
> > structure and the old (pool-) memory is free'd.
> 
> And what is the talloc hierarchy there? What does that hang off? Can
> you point me at the code that does that?

CTDB's queue_process is triggering the respective callback routine
which is either one of
	ctdb_client_read_cb
	ctdb_daemon_read_cb
	ctdb_tcp_read_cb
	ctdb_tcp_tnode_cb

...and the first thing those callbacks do is create their own memory
(talloc) context. I hope that's what you meant by the talloc hierarchy.

Besides, I extended my little test program with some of your
non-optimized scenarios, resulting in code like this:

	/* declarations for a self-contained test; struct layout and
	 * pool size are illustrative */
	struct test_struct { int a; int b; };
	struct test_struct *ts, *ts2, *ts3, *ts4;
	TALLOC_CTX *pool_ctx = talloc_pool(NULL, 1024);
	clock_t t;
	double tt;
	long i;

	t = clock();
	for (i = 0; i < 10000000; i++) {
		ts = talloc(NULL, struct test_struct);
		ts2 = talloc(ts, struct test_struct);
		ts3 = talloc(ts, struct test_struct);
		ts4 = talloc(ts2, struct test_struct);
		talloc_free(ts);
	}
	t = clock() - t;
	tt = ((double) t) / CLOCKS_PER_SEC;
	printf("It took %f seconds to execute 10 million talloc/free cycles.\n", tt);

	t = clock();
	for (i = 0; i < 10000000; i++) {
		ts = talloc(pool_ctx, struct test_struct);
		ts2 = talloc(ts, struct test_struct);
		ts3 = talloc(ts, struct test_struct);
		ts4 = talloc(ts2, struct test_struct);
		talloc_free(ts);
	}
	t = clock() - t;
	tt = ((double) t) / CLOCKS_PER_SEC;
	printf("It took %f seconds to execute 10 million talloc(pool)/free cycles.\n", tt);

	t = clock();
	for (i = 0; i < 10000000; i++) {
		ts = malloc(sizeof(struct test_struct));
		ts2 = malloc(sizeof(struct test_struct));
		ts3 = malloc(sizeof(struct test_struct));
		ts4 = malloc(sizeof(struct test_struct));
		free(ts);
		free(ts2);
		free(ts3);
		free(ts4);
	}
	t = clock() - t;
	tt = ((double) t) / CLOCKS_PER_SEC;
	printf("It took %f seconds to execute 10 million malloc/free cycles.\n", tt);


showing values like 
[swen at linux ~]$ ./a.out 
It took 8.963634 seconds to execute 10 million talloc/free cycles.
It took 5.951885 seconds to execute 10 million talloc(pool)/free cycles.
It took 4.095244 seconds to execute 10 million malloc/free cycles.

So, I think there's still enough of an improvement to justify the change.

Cheers Swen



