ctdb: Adding memory pool for queue callback
Swen Schillig
swen at vnet.ibm.com
Wed Nov 7 16:28:04 UTC 2018
On Wed, 2018-11-07 at 17:05 +0100, Volker Lendecke wrote:
> On Wed, Nov 07, 2018 at 05:03:42PM +0100, Swen Schillig wrote:
> > On Wed, 2018-11-07 at 16:48 +0100, Volker Lendecke wrote:
> > > On Wed, Nov 07, 2018 at 04:30:58PM +0100, Swen Schillig wrote:
> > > > Oh, didn't answer your last question....
> > > > yes, it does match the CTDB use pattern where we fetch a pool
> > > > and then re-use that memory for as long as we use that
> > > > queue/connection.
> > >
> > > Right, but the devil might be in the details: Do we have a
> > > hierarchy hanging off "data" allocated from the pool or not? And
> > > does this make a difference for overall performance?
> > >
> >
> > No.
> > The memory is used as the chunk it was received in, and if anything
> > special needs to be done, the data is copied into a new structure
> > and the old (pool) memory is freed.
>
> And what is the talloc hierarchy there? What does that hang off? Can
> you point me at the code that does that?
CTDB's queue_process triggers the respective callback routine,
which is one of
ctdb_client_read_cb
ctdb_daemon_read_cb
ctdb_tcp_read_cb
ctdb_tcp_tnode_cb
...and the first thing those callbacks do is create their own memory
context. I hope that's what you meant by the talloc hierarchy.
Besides, I extended my little test program with some of your
non-optimized scenarios, resulting in code like this:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <talloc.h>

struct test_struct {
	int a;
	int b;
};

int main(void)
{
	struct test_struct *ts, *ts2, *ts3, *ts4;
	TALLOC_CTX *pool_ctx = talloc_pool(NULL, 1024);
	clock_t t;
	double tt;
	int i;

	t = clock();
	for (i = 0; i < 10000000; i++) {
		ts = talloc(NULL, struct test_struct);
		ts2 = talloc(ts, struct test_struct);
		ts3 = talloc(ts, struct test_struct);
		ts4 = talloc(ts2, struct test_struct);
		talloc_free(ts);	/* frees ts2, ts3 and ts4 as well */
	}
	t = clock() - t;
	tt = ((double) t) / CLOCKS_PER_SEC;
	printf("It took %f seconds to execute 10 million talloc/free cycles.\n", tt);

	t = clock();
	for (i = 0; i < 10000000; i++) {
		ts = talloc(pool_ctx, struct test_struct);
		ts2 = talloc(ts, struct test_struct);
		ts3 = talloc(ts, struct test_struct);
		ts4 = talloc(ts2, struct test_struct);
		talloc_free(ts);
	}
	t = clock() - t;
	tt = ((double) t) / CLOCKS_PER_SEC;
	printf("It took %f seconds to execute 10 million talloc(pool)/free cycles.\n", tt);

	t = clock();
	for (i = 0; i < 10000000; i++) {
		ts = malloc(sizeof(struct test_struct));
		ts2 = malloc(sizeof(struct test_struct));
		ts3 = malloc(sizeof(struct test_struct));
		ts4 = malloc(sizeof(struct test_struct));
		free(ts);
		free(ts2);
		free(ts3);
		free(ts4);
	}
	t = clock() - t;
	tt = ((double) t) / CLOCKS_PER_SEC;
	printf("It took %f seconds to execute 10 million malloc/free cycles.\n", tt);

	talloc_free(pool_ctx);
	return 0;
}
showing values like
[swen at linux ~]$ ./a.out
It took 8.963634 seconds to execute 10 million talloc/free cycles.
It took 5.951885 seconds to execute 10 million talloc(pool)/free cycles.
It took 4.095244 seconds to execute 10 million malloc/free cycles.
So, I think there's still enough of an improvement to justify the change.
Cheers Swen