Patch for a ctdb reloadnodes crash Bugzilla ID 10366
Amitay Isaacs
amitay at gmail.com
Wed Jan 22 02:35:31 MST 2014
On Wed, Jan 22, 2014 at 10:37 AM, Kevin Osborn
<kosborn at overlandstorage.com>wrote:
> Hi,
>
> We have been seeing a lot of CTDB crashes when testing adding and removing
> nodes from our cluster. The details are described in
> https://bugzilla.samba.org/show_bug.cgi?id=10366
>
> The good news is that we believe that have found a fix and we would like
> to make sure the community knows about it. The problem seems to go back at
> least as far as CTDB 1.1.
>
> Here is our proposed fix. Apply the following patch to the 2.5.1 ctdb
> source code:
>
> --- ctdb-2.5.1/server/ctdb_takeover.c.orig 2014-01-16
> 09:24:59.000000000-0800
> +++ ctdb-2.5.1/server/ctdb_takeover.c 2014-01-16 09:26:13.000000000
> -0800 @@ -3051,11 +3051,9 @@
>
> /* If this is the first tickle */
> if (tcparray == NULL) {
> - tcparray = talloc_size(ctdb->nodes,
> - offsetof(struct ctdb_tcp_array, connections) +
> - sizeof(struct ctdb_tcp_connection) * 1);
> + tcparray = talloc(ctdb->nodes, struct ctdb_tcp_array);
> CTDB_NO_MEMORY(ctdb, tcparray);
> - vnn->tcp_array = tcparray;
> + vnn->tcp_array = talloc_steal(vnn, tcparray);
>
> tcparray->num = 0;
> tcparray->connections = talloc_size(tcparray,
> sizeof(struct ctdb_tcp_connection));
>
> It seems like the original code is making ctdb->nodes the owner of the
> memory talloc'd for the tcparray. When reload nodes is called as part of
> the add nodes process, the old ctdb->nodes list is eventually
> talloc_free()'d, which then frees the tcparray that it "owns". Then later
> on, the tcp_array pointer is referenced but the memory has been recycled
> and the values inside are no longer valid. This is when the crash occurs.
>
> Please let me know if the patch suggested is good, or if you have any
> other suggestions for how this problem should be addressed.
>
> I hope this helps!
>
> Thanks,
>
> -Kevin Osborn
>
Hi Kevin,
Nice catch. There are two places where tcparray is allocated off
ctdb->nodes. Here's the patch which fixes both places and removes
unnecessary talloc_steal(). Please let me know if it works.
Thanks.
Amitay.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-daemon-Always-talloc-tickle-array-off-vnn-instead-of.patch
Type: text/x-patch
Size: 2525 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20140122/4edb652c/attachment.bin>
More information about the samba-technical
mailing list