Patch for a ctdb reloadnodes crash Bugzilla ID 10366

Amitay Isaacs amitay at gmail.com
Wed Jan 22 02:35:31 MST 2014


On Wed, Jan 22, 2014 at 10:37 AM, Kevin Osborn
<kosborn at overlandstorage.com>wrote:

> Hi,
>
> We have been seeing a lot of CTDB crashes when testing adding and removing
> nodes from our cluster. The details are described in
> https://bugzilla.samba.org/show_bug.cgi?id=10366
>
> The good news is that we believe that have found a fix and we would like
> to make sure the community knows about it. The problem seems to go back at
> least as far as CTDB 1.1.
>
> Here is our proposed fix.  Apply the following patch to the 2.5.1 ctdb
> source code:
>
> --- ctdb-2.5.1/server/ctdb_takeover.c.orig      2014-01-16
> 09:24:59.000000000-0800
> +++ ctdb-2.5.1/server/ctdb_takeover.c   2014-01-16 09:26:13.000000000
> -0800 @@ -3051,11 +3051,9 @@
>
>         /* If this is the first tickle */
>         if (tcparray == NULL) {
> -               tcparray = talloc_size(ctdb->nodes,
> -                       offsetof(struct ctdb_tcp_array, connections) +
> -                       sizeof(struct ctdb_tcp_connection) * 1);
> +               tcparray = talloc(ctdb->nodes, struct ctdb_tcp_array);
>                 CTDB_NO_MEMORY(ctdb, tcparray);
> -               vnn->tcp_array = tcparray;
> +               vnn->tcp_array = talloc_steal(vnn, tcparray);
>
>                 tcparray->num = 0;
>                 tcparray->connections = talloc_size(tcparray,
> sizeof(struct ctdb_tcp_connection));
>
> It seems like the original code is making ctdb->nodes the owner of the
> memory talloc'd for the tcparray. When reload nodes is called as part of
> the add nodes process, the old ctdb->nodes list is eventually
> talloc_free()'d, which then frees the tcparray that it "owns". Then later
> on, the tcp_array pointer is referenced but the memory has been recycled
> and the values inside are no longer valid. This is when the crash occurs.
>
> Please let me know if the patch suggested is good, or if you have any
> other suggestions for how this problem should be addressed.
>
> I hope this helps!
>
> Thanks,
>
> -Kevin Osborn
>

Hi Kevin,

Nice catch. There are two places where tcparray is allocated off
ctdb->nodes. Here's the patch which fixes both places and removes
unnecessary talloc_steal().  Please let me know if it works.

Thanks.
Amitay.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-daemon-Always-talloc-tickle-array-off-vnn-instead-of.patch
Type: text/x-patch
Size: 2525 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20140122/4edb652c/attachment.bin>


More information about the samba-technical mailing list