Patch for a ctdb reloadnodes crash Bugzilla ID 10366

Kevin Osborn kosborn at overlandstorage.com
Tue Jan 21 16:37:13 MST 2014


Hi,

We have been seeing a lot of CTDB crashes when testing adding and removing nodes from our cluster. The details are described in
https://bugzilla.samba.org/show_bug.cgi?id=10366

The good news is that we believe we have found a fix, and we would like to make sure the community knows about it. The problem seems to go back at least as far as CTDB 1.1.

Here is our proposed fix.  Apply the following patch to the 2.5.1 ctdb source code:

--- ctdb-2.5.1/server/ctdb_takeover.c.orig      2014-01-16 09:24:59.000000000 -0800
+++ ctdb-2.5.1/server/ctdb_takeover.c   2014-01-16 09:26:13.000000000 -0800
@@ -3051,11 +3051,9 @@

        /* If this is the first tickle */
        if (tcparray == NULL) {
-               tcparray = talloc_size(ctdb->nodes,
-                       offsetof(struct ctdb_tcp_array, connections) +
-                       sizeof(struct ctdb_tcp_connection) * 1);
+               tcparray = talloc(ctdb->nodes, struct ctdb_tcp_array);
                CTDB_NO_MEMORY(ctdb, tcparray);
-               vnn->tcp_array = tcparray;
+               vnn->tcp_array = talloc_steal(vnn, tcparray);

                tcparray->num = 0;
                tcparray->connections = talloc_size(tcparray, sizeof(struct ctdb_tcp_connection));

It seems like the original code makes ctdb->nodes the talloc parent (owner) of the memory allocated for the tcparray. When reloadnodes is run as part of the add-nodes process, the old ctdb->nodes list is eventually talloc_free()'d, which also frees the tcparray that it "owns". However, vnn->tcp_array still points at that memory. When the tcp_array pointer is dereferenced later on, the memory has been recycled and the values inside are no longer valid. This is when the crash occurs.
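
To illustrate the ownership issue in isolation, here is a minimal sketch. It does not use talloc or any CTDB code: struct ctx and the ctx_*() helpers are hypothetical stand-ins that only model the rule "freeing a parent frees its children", and tcparray_freed_by_reload() is an invented name for the demonstration. Each context records its own release in a flag, so the sketch never touches freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a talloc context (hypothetical, NOT the talloc API). */
struct ctx {
	struct ctx *parent;
	struct ctx *first_child;
	struct ctx *next_sibling;
	bool *freed;		/* set to true when this context is freed */
};

/* Attach c to parent's child list (no-op list-wise if parent is NULL). */
static void ctx_link(struct ctx *parent, struct ctx *c)
{
	c->parent = parent;
	if (parent != NULL) {
		c->next_sibling = parent->first_child;
		parent->first_child = c;
	}
}

/* Detach c from its current parent, if it has one. */
static void ctx_unlink(struct ctx *c)
{
	struct ctx **p;

	if (c->parent == NULL) {
		return;
	}
	for (p = &c->parent->first_child; *p != c; p = &(*p)->next_sibling) {
		/* walk to the link that points at c */
	}
	*p = c->next_sibling;
	c->parent = NULL;
	c->next_sibling = NULL;
}

/* Allocate a context owned by parent; *freed tracks its lifetime. */
static struct ctx *ctx_new(struct ctx *parent, bool *freed)
{
	struct ctx *c = calloc(1, sizeof(*c));

	if (c == NULL) {
		abort();
	}
	*freed = false;
	c->freed = freed;
	ctx_link(parent, c);
	return c;
}

/* Move c under new_parent, modelling what talloc_steal() does. */
static struct ctx *ctx_steal(struct ctx *new_parent, struct ctx *c)
{
	ctx_unlink(c);
	ctx_link(new_parent, c);
	return c;
}

/* Free c and, recursively, everything it owns. */
static void ctx_free(struct ctx *c)
{
	ctx_unlink(c);
	while (c->first_child != NULL) {
		ctx_free(c->first_child);
	}
	*c->freed = true;
	free(c);
}

/* Returns true if freeing the old node list also freed the tcp array. */
static bool tcparray_freed_by_reload(bool steal_to_vnn)
{
	bool nodes_freed, vnn_freed, tcparray_freed, result;
	struct ctx *nodes = ctx_new(NULL, &nodes_freed);
	struct ctx *vnn = ctx_new(NULL, &vnn_freed);
	struct ctx *tcparray = ctx_new(nodes, &tcparray_freed);

	if (steal_to_vnn) {
		ctx_steal(vnn, tcparray);	/* patched behaviour */
	}

	ctx_free(nodes);		/* reloadnodes drops the old node list */
	result = tcparray_freed;	/* is vnn's pointer dangling now? */

	ctx_free(vnn);			/* releases tcparray too, if vnn owns it */
	assert(tcparray_freed);		/* nothing leaked in either case */
	return result;
}
```

Under these assumptions, tcparray_freed_by_reload(false) models the original code and reports that the array disappears together with the node list, while tcparray_freed_by_reload(true) models the patched code, where the array lives until its vnn is freed.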

Please let me know if the patch suggested is good, or if you have any other suggestions for how this problem should be addressed.

I hope this helps!

Thanks,

-Kevin Osborn


Kevin Osborn, Director Software Development
125 South Market Street San Jose, CA 95113 USA
t 408.283.4717 | f 858.571.3664
kosborn at overlandstorage.com | www.overlandstorage.com
