ctdb oops in ipalloc_can_host_ips

Martin Schwenke martin at meltin.net
Tue Feb 14 23:47:06 UTC 2017


Hi Steve,

On Tue, 14 Feb 2017 16:02:41 -0600, Steve French <smfrench at gmail.com>
wrote:

> Setup: 3 Node ctdb cluster, Samba 4.5-test (from a few days ago)
> 1. this setup had about 600+ million files.
> 2. node2 was primary and during some of the failover from
> node2->node1->node2, we noticed core dump in ctdb on node2 node.
> 
> 
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `/usr/sbin/ctdbd --pidfile=/run/ctdb/ctdbd.pid -d ERR'.
> Program terminated with signal 6, Aborted.
> #0  0x00007f82fa9191d7 in raise () from /lib64/libc.so.6
> #0  0x00007f82fa9191d7 in raise () from /lib64/libc.so.6
> #1  0x00007f82fa91a8c8 in abort () from /lib64/libc.so.6
> #2  0x00007f82fc3220cc in smb_panic () from /lib64/libsamba-util.so.0
> #3  0x00007f82fc322286 in sig_fault () from /lib64/libsamba-util.so.0
> #4  <signal handler called>
> #5  0x00007f82fca20b7e in ipalloc_can_host_ips ()
> #6  0x00007f82fc9fac74 in ctdb_takeover_run ()
> #7  0x00007f82fc9cc4b1 in do_takeover_run ()
> #8  0x00007f82fc9cd5bf in do_recovery.isra.19 ()
> #9  0x00007f82fc9de137 in ctdb_start_recoverd ()
> #10 0x00007f82fc9d5e6d in ctdb_setup_event_callback ()
> #11 0x00007f82fc9f4d67 in event_script_destructor ()
> #12 0x00007f82fb8d9e83 in _tc_free_internal () from /usr/lib64/samba/libtalloc.so.2
> #13 0x00007f82fb6d1c6b in epoll_event_loop_once () from /usr/lib64/samba/libtevent.so.0
> #14 0x00007f82fb6d0137 in std_event_loop_once () from /usr/lib64/samba/libtevent.so.0
> #15 0x00007f82fb6cc38d in _tevent_loop_once () from /usr/lib64/samba/libtevent.so.0
> #16 0x00007f82fb6cc52b in tevent_common_loop_wait () from /usr/lib64/samba/libtevent.so.0
> #17 0x00007f82fb6d00d7 in std_event_loop_wait () from /usr/lib64/samba/libtevent.so.0
> #18 0x00007f82fc9d6fbb in ctdb_start_daemon ()
> #19 0x00007f82fc9ce66a in main ()

I can't remember seeing this myself and can't easily see how it could
happen.  :-(

In 4.6, this code now runs in a separate executable helper.  We have a
stack of test cases for the helper and I've run them under valgrind a
lot.

That said, there's obviously a bug...  ;-)

Is there anything weird about the public IP address configuration?

Are you able to dig a little deeper and find the dodgy pointer being
dereferenced in ipalloc_can_host_ips()?
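
As a rough starting point, something like the following might help.
It is only a sketch: the command file name (ctdbd-core.gdb) is made up,
and frame 5 is simply the ipalloc_can_host_ips() frame from the
backtrace above.  Without the matching debuginfo installed, the last
two commands will not show anything useful, but the faulting
instruction and the registers should still hint at which pointer was
bad.

```shell
# Write a gdb command file.  The name is arbitrary (an assumption for
# this sketch); the gdb commands themselves are standard ones.
cat > ctdbd-core.gdb <<'EOF'
# Avoid interactive paging when running in batch mode
set pagination off
# Full backtrace again, for reference
bt
# Select the faulting frame: #5, ipalloc_can_host_ips(), per the backtrace
frame 5
# Show the instruction that faulted and the register contents
x/i $pc
info registers
# Disassemble the surrounding function
disassemble
# These two are only informative when debug symbols are available
info args
info locals
EOF

# Then run it against the core, substituting the real core file path:
#   gdb -batch -x ctdbd-core.gdb /usr/sbin/ctdbd /path/to/core
```

The register holding the address used by the faulting instruction is
usually the interesting one; a value of 0 or something close to it
would suggest a NULL pointer or a field of a NULL structure.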

Thanks...

peace & happiness,
martin


