ctdb oops in ipalloc_can_host_ips

Steve French smfrench at gmail.com
Tue Feb 14 22:02:41 UTC 2017


Setup: 3-node ctdb cluster, Samba 4.5-test (from a few days ago)
1. This setup had more than 600 million files.
2. node2 was the primary, and during some of the failovers
(node2->node1->node2) we noticed a core dump in ctdb on node2.


[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/ctdbd --pidfile=/run/ctdb/ctdbd.pid -d ERR'.
Program terminated with signal 6, Aborted.
#0  0x00007f82fa9191d7 in raise () from /lib64/libc.so.6
#0  0x00007f82fa9191d7 in raise () from /lib64/libc.so.6
#1  0x00007f82fa91a8c8 in abort () from /lib64/libc.so.6
#2  0x00007f82fc3220cc in smb_panic () from /lib64/libsamba-util.so.0
#3  0x00007f82fc322286 in sig_fault () from /lib64/libsamba-util.so.0
#4  <signal handler called>
#5  0x00007f82fca20b7e in ipalloc_can_host_ips ()
#6  0x00007f82fc9fac74 in ctdb_takeover_run ()
#7  0x00007f82fc9cc4b1 in do_takeover_run ()
#8  0x00007f82fc9cd5bf in do_recovery.isra.19 ()
#9  0x00007f82fc9de137 in ctdb_start_recoverd ()
#10 0x00007f82fc9d5e6d in ctdb_setup_event_callback ()
#11 0x00007f82fc9f4d67 in event_script_destructor ()
#12 0x00007f82fb8d9e83 in _tc_free_internal () from /usr/lib64/samba/libtalloc.so.2
#13 0x00007f82fb6d1c6b in epoll_event_loop_once () from /usr/lib64/samba/libtevent.so.0
#14 0x00007f82fb6d0137 in std_event_loop_once () from /usr/lib64/samba/libtevent.so.0
#15 0x00007f82fb6cc38d in _tevent_loop_once () from /usr/lib64/samba/libtevent.so.0
#16 0x00007f82fb6cc52b in tevent_common_loop_wait () from /usr/lib64/samba/libtevent.so.0
#17 0x00007f82fb6d00d7 in std_event_loop_wait () from /usr/lib64/samba/libtevent.so.0
#18 0x00007f82fc9d6fbb in ctdb_start_daemon ()
#19 0x00007f82fc9ce66a in main ()


ctdb log

2017/02/13 21:42:04.345670 [ 3988]: Thaw db: smbXsrv_open_global.tdb generation 255426666
2017/02/13 21:42:04.345685 [ 3988]: Release freeze handle for db smbXsrv_open_global.tdb
2017/02/13 21:42:04.345998 [recoverd: 4468]: recovery: 17 of 17 databases recovered
2017/02/13 21:42:04.346273 [recoverd: 4468]: recovery: set recovery mode to NORMAL
2017/02/13 21:42:04.576067 [recoverd: 4468]: recovery: recovered event finished
2017/02/13 21:42:04.577589 [recoverd: 4468]: ===============================================================
2017/02/13 21:42:04.577637 [recoverd: 4468]: INTERNAL ERROR: Signal 11 in pid 4468 (4.5.6)
2017/02/13 21:42:04.577649 [recoverd: 4468]: Please read the Trouble-Shooting section of the Samba HOWTO
2017/02/13 21:42:04.577658 [recoverd: 4468]: ===============================================================
2017/02/13 21:42:04.577667 [recoverd: 4468]: PANIC: internal error
2017/02/13 21:42:04.584440 [ 3988]: ../ctdb/server/ctdb_fork.c:123 waitpid() returned error. errno:10
2017/02/13 21:42:08.705937 [ 3988]: ../ctdb/server/ctdb_fork.c:123 waitpid() returned error. errno:10
2017/02/13 21:42:10.806124 [ 3988]: ../ctdb/server/ctdb_fork.c:123 waitpid() returned error. errno:10
2017/02/13 21:42:11.314965 [ 3988]: ../ctdb/server/ctdb_fork.c:123 waitpid() returned error. errno:10
2017/02/13 21:42:11.364546 [ 3988]: ../ctdb/server/ctdb_fork.c:123 waitpid() returned error. errno:10
2017/02/13 21:42:11.788818 [ 3988]: ../ctdb/server/ctdb_fork.c:123 waitpid() returned error. errno:10
2017/02/13 21:42:11.876901 [ 3988]: ../ctdb/server/ctdb_fork.c:123 waitpid() returned error. errno:10
2017/02/13 21:42:12.356185 [ 3988]: ../ctdb/server/ctdb_fork.c:123 waitpid() returned error. errno:10
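For anyone triaging a log like the one above, here is a rough Python sketch (not part of ctdb; the function names are made up for illustration) that splits entries by the process tag in brackets and turns the numeric errno into its symbolic name. It assumes each log entry sits on a single line, so wrapped continuation lines are simply skipped. On Linux, errno 10 is ECHILD, i.e. waitpid() found no child to wait for.

```python
import errno
import re
from collections import Counter

# Matches "YYYY/MM/DD HH:MM:SS.ffffff [tag]: message". The tag is either
# a bare pid ("3988") or "name: pid" ("recoverd: 4468").
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\[\s*(?P<tag>[^\]]+)\]:\s*(?P<msg>.*)$"
)
ERRNO_RE = re.compile(r"errno:(\d+)")


def parse(lines):
    """Yield (timestamp, tag, message) for each line that looks like a log entry."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:
            yield m.group("ts"), m.group("tag").strip(), m.group("msg")


def summarize(lines):
    """Count messages per process tag and tally errno values by symbolic name."""
    per_tag = Counter()
    errnos = Counter()
    for _ts, tag, msg in parse(lines):
        per_tag[tag] += 1
        for num in ERRNO_RE.findall(msg):
            # errno.errorcode maps numbers to names for the host running this
            # script; unknown numbers are kept as the raw string.
            errnos[errno.errorcode.get(int(num), num)] += 1
    return per_tag, errnos


# A few entries copied from the log, each on one line.
sample = [
    "2017/02/13 21:42:04.577637 [recoverd: 4468]: INTERNAL ERROR: Signal 11 in pid 4468 (4.5.6)",
    "2017/02/13 21:42:04.577667 [recoverd: 4468]: PANIC: internal error",
    "2017/02/13 21:42:04.584440 [ 3988]: ../ctdb/server/ctdb_fork.c:123 waitpid() returned error. errno:10",
    "2017/02/13 21:42:08.705937 [ 3988]: ../ctdb/server/ctdb_fork.c:123 waitpid() returned error. errno:10",
]

per_tag, errnos = summarize(sample)
print(dict(per_tag))
print(dict(errnos))
```

Grouping by tag makes it easier to see that the panic is logged by the recoverd process (pid 4468) while the repeated waitpid() errors come from pid 3988.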


-- 
Thanks,

Steve


