smbd crash in a CTDB cluster

Fri Aug 11 17:40:12 UTC 2017

+samba-technical

On 2017-08-11 22:44, anoopcs at autistici.org wrote:
> Hi all,
> 
> In a 4-node Samba (v4.6.3) CTDB cluster (with 4 public IPs), smbd
> crashes were seen with the following backtrace:
> 
> Core was generated by `/usr/sbin/smbd'.
> Program terminated with signal 6, Aborted.
> #0  0x00007f1d26d4a1f7 in raise () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00007f1d26d4a1f7 in raise () from /lib64/libc.so.6
> #1  0x00007f1d26d4b8e8 in abort () from /lib64/libc.so.6
> #2  0x00007f1d286d04de in dump_core () at ../source3/lib/dumpcore.c:338
> #3  0x00007f1d286c16e7 in smb_panic_s3 (why=<optimized out>) at ../source3/lib/util.c:814
> #4  0x00007f1d2a79c95f in smb_panic (why=why@entry=0x7f1d2a7e482a "internal error") at ../lib/util/fault.c:166
> #5  0x00007f1d2a79cb76 in fault_report (sig=<optimized out>) at ../lib/util/fault.c:83
> #6  sig_fault (sig=<optimized out>) at ../lib/util/fault.c:94
> #7  <signal handler called>
> #8  messaging_ctdbd_reinit (msg_ctx=msg_ctx@entry=0x56508d0e3800, mem_ctx=mem_ctx@entry=0x56508d0e3800, backend=0x0)
>     at ../source3/lib/messages_ctdbd.c:278
> #9  0x00007f1d286ccd40 in messaging_reinit (msg_ctx=msg_ctx@entry=0x56508d0e3800) at ../source3/lib/messages.c:415
> #10 0x00007f1d286c0ec9 in reinit_after_fork (msg_ctx=0x56508d0e3800, ev_ctx=<optimized out>,
>     parent_longlived=parent_longlived@entry=true, comment=comment@entry=0x0) at ../source3/lib/util.c:475
> #11 0x00007f1d286dbafa in background_job_waited (subreq=0x56508d0ec8e0) at ../source3/lib/background.c:179
> #12 0x00007f1d270e1c97 in tevent_common_loop_timer_delay (ev=0x56508d0e2d10) at ../tevent_timed.c:369
> #13 0x00007f1d270e2f49 in epoll_event_loop (tvalp=0x7fffa1f7ca70, epoll_ev=0x56508d0e2f90) at ../tevent_epoll.c:659
> #14 epoll_event_loop_once (ev=<optimized out>, location=<optimized out>) at ../tevent_epoll.c:930
> #15 0x00007f1d270e12a7 in std_event_loop_once (ev=0x56508d0e2d10, location=0x56508bde85d9 "../source3/smbd/server.c:1384")
>     at ../tevent_standard.c:114
> #16 0x00007f1d270dd0cd in _tevent_loop_once (ev=ev@entry=0x56508d0e2d10,
>     location=location@entry=0x56508bde85d9 "../source3/smbd/server.c:1384") at ../tevent.c:721
> #17 0x00007f1d270dd2fb in tevent_common_loop_wait (ev=0x56508d0e2d10, location=0x56508bde85d9 "../source3/smbd/server.c:1384")
>     at ../tevent.c:844
> #18 0x00007f1d270e1247 in std_event_loop_wait (ev=0x56508d0e2d10, location=0x56508bde85d9 "../source3/smbd/server.c:1384")
>     at ../tevent_standard.c:145
> #19 0x000056508bddfa95 in smbd_parent_loop (parent=<optimized out>, ev_ctx=0x56508d0e2d10) at ../source3/smbd/server.c:1384
> #20 main (argc=<optimized out>, argv=<optimized out>) at ../source3/smbd/server.c:2038
> 
> (gdb) f 8
> #8  messaging_ctdbd_reinit (msg_ctx=msg_ctx@entry=0x56508d0e3800, mem_ctx=mem_ctx@entry=0x56508d0e3800, backend=0x0)
>     at ../source3/lib/messages_ctdbd.c:278
> 278		struct messaging_ctdbd_context *ctx = talloc_get_type_abort(
> 
> (gdb) l
> 273
> 274	int messaging_ctdbd_reinit(struct messaging_context *msg_ctx,
> 275				   TALLOC_CTX *mem_ctx,
> 276				   struct messaging_backend *backend)
> 277	{
> 278		struct messaging_ctdbd_context *ctx = talloc_get_type_abort(
> 279			backend->private_data, struct messaging_ctdbd_context);
> 280		int ret;
> 281
> 282		ret = messaging_ctdbd_init_internal(msg_ctx, mem_ctx, ctx, true);
> 
> (gdb) p backend
> $1 = (struct messaging_backend *) 0x0
> 
> (gdb) p *msg_ctx
> $1 = {id = {pid = 17264, task_id = 0, vnn = 4294967295, unique_id = 4569628117635137227}, event_ctx = 0x56508d0e2d10,
>   callbacks = 0x56508d0fa250, new_waiters = 0x0, num_new_waiters = 0, waiters = 0x0, num_waiters = 0, msg_dgm_ref = 0x56508d0e6ac0,
>   remote = 0x0, names_db = 0x56508d0e3cf0}
> 
> Since the core files were only noticed later, it is hard to recall the
> scenario that caused smbd to panic and dump core. The corresponding
> logs are attached to this mail (at the default log level, so they are
> not very helpful). Is there any way msg_ctx->remote can be NULL in
> this code path? The value of vnn (4294967295, i.e. 0xFFFFFFFF) also
> looks strange.
> 
> Anoop C S
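
Reading the trace above: frame #8 dereferences `backend` (0x0) at messages_ctdbd.c:278, the resulting fault signal is caught by sig_fault() (frame #6), and smb_panic() then reaches abort() via dump_core(), which would explain why the core reports signal 6 rather than a segmentation fault. The C sketch below illustrates one defensive pattern for that call site, assuming (as frame #9 and the `remote = 0x0` field suggest) that messaging_reinit() passes msg_ctx->remote as `backend`. The struct layouts and the function guarded_ctdbd_reinit() are hypothetical simplifications, not Samba's actual definitions or a proposed patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the structures in the trace. */
struct messaging_backend {
	void *private_data;   /* in Samba: a struct messaging_ctdbd_context */
};

struct messaging_context {
	struct messaging_backend *remote;   /* 0x0 in the core dump */
};

/*
 * Hypothetical guarded variant of the reinit step: instead of
 * dereferencing backend->private_data unconditionally (the faulting
 * line 278), return EINVAL when no ctdbd backend is attached.
 */
int guarded_ctdbd_reinit(struct messaging_context *msg_ctx)
{
	struct messaging_backend *backend;

	/* A NULL msg_ctx would be a caller bug, not a runtime state. */
	assert(msg_ctx != NULL);

	backend = msg_ctx->remote;
	if (backend == NULL || backend->private_data == NULL) {
		return EINVAL;
	}

	/* The real code would re-initialise the ctdbd connection here. */
	return 0;
}
```

Such a guard only avoids the panic; it does not explain why msg_ctx->remote is NULL in the first place.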
-------------- next part --------------
Name: log.smbd
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20170811/f37c0bcc/log.ksh>