smbd crash in a CTDB cluster

Richard Sharpe realrichardsharpe at gmail.com
Sat Aug 12 03:35:12 UTC 2017


On Fri, Aug 11, 2017 at 7:28 PM,  <anoopcs at autistici.org> wrote:
> On 2017-08-11 22:47, Richard Sharpe wrote:
>>
>> On Fri, Aug 11, 2017 at 10:40 AM, Anoop C S via samba-technical
>> <samba-technical at lists.samba.org> wrote:
>>>
>>> +samba-technical
>>>
>>> On 2017-08-11 22:44, anoopcs at autistici.org wrote:
>>>>
>>>>
>>>> Hi all,
>>>>
>>>> In a 4-node Samba (v4.6.3) CTDB cluster (with 4 public IPs), smbd
>>>> crashes were seen with the following backtrace:
>>>>
>>>> Core was generated by `/usr/sbin/smbd'.
>>>> Program terminated with signal 6, Aborted.
>>>> #0  0x00007f1d26d4a1f7 in raise () from /lib64/libc.so.6
>>>> (gdb) bt
>>>> #0  0x00007f1d26d4a1f7 in raise () from /lib64/libc.so.6
>>>> #1  0x00007f1d26d4b8e8 in abort () from /lib64/libc.so.6
>>>> #2  0x00007f1d286d04de in dump_core () at ../source3/lib/dumpcore.c:338
>>>> #3  0x00007f1d286c16e7 in smb_panic_s3 (why=<optimized out>) at
>>>> ../source3/lib/util.c:814
>>>> #4  0x00007f1d2a79c95f in smb_panic (why=why at entry=0x7f1d2a7e482a
>>>> "internal error") at ../lib/util/fault.c:166
>>>> #5  0x00007f1d2a79cb76 in fault_report (sig=<optimized out>) at
>>>> ../lib/util/fault.c:83
>>>> #6  sig_fault (sig=<optimized out>) at ../lib/util/fault.c:94
>>>> #7  <signal handler called>
>>>> #8  messaging_ctdbd_reinit (msg_ctx=msg_ctx at entry=0x56508d0e3800,
>>>> mem_ctx=mem_ctx at entry=0x56508d0e3800, backend=0x0)
>>>>     at ../source3/lib/messages_ctdbd.c:278
>>>> #9  0x00007f1d286ccd40 in messaging_reinit
>>>> (msg_ctx=msg_ctx at entry=0x56508d0e3800) at
>>>> ../source3/lib/messages.c:415
>>>> #10 0x00007f1d286c0ec9 in reinit_after_fork (msg_ctx=0x56508d0e3800,
>>>> ev_ctx=<optimized out>,
>>>>     parent_longlived=parent_longlived at entry=true,
>>>> comment=comment at entry=0x0) at ../source3/lib/util.c:475
>>>> #11 0x00007f1d286dbafa in background_job_waited
>>>> (subreq=0x56508d0ec8e0) at ../source3/lib/background.c:179
>>>> #12 0x00007f1d270e1c97 in tevent_common_loop_timer_delay
>>>> (ev=0x56508d0e2d10) at ../tevent_timed.c:369
>>>> #13 0x00007f1d270e2f49 in epoll_event_loop (tvalp=0x7fffa1f7ca70,
>>>> epoll_ev=0x56508d0e2f90) at ../tevent_epoll.c:659
>>>> #14 epoll_event_loop_once (ev=<optimized out>, location=<optimized
>>>> out>) at ../tevent_epoll.c:930
>>>> #15 0x00007f1d270e12a7 in std_event_loop_once (ev=0x56508d0e2d10,
>>>> location=0x56508bde85d9 "../source3/smbd/server.c:1384")
>>>>     at ../tevent_standard.c:114
>>>> #16 0x00007f1d270dd0cd in _tevent_loop_once (ev=ev at entry=0x56508d0e2d10,
>>>>     location=location at entry=0x56508bde85d9
>>>> "../source3/smbd/server.c:1384") at ../tevent.c:721
>>>> #17 0x00007f1d270dd2fb in tevent_common_loop_wait (ev=0x56508d0e2d10,
>>>> location=0x56508bde85d9 "../source3/smbd/server.c:1384")
>>>>     at ../tevent.c:844
>>>> #18 0x00007f1d270e1247 in std_event_loop_wait (ev=0x56508d0e2d10,
>>>> location=0x56508bde85d9 "../source3/smbd/server.c:1384")
>>>>     at ../tevent_standard.c:145
>>>> #19 0x000056508bddfa95 in smbd_parent_loop (parent=<optimized out>,
>>>> ev_ctx=0x56508d0e2d10) at ../source3/smbd/server.c:1384
>>>> #20 main (argc=<optimized out>, argv=<optimized out>) at
>>>> ../source3/smbd/server.c:2038
>>
>>
>> This is quite normal if the node was banned when the smbd was forked.
>> What does the ctdb log show? What was happening at that time?
>
>
> I think the logs got rotated and were subsequently cleaned up. I can
> only vaguely remember that the cluster was not in a healthy state
> initially, due to some network issue. In fact, I am not sure whether any
> nodes were in the BANNED state. I will try to dig that up and confirm
> your analysis.
>
> Does that mean it is a deliberate panic from smbd? I am asking because
> of the code refactoring done in this area, which introduced
> talloc_get_type_abort() from 4.5 onwards.

Samba will panic in reinit_after_fork if anything fails. There are
very important things going on there, including connecting to ctdb.
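
As a rough illustration of that "fail hard after fork" behaviour, here is
a minimal sketch. It is not the Samba code: every demo_* name is made up
for this example, and the "reconnect" step is only a stand-in for
re-establishing the ctdb connection in the child.

```c
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for reconnecting to the cluster daemon.
 * The caller decides whether the daemon is "reachable". */
static bool demo_reconnect(bool daemon_reachable)
{
	return daemon_reachable;
}

/* Hypothetical reinit step: any failure aborts the process,
 * because the child cannot safely continue without it. */
static void demo_reinit_after_fork(bool daemon_reachable)
{
	if (!demo_reconnect(daemon_reachable)) {
		fprintf(stderr, "demo: reinit after fork failed\n");
		abort();
	}
}

/* Fork a child, run the reinit step in it, and report whether the
 * child exited cleanly (true) or was killed by SIGABRT (false). */
static bool demo_child_survives_reinit(bool daemon_reachable)
{
	int status = 0;
	pid_t pid = fork();

	if (pid < 0) {
		return false; /* fork itself failed */
	}
	if (pid == 0) {
		demo_reinit_after_fork(daemon_reachable);
		_exit(0); /* reinit succeeded */
	}
	if (waitpid(pid, &status, 0) < 0) {
		return false;
	}
	if (WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT) {
		return false; /* child aborted during reinit */
	}
	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling demo_child_survives_reinit(false) models a child forked while the
daemon is unreachable: the child aborts, and the parent sees SIGABRT.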

-- 
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)


