[Samba] Debugging oplock.c core dumps

Sobey, Richard A r.sobey at imperial.ac.uk
Thu Mar 10 10:12:03 UTC 2016


Hi everyone

I have a four-node Samba/CTDB cluster exporting CIFS shares from a GPFS 3.5 cluster. Samba/CTDB is at 4.2.3:

[root at hostname ~]# rpm -qa | grep sernet
sernet-samba-ad-4.2.3-18.el6.x86_64
sernet-samba-libs-4.2.3-18.el6.x86_64
sernet-samba-common-4.2.3-18.el6.x86_64
sernet-samba-4.2.3-18.el6.x86_64
sernet-samba-ctdb-4.2.3-18.el6.x86_64
sernet-samba-client-4.2.3-18.el6.x86_64
sernet-samba-libwbclient-devel-4.2.3-18.el6.x86_64
sernet-build-key-1.1-4.noarch
sernet-samba-libsmbclient0-4.2.3-18.el6.x86_64
sernet-samba-winbind-4.2.3-18.el6.x86_64

...whilst GPFS is at version 3.5.0.22.

On every server in the CTDB cluster we are seeing the /var/log/samba/cores/smbd folder filling up with core.xxxxx files. At the time of writing this email, I could see 20-30 dump files being written every second (a few minutes later it has calmed down).
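To put a number on that rate rather than eyeballing the directory, the core files can be grouped by modification time. This is only a rough sketch: the function name cores_per_second is hypothetical, it assumes the dumps are named with a "core." prefix as described above, and it treats each file's mtime (whole seconds) as the time the dump was written.

```python
import os
from collections import Counter

def cores_per_second(directory, prefix="core."):
    """Count core files per whole-second modification time.

    Assumption: dump files start with `prefix`, and their mtime is a
    reasonable stand-in for when the dump was written.
    """
    counts = Counter()
    for entry in os.scandir(directory):
        # Skip subdirectories and anything that is not a core file.
        if entry.is_file() and entry.name.startswith(prefix):
            counts[int(entry.stat().st_mtime)] += 1
    return counts
```

Calling cores_per_second("/var/log/samba/cores/smbd") returns a Counter keyed by Unix timestamp; its most_common() method lists the busiest seconds first.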

We have Samba logging at level 1 at the moment. The log output accompanying one of the core dumps is as follows:

[2016/03/10 10:05:58.334288,  0] ../source3/lib/dumpcore.c:318(dump_core)
  dumping core in /var/log/samba/cores/smbd
[2016/03/10 10:05:58.620827,  0] ../source3/smbd/oplock.c:192(update_num_read_oplocks)
  PANIC: assert failed at ../source3/smbd/oplock.c(192): d->num_share_modes == 1
[2016/03/10 10:05:58.620905,  0] ../source3/lib/util.c:788(smb_panic_s3)
  PANIC (pid 25524): assert failed: d->num_share_modes == 1
[2016/03/10 10:05:58.621578,  0] ../source3/lib/util.c:899(log_stack_trace)
  BACKTRACE: 27 stack frames:
   #0 /usr/lib64/samba/libsmbconf.so.0(log_stack_trace+0x1c) [0x7fb908724c41]
   #1 /usr/lib64/samba/libsmbconf.so.0(smb_panic_s3+0x55) [0x7fb908724d43]
   #2 /usr/lib64/samba/libsamba-util.so.0(smb_panic+0x35) [0x7fb90a98239e]
   #3 /usr/lib64/samba/libsmbd-base-samba4.so(update_num_read_oplocks+0x9a) [0x7fb90a5af09e]
   #4 /usr/lib64/samba/libsmbd-base-samba4.so(+0x1058bd) [0x7fb90a5598bd]
   #5 /usr/lib64/samba/libsmbd-base-samba4.so(+0x1072e8) [0x7fb90a55b2e8]
   #6 /usr/lib64/samba/libsmbd-base-samba4.so(create_file_default+0x28b) [0x7fb90a55bf67]
   #7 /usr/lib64/samba/libsmbd-base-samba4.so(+0x1d9d5b) [0x7fb90a62dd5b]
   #8 /usr/lib64/samba/libsmbd-base-samba4.so(smb_vfs_call_create_file+0xd4) [0x7fb90a561c07]
   #9 /usr/lib64/samba/libsmbd-base-samba4.so(smbd_smb2_request_process_create+0x2063) [0x7fb90a590d10]
   #10 /usr/lib64/samba/libsmbd-base-samba4.so(smbd_smb2_request_dispatch+0xb9b) [0x7fb90a5884cd]
   #11 /usr/lib64/samba/libsmbd-base-samba4.so(+0x135bd2) [0x7fb90a589bd2]
   #12 /usr/lib64/samba/libsmbconf.so.0(run_events_poll+0x2c2) [0x7fb90873a24f]
   #13 /usr/lib64/samba/libsmbconf.so.0(+0x37697) [0x7fb90873a697]
   #14 /usr/lib64/samba/libtevent.so.0(_tevent_loop_once+0x92) [0x7fb909c388e7]
   #15 /usr/lib64/samba/libtevent.so.0(tevent_common_loop_wait+0x17) [0x7fb909c38952]
   #16 /usr/lib64/samba/libtevent.so.0(_tevent_loop_wait+0xa) [0x7fb909c386eb]
   #17 /usr/lib64/samba/libsmbd-base-samba4.so(smbd_process+0x91a) [0x7fb90a575882]
   #18 /usr/sbin/smbd(+0x93d9) [0x7fb90afd93d9]
   #19 /usr/lib64/samba/libsmbconf.so.0(run_events_poll+0x2c2) [0x7fb90873a24f]
   #20 /usr/lib64/samba/libsmbconf.so.0(+0x37697) [0x7fb90873a697]
   #21 /usr/lib64/samba/libtevent.so.0(_tevent_loop_once+0x92) [0x7fb909c388e7]
   #22 /usr/lib64/samba/libtevent.so.0(tevent_common_loop_wait+0x17) [0x7fb909c38952]
   #23 /usr/lib64/samba/libtevent.so.0(_tevent_loop_wait+0xa) [0x7fb909c386eb]
   #24 /usr/sbin/smbd(main+0x1922) [0x7fb90afdb1d1]
   #25 /lib64/libc.so.6(__libc_start_main+0xfd) [0x7fb90722bd5d]
   #26 /usr/sbin/smbd(+0x5e09) [0x7fb90afd5e09]
[2016/03/10 10:05:58.622009,  0] ../source3/lib/dumpcore.c:318(dump_core)
  dumping core in /var/log/samba/cores/smbd

Nothing in that output makes it obvious to me where to start troubleshooting.
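One thing that can at least be checked is whether every dump comes from the same assertion. A minimal sketch in Python, assuming the "PANIC: assert failed at file(line): expression" format shown in the log above; the function name tally_asserts is hypothetical and not part of Samba:

```python
import re
from collections import Counter

# Matches lines such as:
#   PANIC: assert failed at ../source3/smbd/oplock.c(192): d->num_share_modes == 1
# The "PANIC (pid ...)" summary line is deliberately not matched, so each
# failure is counted once.
PANIC_RE = re.compile(r"PANIC: assert failed at (\S+)\((\d+)\): (.+)")

def tally_asserts(lines):
    """Count assert failures by (source file, line number, expression)."""
    counts = Counter()
    for line in lines:
        m = PANIC_RE.search(line)
        if m:
            key = (m.group(1), int(m.group(2)), m.group(3).strip())
            counts[key] += 1
    return counts
```

Passing an open smbd log file to tally_asserts gives a Counter; if it contains a single key, all the panics come from the same check in oplock.c.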

A bit of information about the user base: mix of Windows (biggest), Mac (smaller) and Linux (smallest) users all connecting via CIFS. Number of open files across the cluster can go up to 6000 or so. The number of shares exported is 6, with file/folder access in the filesystem controlled by ACLs.

I need to know how I can debug what's causing the dumps.

Please bear in mind that I am not a developer, nor am I Linux-minded (my background is Windows), but I'm getting by. If you ask me to recompile Samba with symbols, for example, I'd say no :)

Any advice will be gratefully received.

Many thanks

Richard

