[Samba] ctdb vacuum timeouts and record locks

Mon Nov 6 01:15:56 UTC 2017

On Thu, 2 Nov 2017 11:17:27 -0700, Computerisms Corporation via samba
<samba at lists.samba.org> wrote:

> This occurred again this morning, when the user reported the problem, I 
> found in the ctdb logs that vacuuming has been going on since last 
> night.  The need to fix it was urgent (when isn't it?) so I didn't have 
> time to poke around for clues, but immediately restarted the lxc 
> container.  But this time it wouldn't restart, which I had time to trace 
> to a hung smbd process, and between that and a run of the debug_locks.sh 
> script, I traced it to the user reporting the problem.  Given that the 
> user was primarily having problems with files in a given folder, I am 
> thinking this is because of some kind of lock on a file within that 
> folder.
> 
> Ended up rebooting both physical machines, problem solved.  for now.
> 
> So, not sure how to determine if this is a gluster problem, an lxc 
> problem, or a ctdb/smbd problem.  Thoughts/suggestions are welcome...

You need a stack trace of the stuck smbd process.  If it is wedged in a
system call on the cluster filesystem then you can blame the cluster
filesystem.  debug_locks.sh is meant to get you the relevant stack
trace via gstack.  In fact, even before you get the stack trace you
could check a process listing to see if the process is stuck in D
state (uninterruptible sleep).
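
A minimal sketch of that D state check, assuming Linux with /proc
mounted.  The function names proc_state and is_d_state are made up for
illustration; they are not part of ctdb or debug_locks.sh.

```shell
# Print the one-letter scheduler state of a PID (R, S, D, Z, T, ...).
# Linux-only: reads the "State:" line from /proc/<pid>/status.
proc_state() {
    [ -r "/proc/$1/status" ] || return 1
    while read -r key value rest; do
        if [ "$key" = "State:" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done < "/proc/$1/status"
    return 1
}

# Succeed only if the PID is in uninterruptible sleep (D state).
is_d_state() {
    [ "$(proc_state "$1")" = "D" ]
}

# Example use (not run here): flag any smbd stuck in D state.
#   for pid in $(pgrep smbd); do
#       is_d_state "$pid" && echo "smbd $pid is in D state"
#   done
```

A plain "ps -o pid,stat,wchan,cmd -C smbd" shows the same state letter
in the STAT column if you just want to eyeball it.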

gstack basically does:

  gdb -batch -ex "thread apply all bt" -p <pid>

For a single-threaded process it leaves out "thread apply all".
However, with recent GDB I'm not sure it makes a difference: the full
command seems to work for me on Linux either way.

Note that gstack/gdb will hang when run against a process in D state.
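
One way to avoid that hang is to look at the process state first and
only attach gdb if the process is not in D state.  This is a rough
sketch, not what debug_locks.sh does: safe_bt is a made-up name, the
30 second timeout is arbitrary, and /proc/<pid>/stack is usually only
readable by root and only present if the kernel was built with stack
trace support.

```shell
# Hypothetical helper: get a backtrace without letting gdb hang forever.
# Returns 1 if the PID does not exist, 2 if it is in D state.
safe_bt() {
    bt_pid=$1
    [ -r "/proc/$bt_pid/status" ] || return 1

    # Extract the one-letter state from the "State:" line.
    bt_state=$(sed -n 's/^State:[[:space:]]*\(.\).*/\1/p' \
                   "/proc/$bt_pid/status") || return 1

    if [ "$bt_state" = "D" ]; then
        echo "pid $bt_pid is in D state, not attaching gdb" >&2
        # Kernel-side view instead: wait channel and kernel stack.
        cat "/proc/$bt_pid/wchan" 2>/dev/null && echo
        cat "/proc/$bt_pid/stack" 2>/dev/null
        return 2
    fi

    # The process may still enter D state after the check, so bound
    # how long gdb is allowed to run.
    timeout 30 gdb -batch -ex "thread apply all bt" -p "$bt_pid"
}
```

If the kernel stack shows the process sitting in a filesystem call on
the cluster filesystem, that points at the cluster filesystem as
described above.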

peace & happiness,
martin