[CTDB] strange fork bomb bug

Sun Mar 23 14:23:59 MDT 2014

Hi Mathieu,

On Fri, 7 Mar 2014 12:23:42 +0100
Mathieu Parent <math.parent at gmail.com> wrote:

> We had a strange ctdb behavior recently on an 8-node cluster: each
> node had 192 ctdbd processes (instead of the usual 2), using 1024 or
> so file descriptors each (which is the default Linux limit)! Mostly
> :pipe and :socket. It was hard to connect via SSH then, and even the
> process table looked corrupted. Solution was to stop then kill ctdbd.
> 
> It seems that when the ctdbd child is blocked, the parent creates a new
> one without cleaning up the old one, until hitting resource limits.

Are the processes all waiting on record locks? I'd suggest looking at
/proc/locks, and also checking the lockwait metrics displayed with
"ctdb statistics". We've run into similar issues under record lock
contention.

> This was on an old version (Debian 1.12+git20120201-4), I wonder if
> that has been fixed since.

If the processes are all lockwait forks, then consider merging
"ctdb_lockwait: create overflow queue" and "LockWait congestion" if
you don't have them already.

Cheers, David