[CTDB] strange fork bomb bug

Mathieu Parent math.parent at gmail.com
Fri Mar 7 04:23:42 MST 2014


Hello ctdb devs,

We recently saw strange ctdb behavior on an 8-node cluster: each
node had 192 ctdbd processes (instead of the usual 2), each using about
1024 file descriptors (the default Linux limit), mostly :pipe and
:socket. It was hard to connect via SSH at that point, and even the
process table looked corrupted. The solution was to stop and then kill
ctdbd.
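
To check whether a node is in this state, a minimal sketch along these
lines could count ctdbd processes and their open file descriptors. It
assumes the Linux /proc layout (/proc/<pid>/comm and /proc/<pid>/fd);
the function name ctdbd_fd_usage and its parameters are made up for
illustration and are not part of ctdb.

```python
import os


def ctdbd_fd_usage(proc_root="/proc", name="ctdbd"):
    """Return {pid: open_fd_count} for every process whose comm equals `name`.

    Hypothetical helper, not part of ctdb. Assumes a Linux-style /proc.
    """
    usage = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "comm")) as f:
                if f.read().strip() != name:
                    continue
            # Each entry in fd/ is one open descriptor (pipe, socket, file...).
            usage[int(entry)] = len(os.listdir(os.path.join(base, "fd")))
        except OSError:
            # The process exited meanwhile, or its fd directory is unreadable.
            continue
    return usage
```

Run as root so that the fd directories are readable; a healthy node
should then report about 2 entries, with counts well below the
per-process limit (see `ulimit -n`).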

It seems that when the ctdbd child is blocked, the parent creates a new
one without cleaning up the old one, until it hits resource limits.
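
To illustrate the kind of guard that would avoid this pile-up, here is a
rough sketch, not ctdb's actual code: the parent reaps finished
children, stops a child that overruns its timeout (SIGTERM, then
SIGKILL), and refuses to spawn again while too many children are still
outstanding. The cap matters because a child blocked in the kernel, as
can happen on a hard NFS mount, may not exit on SIGTERM or even SIGKILL
until the blocking call returns. The class BoundedSpawner and the
parameters max_children, timeout and grace are hypothetical names.

```python
import subprocess
import sys


class BoundedSpawner:
    """Run helper commands while capping how many un-reaped children exist."""

    def __init__(self, max_children=4):
        self.max_children = max_children
        self.children = []  # Popen objects that have not exited yet

    def reap(self):
        # poll() does a non-blocking wait; keep only children still running.
        self.children = [c for c in self.children if c.poll() is None]

    def _wait_or_stop(self, child, timeout, grace):
        try:
            return child.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            pass
        for stop in (child.terminate, child.kill):  # SIGTERM, then SIGKILL
            stop()
            try:
                return child.wait(timeout=grace)
            except subprocess.TimeoutExpired:
                continue
        return None  # still stuck; it stays tracked and counts against the cap

    def run(self, argv, timeout, grace=1.0):
        self.reap()
        if len(self.children) >= self.max_children:
            # Refuse to fork again rather than pile up blocked children.
            raise RuntimeError("too many outstanding children")
        child = subprocess.Popen(argv)
        self.children.append(child)
        rc = self._wait_or_stop(child, timeout, grace)
        self.reap()
        return rc
```

For example, `BoundedSpawner(max_children=4).run(["/bin/true"], timeout=30)`
returns the exit status, or None if the child could not be stopped. With
a cap like this, a blocked child costs at most max_children processes
instead of growing until the file descriptor limit is reached.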

This was on an old version (Debian 1.12+git20120201-4); I wonder whether
this has been fixed since.

I'm trying to reproduce the issue (with a hard NFS mount).

Regards
-- 
Mathieu Parent


NB: log.ctdb contained:

2014/02/28 17:20:57.489370 [ 3486]: Freeze priority 1
2014/02/28 17:20:57.489984 [ 3486]: Freeze priority 2
2014/02/28 17:20:57.490491 [ 3486]: Freeze priority 3
2014/02/28 17:20:57.772767 [ 3486]: Thawing priority 1
2014/02/28 17:20:57.772808 [ 3486]: Release freeze handler for prio 1
2014/02/28 17:20:57.772831 [ 3486]: Thawing priority 2
2014/02/28 17:20:57.772844 [ 3486]: Release freeze handler for prio 2
2014/02/28 17:20:57.772863 [ 3486]: Thawing priority 3
2014/02/28 17:20:57.772874 [ 3486]: Release freeze handler for prio 3
2014/02/28 17:21:07.955309 [ 3486]: Freeze priority 1
2014/02/28 17:21:07.956174 [ 3486]: Freeze priority 2
2014/02/28 17:21:07.956808 [ 3486]: Freeze priority 3
2014/02/28 17:21:08.222463 [ 3486]: Thawing priority 1
2014/02/28 17:21:08.222525 [ 3486]: Release freeze handler for prio 1
2014/02/28 17:21:08.222561 [ 3486]: Thawing priority 2
2014/02/28 17:21:08.222573 [ 3486]: Release freeze handler for prio 2
2014/02/28 17:21:08.222591 [ 3486]: Thawing priority 3
2014/02/28 17:21:08.222602 [ 3486]: Release freeze handler for prio 3
2014/02/28 17:21:18.406226 [ 3486]: Freeze priority 1
2014/02/28 17:21:18.407113 [ 3486]: Freeze priority 2
2014/02/28 17:21:18.407711 [ 3486]: Freeze priority 3
2014/02/28 17:21:18.675376 [ 3486]: Thawing priority 1
2014/02/28 17:21:18.675427 [ 3486]: Release freeze handler for prio 1
2014/02/28 17:21:18.675450 [ 3486]: Thawing priority 2
2014/02/28 17:21:18.675462 [ 3486]: Release freeze handler for prio 2
2014/02/28 17:21:18.675480 [ 3486]: Thawing priority 3
2014/02/28 17:21:18.675490 [ 3486]: Release freeze handler for prio 3
2014/02/28 17:21:28.858118 [ 3486]: Freeze priority 1
2014/02/28 17:21:28.859022 [ 3486]: Freeze priority 2
2014/02/28 17:21:28.859641 [ 3486]: Freeze priority 3
2014/02/28 17:21:29.121186 [ 3486]: Thawing priority 1
2014/02/28 17:21:29.121239 [ 3486]: Release freeze handler for prio 1
2014/02/28 17:21:29.121262 [ 3486]: Thawing priority 2
2014/02/28 17:21:29.121274 [ 3486]: Release freeze handler for prio 2
2014/02/28 17:21:29.121292 [ 3486]: Thawing priority 3
[etc.]

and:
2014/02/28 17:33:51.254462 [ 3486]: Monitoring event was cancelled
2014/02/28 17:33:51.254515 [ 3486]: server/eventscript.c:584 Sending SIGTERM to child pid:29991
2014/02/28 17:46:24.428171 [ 3486]: Monitoring event was cancelled
2014/02/28 17:46:24.428239 [ 3486]: server/eventscript.c:584 Sending SIGTERM to child pid:27755
2014/02/28 18:05:13.859866 [ 3486]: Monitoring event was cancelled
2014/02/28 18:05:13.859921 [ 3486]: server/eventscript.c:584 Sending SIGTERM to child pid:8818
2014/02/28 18:23:43.043934 [ 3486]: Monitoring event was cancelled
2014/02/28 18:23:43.043994 [ 3486]: server/eventscript.c:584 Sending SIGTERM to child pid:20176
[etc.]

