[Samba] CTDB Node unnecessarily banning other nodes

tim clusters tim.clusters at gmail.com
Fri Jul 31 09:57:33 MDT 2009


Hi,

We are using CTDB version 1.0.77, and yesterday we saw a node run into
problems and ban itself in order to recover (log below):

node1:
2009/07/29 23:23:37.748251 [22371]: Banning node 0 for 300 seconds
2009/07/29 23:23:37.748263 [22371]: self ban - lowering our election priority
2009/07/29 23:23:37.748503 [22275]: This node has been banned - forcing freeze and recovery

The other nodes in the CTDB cluster then receive the ban message. Even
though the banned PNN does not match their own PNN, they still treat
themselves as banned and go into recovery mode. I assume this is not
supposed to happen?

node2 (should not ban itself):
2009/07/29 23:23:37.748659 [19905]: Got a ban request for pnn:0 but our pnn is 1. Ignoring ban request
2009/07/29 23:23:37.748994 [19776]: This node has been banned - forcing freeze and recovery

node3 (should not ban itself):
2009/07/29 23:23:37.748506 [19892]: Got a ban request for pnn:0 but our pnn is 2. Ignoring ban request
2009/07/29 23:23:37.749575 [19750]: This node has been banned - forcing freeze and recovery

Existing Version 1.0.77: ctdb-1.0.77/ctdb_monitor.c

        if ((node->flags & NODE_FLAGS_BANNED) &&
            !(c->old_flags & NODE_FLAGS_BANNED)) {
                /* make sure we are frozen */
                DEBUG(DEBUG_NOTICE,("This node has been banned - forcing freeze and recovery\n"));

--

I see that a condition was added to the ban handling in the latest release,
1.0.88, to ensure that the banned node's PNN matches the local node's PNN
(node->pnn == ctdb->pnn):

--

Version 1.0.88:

        /* if we have become banned, we should go into recovery mode */
        if ((node->flags & NODE_FLAGS_BANNED) &&
            !(c->old_flags & NODE_FLAGS_BANNED) &&
            (node->pnn == ctdb->pnn)) {
                /* make sure we are frozen */
                DEBUG(DEBUG_NOTICE,("This node has been banned - forcing freeze and recovery\n"));
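
To make the difference between the two conditions concrete, here is a minimal
standalone sketch of them. It is not the CTDB code: the function names are
made up for illustration, the value used for NODE_FLAGS_BANNED is an
assumption, and it presumes the excerpts above show the whole condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag value, for illustration only; check the CTDB headers. */
#define NODE_FLAGS_BANNED 0x00000008u

/* 1.0.77-style check (hypothetical helper): true whenever the node described
 * by the flag update has just become banned, no matter which node that is. */
static bool became_banned_1_0_77(uint32_t new_flags, uint32_t old_flags)
{
    return (new_flags & NODE_FLAGS_BANNED) && !(old_flags & NODE_FLAGS_BANNED);
}

/* 1.0.88-style check (hypothetical helper): same transition test, but the
 * update must also describe the local node (node->pnn == ctdb->pnn). */
static bool became_banned_1_0_88(uint32_t new_flags, uint32_t old_flags,
                                 uint32_t node_pnn, uint32_t local_pnn)
{
    return became_banned_1_0_77(new_flags, old_flags) && node_pnn == local_pnn;
}
```

For the case in the logs (node 0 becomes banned, local PNN is 1), the first
helper returns true and the second returns false, which would match node2
freezing under 1.0.77 but not under 1.0.88.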

Can you please confirm whether upgrading to 1.0.88 would fix this, so that
one node getting banned no longer causes the other nodes to freeze and go
into recovery unnecessarily?

Thanks,
-Tim

