[Samba] Maximum monitor timeout count 20 reached. Making node unhealthy
Jeremy Allison
jra at samba.org
Thu Apr 8 04:55:39 UTC 2021
On Tue, Apr 06, 2021 at 08:31:33PM -0700, Isaac Stone via samba wrote:
>Running clustered samba + ctdb, pushing our new system from dev to prod and
>ran into this issue. Never saw in dev and staging in six months of testing,
>no idea what it means
>
>We are running a cluster of only one node while we transfer the production
>data to the new system, so the box complaining is the only box that exists
>as far as ctdb knows (the only entry in the nodes file is itself)
>
>Was down for an hour and a half today repeating every ~45 seconds
>
>"Maximum monitor timeout count 20 reached. Making node unhealthy"
The message comes from ctdb/server/ctdb_monitor.c:

/*
  called when a health monitoring event script finishes
 */
static void ctdb_health_callback(struct ctdb_context *ctdb, int status, void *p)
{
	struct ctdb_node *node = ctdb->nodes[ctdb->pnn];
	TDB_DATA data;
	struct ctdb_node_flag_change c;
	uint32_t next_interval;
	int ret;
	TDB_DATA rddata;
	struct ctdb_srvid_message rd;
	const char *state_str = NULL;

	c.pnn = ctdb->pnn;
	c.old_flags = node->flags;

	ZERO_STRUCT(rd);
	rd.pnn   = ctdb->pnn;
	rd.srvid = 0;

	rddata.dptr = (uint8_t *)&rd;
	rddata.dsize = sizeof(rd);

	if (status == ECANCELED) {
		DEBUG(DEBUG_ERR,("Monitoring event was cancelled\n"));
		goto after_change_status;
	}

	if (status == ETIMEDOUT) {
		ctdb->monitor->event_script_timeouts++;

		if (ctdb->monitor->event_script_timeouts >=
		    ctdb->tunable.monitor_timeout_count) {
			DEBUG(DEBUG_ERR,
			      ("Maximum monitor timeout count %u reached."
			       " Making node unhealthy\n",
....
So it has run the health monitoring event scripts, and they have
timed out (ETIMEDOUT) 20 times, which is the limit set by the
monitor_timeout_count tunable.
The script is invoked here:
/*
  see if the event scripts think we are healthy
 */
static void ctdb_check_health(struct tevent_context *ev,
			      struct tevent_timer *te,
			      struct timeval t, void *private_data)
....
	ret = ctdb_event_script_callback(ctdb,
					 ctdb->monitor->monitor_context,
					 ctdb_health_callback,
					 ctdb, CTDB_EVENT_MONITOR, "%s", "");
That call goes through ctdb_event_script_callback() in
ctdb/server/eventscript.c:

/*
  run the event script in the background, calling the callback when
  finished. If mem_ctx is freed, callback will never be called.
 */
int ctdb_event_script_callback(struct ctdb_context *ctdb,
			       TALLOC_CTX *mem_ctx,
			       void (*callback)(struct ctdb_context *, int, void *),
			       void *private_data,
			       enum ctdb_event call,
			       const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = ctdb_event_script_run(ctdb, mem_ctx, callback, private_data,
				    call, fmt, ap);
	va_end(ap);

	return ret;
}
So I'd start by looking at the monitoring event scripts. From the ctdb manpage:
scriptstatus
This command displays which event scripts were run in the previous monitoring cycle and the result of each script. If a script failed with an error, causing the node to become unhealthy, the output from that script is also shown.
This command is deprecated. It's provided for backward compatibility. In place of ctdb scriptstatus, use ctdb event status.
Example
# ctdb scriptstatus
00.ctdb OK 0.011 Sat Dec 17 19:40:46 2016
01.reclock OK 0.010 Sat Dec 17 19:40:46 2016
05.system OK 0.030 Sat Dec 17 19:40:46 2016
06.nfs OK 0.014 Sat Dec 17 19:40:46 2016
10.interface OK 0.041 Sat Dec 17 19:40:46 2016
11.natgw OK 0.008 Sat Dec 17 19:40:46 2016
11.routing OK 0.007 Sat Dec 17 19:40:46 2016
13.per_ip_routing OK 0.007 Sat Dec 17 19:40:46 2016
20.multipathd OK 0.007 Sat Dec 17 19:40:46 2016
31.clamd OK 0.007 Sat Dec 17 19:40:46 2016
40.vsftpd OK 0.013 Sat Dec 17 19:40:46 2016
41.httpd OK 0.015 Sat Dec 17 19:40:46 2016
49.winbind OK 0.022 Sat Dec 17 19:40:46 2016
50.samba ERROR 0.077 Sat Dec 17 19:40:46 2016
OUTPUT: ERROR: samba tcp port 445 is not responding
I'm not a ctdb expert, but I hope this helps!