[PATCH] Restarting cleanupd when ctdb-messaging is down

Ralph Böhme slow at samba.org
Mon Jul 18 16:20:49 UTC 2016


On Mon, Jul 18, 2016 at 12:49:13PM +0200, Volker Lendecke wrote:
> Hi!
> 
> On Fri, Jul 15, 2016 at 10:34:27AM +0200, Ralph Boehme wrote:
>   
> > +static void cleanupd_init_timer(struct tevent_context *ev,
> > +				struct tevent_timer *te,
> > +				struct timeval current_time,
> > +				void *private_data);
> > +
> > +struct cleanup_init_state {
> > +	bool ok;
> > +	struct tevent_context *ev;
> > +	struct messaging_context *msg;
> > +	struct tevent_timer *te;
> > +	struct server_id *ppid;
> > +};
> > +
> > +static struct tevent_req *cleanupd_init_send(struct tevent_context *ev,
> > +					     TALLOC_CTX *mem_ctx,
> > +					     struct messaging_context *msg,
> > +					     struct server_id *ppid)
> > +{
> > +	struct tevent_req *req = NULL;
> > +	struct cleanup_init_state *state = NULL;
> > +	struct timeval tv;
> > +
> > +	req = tevent_req_create(mem_ctx, &state, struct cleanup_init_state);
> > +	if (req == NULL) {
> > +		return NULL;
> > +	}
> > +
> > +	*state = (struct cleanup_init_state) {
> > +		.msg = msg,
> > +		.ev = ev,
> > +		.ppid = ppid
> > +	};
> > +
> > +	tv = tevent_timeval_current_ofs(0, 0);
> > +
> > +	state->te = tevent_add_timer(ev, state, tv, cleanupd_init_timer, req);
> > +	if (tevent_req_nomem(state->te, req)) {
> > +		tevent_req_post(req, ev);
> > +	}
> 
> From my point of view tevent_wakeup_send/recv is more in line with the
> _send/_recv pattern we're using lately. I try to avoid dealing with
> direct events wherever possible. Any reason why you did not use it
> here?

Good idea, tevent_wakeup_send/recv is the right thing to use here.

> > +
> > +	return req;
> > +}
> > +
> > +static void cleanupd_init_timer(struct tevent_context *ev,
> > +				struct tevent_timer *te,
> > +				struct timeval current_time,
> > +				void *private_data)
> > +{
> > +	struct tevent_req *req = talloc_get_type_abort(
> > +		private_data, struct tevent_req);
> > +	struct cleanup_init_state *state = tevent_req_data(
> > +		req, struct cleanup_init_state);
> > +	struct timeval tv;
> > +
> > +	DBG_NOTICE("Initializing cleanupd from restart timer()\n");
> > +
> > +	state->ok = cleanupd_init(state->msg, false, state->ppid);
> > +	if (state->ok) {
> > +		DBG_WARNING("cleanupd restarted\n");
> > +		tevent_req_done(req);
> > +		return;
> > +	}
> > +
> > +	DBG_ERR("Scheduled cleanupd restart failed, rescheduling\n");
> > +
> > +	tv = tevent_timeval_current_ofs(1, 0);
> > +
> > +	state->te = tevent_add_timer(ev, state, tv, cleanupd_init_timer, req);
> > +	if (tevent_req_nomem(state->te, req)) {
> > +		tevent_req_post(req, ev);
> 
> Is this tevent_req_post really correct? We're in a callback and went
> through the event loop already.

good catch, thanks! That part was copy&pasted from the send function.

As discussed off-list, this will also need some sort of protection
against cleanup messages lost in-flight, like an ACK message or
possibly use a local tbd for posting cleanup events. Looking into
it...

Cheerio!
-slow



More information about the samba-technical mailing list