ctdb intermittent startup failure "Timed out waiting for lttng-sessiond"

Martin Schwenke martin at meltin.net
Tue Aug 2 20:42:31 UTC 2016


On Tue, 2 Aug 2016 10:59:38 -0500, Steve French <smfrench at gmail.com>
wrote:

> On Mon, Aug 1, 2016 at 8:10 PM, Martin Schwenke <martin at meltin.net> wrote:
> 
> > Hi Steve,
> >
> > On Mon, 1 Aug 2016 12:05:41 -0500, Steve French
> > <smfrench at gmail.com> wrote:
> >  
> > > I noticed that ctdb (Samba 4.4-stable) frequently fails to start
> > > automatically on some of our systems.  RHEL 7.2 lttng problem?  "Timed
> > > out waiting for lttng-sessiond"  Any ideas what this means or how to
> > > work around it?  Anyone recognize this?  I see similar complaints from
> > > others
> > > (https://lists.lttng.org/pipermail/lttng-dev/2014-April/022763.html)
> > >
> > > 2016/08/01 09:55:35.916683 [15896]: CTDB starting on node
> > >
> > > 2016/08/01 09:55:35.916730 [15896]: Recovery lock file set to "".
> > > Disabling recovery lock checking
> > >
> > > 2016/08/01 09:55:35.921198 [16507]: Starting CTDBD (Version
> > > 4.4.5-UNKNOWN) as PID: 16507
> > >
> > > 2016/08/01 09:55:35.921266 [16507]: Created PID file /run/ctdb/ctdbd.pid
> > >
> > > 2016/08/01 09:55:35.921284 [16507]: Set real-time scheduler priority
> > >
> > > 2016/08/01 09:55:35.921398 [16507]: Set runstate to INIT (1)
> > >
> > > 2016/08/01 09:55:35.921735 [16507]: Set event helper to
> > > "/usr/libexec/ctdb/ctdb_event_helper"
> > >
> > > 2016/08/01 09:55:38.933533 [16507]: libust[16518/16518]: Error: Timed out
> > > waiting for lttng-sessiond (in lttng_ust_init() at lttng-ust-comm.c:1532)
> > >
> > > 2016/08/01 16:55:02.103492 [16507]: No event for 25162 seconds!
> > >
> > > 2016/08/01 16:55:02.103538 [16507]: libust[16768/16768]: Error: Timed out
> > > waiting for lttng-sessiond (in lttng_ust_init() at lttng-ust-comm.c:1532)
> > >
> > > 2016/08/01 16:55:02.103586 [16507]: Event script '01.reclock init ' timed
> > > out after 25162.8s, pid: 16768
> > >
> > > 2016/08/01 16:55:02.103807 [16507]: ../ctdb/server/eventscript.c:912
> > > eventscript for 'init' timedout. Immediately banning ourself for 300
> > > seconds
> > >
> > > 2016/08/01 16:55:02.103826 [16507]: ctdb exiting with error: Failed to
> > > run init event
> > >
> > > 2016/08/01 16:55:02.103834 [16507]:
> > >
> > > 2016/08/01 16:55:02.103842 [16507]: CTDB daemon shutting down
> > >
> > > 2016/08/01 16:55:02.111101 [16507]: Removed PID file /run/ctdb/ctdbd.pid  
> >
> > I'm not sure about the lttng part.  I don't think we have a way of
> > using lttng with CTDB.
> >
> > However, to me it looks like time jumped forward by a few hours and
> > CTDB reacted badly.  CTDB needs time to be stable...
> >  
> 
> These systems use ntp to sync with a stable time source, so my assumption
> is that the time zone is temporarily wrong during boot, rather than the
> clock drifting.  Note that 25162 seconds is almost exactly 7 hours (6.99
> hours), so maybe the time zone is not set during boot until after some
> services have started (the ctdb service is autostarted like a typical
> service, nothing unusual, but maybe some other service has to start first,
> or a set-time-zone event has to occur first).
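
The 7 hour figure can be checked against the log timestamps quoted
above.  This is only a rough sketch in Python: it takes the last line
logged before the jump and the first line logged after it, and compares
the gap with the 25162 seconds that ctdbd reported.  The helper name
gap_seconds is made up for illustration.

```python
from datetime import datetime

# Timestamp layout used by the ctdb log lines quoted in this thread.
FMT = "%Y/%m/%d %H:%M:%S.%f"


def gap_seconds(earlier: str, later: str) -> float:
    """Return the number of seconds between two ctdb log timestamps."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return delta.total_seconds()


# Last line logged before the jump and first line logged after it.
last_before = "2016/08/01 09:55:38.933533"
first_after = "2016/08/01 16:55:02.103492"

gap = gap_seconds(last_before, first_after)
reported = 25162  # from "No event for 25162 seconds!"

print(f"gap between log lines: {gap:.1f}s")
print(f"reported by ctdbd:     {reported}s = {reported / 3600:.2f} hours")
print(f"short of 7 hours by:   {7 * 3600 - reported}s")
```

So the reported gap is within a minute of a whole number of hours,
which fits a time zone style offset better than gradual drift, though
the log alone does not prove which it was.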

I guess there might be a systemd service for NTP that ctdb.service
needs to list as a dependency (but doesn't)?  I don't have a RHEL 7
system handy right now to check...
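
Something like the following Python sketch could do that check on an
affected system.  It collects the After= entries from the [Unit]
section of a unit file and looks for time-sync.target, which is
systemd's standard ordering target for clock synchronisation.  The
sample unit text is made up for illustration and is NOT the real
ctdb.service; the function name is hypothetical too.  Whether ordering
after time-sync.target would actually avoid the jump is an untested
assumption.

```python
from typing import List


def ordering_deps(unit_text: str, key: str = "After") -> List[str]:
    """Collect unit names from every `key=` line in the [Unit] section."""
    deps: List[str] = []
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Unit" and "=" in line:
            name, _, value = line.partition("=")
            if name.strip() == key:
                # A single After= line may list several units.
                deps.extend(value.split())
    return deps


# Hypothetical unit text, for illustration only.
SAMPLE_UNIT = """\
[Unit]
Description=CTDB
After=network.target

[Service]
Type=forking
"""

deps = ordering_deps(SAMPLE_UNIT)
print("After= entries:", deps)
print("ordered after time-sync.target:", "time-sync.target" in deps)
```

On a real system the text would come from the installed ctdb.service
plus any drop-ins, rather than from a string in the script.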

peace & happiness,
martin


