ctdb intermittent startup failure "Timed out waiting for lttng-sessiond"

Steve French smfrench at gmail.com
Tue Aug 2 15:59:38 UTC 2016


On Mon, Aug 1, 2016 at 8:10 PM, Martin Schwenke <martin at meltin.net> wrote:

> Hi Steve,
>
> On Mon, 1 Aug 2016 12:05:41 -0500, Steve French
> <smfrench at gmail.com> wrote:
>
> > I noticed that ctdb (Samba 4.4-stable) frequently fails to start
> > automatically on some of our systems.  RHEL 7.2 lttng problem?  "Timed
> > out waiting for lttng-sessiond"  Any ideas what this means or how to
> > work around it?  Anyone recognize this?  I see similar complaints from
> > others
> > (https://lists.lttng.org/pipermail/lttng-dev/2014-April/022763.html)
> >
> > 2016/08/01 09:55:35.916683 [15896]: CTDB starting on node
> >
> > 2016/08/01 09:55:35.916730 [15896]: Recovery lock file set to "".
> > Disabling recovery lock checking
> >
> > 2016/08/01 09:55:35.921198 [16507]: Starting CTDBD (Version
> > 4.4.5-UNKNOWN) as PID: 16507
> >
> > 2016/08/01 09:55:35.921266 [16507]: Created PID file /run/ctdb/ctdbd.pid
> >
> > 2016/08/01 09:55:35.921284 [16507]: Set real-time scheduler priority
> >
> > 2016/08/01 09:55:35.921398 [16507]: Set runstate to INIT (1)
> >
> > 2016/08/01 09:55:35.921735 [16507]: Set event helper to
> > "/usr/libexec/ctdb/ctdb_event_helper"
> >
> > 2016/08/01 09:55:38.933533 [16507]: libust[16518/16518]: Error: Timed out
> > waiting for lttng-sessiond (in lttng_ust_init() at lttng-ust-comm.c:1532)
> >
> > 2016/08/01 16:55:02.103492 [16507]: No event for 25162 seconds!
> >
> > 2016/08/01 16:55:02.103538 [16507]: libust[16768/16768]: Error: Timed out
> > waiting for lttng-sessiond (in lttng_ust_init() at lttng-ust-comm.c:1532)
> >
> > 2016/08/01 16:55:02.103586 [16507]: Event script '01.reclock init ' timed
> > out after 25162.8s, pid: 16768
> >
> > 2016/08/01 16:55:02.103807 [16507]: ../ctdb/server/eventscript.c:912
> > eventscript for 'init' timedout. Immediately banning ourself for 300
> > seconds
> >
> > 2016/08/01 16:55:02.103826 [16507]: ctdb exiting with error: Failed to
> > run init event
> >
> > 2016/08/01 16:55:02.103834 [16507]:
> >
> > 2016/08/01 16:55:02.103842 [16507]: CTDB daemon shutting down
> >
> > 2016/08/01 16:55:02.111101 [16507]: Removed PID file /run/ctdb/ctdbd.pid
>
> I'm not sure about the lttng part.  I don't think we have a way of
> using lttng with CTDB.
>
> However, to me it looks like time jumped forward by a few hours and
> CTDB reacted badly.  CTDB needs time to be stable...
>

These systems use ntp to sync with a stable time source, so my assumption
is that the time zone is temporarily off during boot, rather than clock
drift.  Note that 25162 seconds is almost exactly 7 hours (6.99 hours), so
maybe the time zone is not set during reboot until after some services
have started.  The ctdb service is autostarted like a typical service,
nothing unusual, but maybe some other service has to start first, or a
set-time-zone event has to occur first.
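
To make the arithmetic concrete, here is a rough sketch that takes the two
timestamps around the gap in the log above and checks how close the jump is
to a whole number of hours.  The function names are made up for
illustration, and the timestamp format is assumed from the log lines as
quoted; this is not anything CTDB itself provides.

```python
from datetime import datetime

# Timestamp layout assumed from the quoted ctdb log lines.
LOG_TIME_FORMAT = "%Y/%m/%d %H:%M:%S.%f"


def log_gap_seconds(earlier: str, later: str) -> float:
    """Return the wall-clock gap in seconds between two log timestamps."""
    start = datetime.strptime(earlier, LOG_TIME_FORMAT)
    end = datetime.strptime(later, LOG_TIME_FORMAT)
    return (end - start).total_seconds()


def nearest_whole_hours(seconds: float) -> tuple[int, float]:
    """Return the nearest whole-hour count and the leftover seconds.

    A small leftover suggests a time zone style offset rather than drift.
    """
    hours = round(seconds / 3600)
    return hours, seconds - hours * 3600


# Last line before the jump and first line after it, from the log above.
gap = log_gap_seconds("2016/08/01 09:55:38.933533",
                      "2016/08/01 16:55:02.103492")
hours, residual = nearest_whole_hours(gap)
print(f"gap = {gap:.1f}s, nearest offset = {hours}h, residual = {residual:.1f}s")
```

The leftover of well under a minute is what makes me think of a whole-hour
offset being applied (or removed) during boot rather than gradual drift.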

-- 
Thanks,

Steve

