delayed network interfaces and ctdb service startup

Steve French smfrench at gmail.com
Thu Sep 8 03:49:44 UTC 2016


I am not convinced it is network related after all; I may have misread
one of the logs. For example:


2016/08/31 21:01:32.549360 [ 3623]: CTDB starting on node
2016/08/31 21:01:32.549697 [ 3623]: Recovery lock file set to "". Disabling recovery lock checking
2016/08/31 21:01:32.556948 [ 3666]: Starting CTDBD (Version 4.4.6-UNKNOWN) as PID: 3666
2016/08/31 21:01:32.557023 [ 3666]: Created PID file /run/ctdb/ctdbd.pid
2016/08/31 21:01:32.557040 [ 3666]: Set real-time scheduler priority
2016/08/31 21:01:32.557138 [ 3666]: Set runstate to INIT (1)
2016/08/31 21:01:32.557605 [ 3666]: Set event helper to "/usr/libexec/ctdb/ctdb_event_helper"
2016/08/31 21:01:35.951911 [ 3666]: libust[4475/4475]: Error: Timed out waiting for lttng-sessiond (in lttng_ust_init() at lttng-ust-comm.c:1532)
2016/09/01 02:09:10.002215 [ 3666]: No event for 18453 seconds!
2016/09/01 02:09:10.002240 [ 3666]: libust[4698/4698]: Error: Timed out waiting for lttng-sessiond (in lttng_ust_init() at lttng-ust-comm.c:1532)
2016/09/01 02:09:10.002277 [ 3666]: Event script '05.system init ' timed out after 18454.0s, pid: 4698
2016/09/01 02:09:10.002463 [ 3666]: ../ctdb/server/eventscript.c:912 eventscript for 'init' timedout. Immediately banning ourself for 300 seconds
2016/09/01 02:09:10.002478 [ 3666]: ctdb exiting with error: Failed to run init event
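
The long silence in that log is easy to miss when skimming. A rough
sketch for spotting it, assuming every record starts with a timestamp
in the format shown above; lines without a leading timestamp (such as
wrapped continuations) are skipped. The names parse_stamp and
largest_gap are made up for illustration and are not part of ctdb.

```python
from datetime import datetime

# Assumption: each ctdb log record starts with a timestamp in the
# "YYYY/MM/DD HH:MM:SS.ffffff" form seen in the excerpt above.
STAMP_FORMAT = "%Y/%m/%d %H:%M:%S.%f"
STAMP_LENGTH = 26  # len("2016/08/31 21:01:35.951911")


def parse_stamp(line):
    """Return the leading timestamp of a line, or None if it has none."""
    try:
        return datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
    except ValueError:
        return None


def largest_gap(lines):
    """Return (seconds, line_before, line_after) for the longest silence."""
    best = (0.0, None, None)
    previous = None
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None:
            continue  # no timestamp, e.g. a wrapped continuation line
        if previous is not None:
            seconds = (stamp - previous[0]).total_seconds()
            if seconds > best[0]:
                best = (seconds, previous[1], line)
        previous = (stamp, line)
    return best


# A few lines copied from the excerpt above, including wrapped ones.
sample = [
    '2016/08/31 21:01:32.557605 [ 3666]: Set event helper to',
    '"/usr/libexec/ctdb/ctdb_event_helper"',
    '2016/08/31 21:01:35.951911 [ 3666]: libust[4475/4475]: Error: Timed',
    'out waiting for lttng-sessiond (in lttng_ust_init() at',
    '2016/09/01 02:09:10.002215 [ 3666]: No event for 18453 seconds!',
]
gap_seconds, before, after = largest_gap(sample)
print(f"{gap_seconds:.1f}s of silence before: {after}")
```

On the sample this reports a gap of roughly 18454 seconds between the
first lttng-sessiond timeout and the "No event" message, which lines
up with the timeout logged for the '05.system init ' event script.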

On Wed, Sep 7, 2016 at 10:45 PM, Steve French <smfrench at gmail.com> wrote:
> I can try your attached patch (seems simpler, and better) - but first
> I want to see what happens tomorrow when they run the update with
> Restart=on-failure set.
>
> On Wed, Sep 7, 2016 at 9:34 PM, Martin Schwenke <martin at meltin.net> wrote:
>> Hi Steve,
>>
>> On Wed, 7 Sep 2016 20:32:37 -0500, Steve French <smfrench at gmail.com>
>> wrote:
>>
>>> Ran into an issue with ctdb service startup failing (on RHEL) when
>>> a software update ran during the reboot. It normally works fine on
>>> the next boot after a software update (or before one), but
>>> presumably the network comes up late while a software update is in
>>> progress.
>>
>> It would be interesting to know what the actual failure was.  ;-)
>>
>>> Should ctdb service file be configured with something like
>>>
>>> Restart=on-failure
>>> RestartSec=60s
>>>
>>> or Restart=on-abnormal ....
>>>
>>> I noticed that ctdb was being started after networking, but presumably
>>> that is not good enough to guarantee that socket is usable
>>>
>>> [Unit]
>>> Description=CTDB
>>> After=network.target
>>
>> https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
>> suggests that we need After=network-online.target instead of just
>> network.target.
>>
>> The above systemd documentation says:
>>
>>   network.target has very little meaning during start-up. It only
>>   indicates that the network management stack is up after it has been
>>   reached. Whether any network interfaces are already configured when
>>   it is reached is undefined. Its primary purpose is for ordering
>>   things properly at shutdown: [...]
>>
>> and:
>>
>>   network-online.target is a target that actively waits until the
>>   network is "up" [...]
>>   It is strongly recommended not to pull in this target too liberally:
>>   for example network server software should generally not pull this in
>>   (since server software generally is happy to accept local connections
>>   even before any routable network interface is up), its primary
>>   purpose is network client software that cannot operate without
>>   network.
>>
>> to which I quite naturally say: Yay!  Systemd!  That wasn't obvious...  :-(
>>
>> So, I can guess 2 reasons for CTDB not starting:
>>
>> * CTDB insists on binding a known node address to its TCP socket.  If
>>   the address isn't available/local then the bind will fail.
>>
>> * The public address setup is also unforgiving.  If a specified
>>   interface is not available then CTDB will abort early.
>>
>> Can you please test if the attached patch improves things and, if so,
>> give it a Reviewed-by: ?   :-)
>>
>> Thanks...
>>
>> peace & happiness,
>> martin
>
>
>
> --
> Thanks,
>
> Steve



-- 
Thanks,

Steve


