[clug] f19 unusual behaviour

Eyal Lebedinsky eyal at eyal.emu.id.au
Fri Dec 13 01:04:14 MST 2013


For a short while now, my server has had trouble dealing with local services whenever another
machine in the house hibernates. It started on 6 Dec; I never had this problem before then.
The server runs an up-to-date f19.

I have a cron task that checks for lost services. For each service it does something very simple:
	/bin/systemctl -q is-active "$service.service" && return 0
	echod "check failed at '$date', starting '$service'"
	/bin/systemctl start "$service.service"

The moment the client machine turns off I see this:

Client machine (e4) shuts down:
======
Dec 13 13:03:20 e4 kernel: [105165.005551] PM: Syncing filesystems ... done.

Server:
=======
The next cron runs at 13:05 (scheduled for '*/5'):

Failed to get D-Bus connection: Failed to authenticate in time.
2013-12-13 13:06:31 check-services: check failed at '2013-12-13 13:05:01', starting 'network'
Failed to get D-Bus connection: Failed to authenticate in time.
Failed to get D-Bus connection: Failed to authenticate in time.
2013-12-13 13:09:32 check-services: check failed at '2013-12-13 13:08:02', starting 'named'
Failed to get D-Bus connection: Failed to authenticate in time.
Failed to get D-Bus connection: Failed to authenticate in time.
2013-12-13 13:12:32 check-services: check failed at '2013-12-13 13:11:02', starting 'httpd'
Failed to get D-Bus connection: Failed to authenticate in time.
Failed to get D-Bus connection: Failed to authenticate in time.
2013-12-13 13:15:32 check-services: check failed at '2013-12-13 13:14:02', starting 'dhcpd'
2013-12-13 18:34:14 check-services: done

Note that the final line is logged five hours later, when the client machine is restarted.
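
The timestamps in the log give the length of each stall. As a minimal sketch, assuming GNU date
(for the -d option), the gap between two stamps can be computed as below; 'elapsed' is a
hypothetical helper, not part of my script. The first call measures the 'network' check, the
second the time from that message until the 'named' check began, which suggests the start
attempt stalls for about as long as the check does.

```shell
# elapsed START END: seconds between two 'YYYY-MM-DD HH:MM:SS' stamps.
# Assumes GNU date; -u avoids any local-time (DST) effects on the difference.
elapsed() {
	echo $(( $(date -u -d "$2" +%s) - $(date -u -d "$1" +%s) ))
}

# Check of 'network': from the failed-check time to the log message.
elapsed '2013-12-13 13:05:01' '2013-12-13 13:06:31'
# From that message to the moment the next check ('named') started.
elapsed '2013-12-13 13:06:31' '2013-12-13 13:08:02'
```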

So two problems:
1) the service check fails (rather than reporting the service as active, which it is). Each
    D-Bus failure seems to take 1m30s.
2) the service start can hang (in this case for 'dhcpd') indefinitely; here it hung until the
    client machine came back.
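
One possible stopgap for both problems, which does nothing about the root cause, is to run
every systemctl call under timeout(1) from coreutils and to treat a timed-out check as "state
unknown" rather than "service down". This is only a sketch: SYSTEMCTL, LIMIT, 'bounded' and
'check_service' are hypothetical names, and it assumes a hung systemctl exits on SIGTERM.

```shell
# Hypothetical knobs, not part of the original script.
SYSTEMCTL="${SYSTEMCTL:-/bin/systemctl}"
LIMIT="${LIMIT:-20}"	# seconds to wait before giving up on one call

# Run a command under a time limit; timeout(1) exits with 124 when the limit is hit.
bounded() {
	timeout "$LIMIT" "$@"
}

check_service() {
	local service="$1" rc=0
	bounded "$SYSTEMCTL" -q is-active "$service.service" || rc=$?
	case $rc in
	0)
		return 0	# service is active
		;;
	124)
		# No answer in time: the state is unknown, so do not restart.
		echo "check timed out for '$service', not restarting" >&2
		return 2
		;;
	esac
	echo "check failed (rc=$rc), starting '$service'" >&2
	# The start is bounded too, so the cron job can no longer hang for hours.
	bounded "$SYSTEMCTL" start "$service.service"
}
```

The cron job would then call check_service once per service name. With this in place, a stalled
D-Bus connection costs at most LIMIT seconds per call instead of blocking the whole run.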

Does anyone have any idea where I should look? How can D-Bus be affected by another machine going
down when, AFAIK, the server has no reference to it? This is highly reproducible.

TIA

--
Eyal Lebedinsky	(eyal at eyal.emu.id.au)

