[clug] nfsroot reboot issue

Wed Oct 31 05:02:09 UTC 2018

On Wed, 31 Oct 2018, at 09:40, Bob Edwards via linux wrote:
> 
> Anyone got any good tips/pointers on what I might be doing wrong?
> 

Might be something here:
https://github.com/systemd/systemd/issues/6115#issuecomment-307993369

Quoting Neil:

"That patch is only one part of fixing this problem.
You also need util-linux newer than v2.30 (which is the latest release...) particularly commit ce6e269447ce86b0c303ec2febbf4264d318c713
You also need a small change to nfs-utils which hasn't landed upstream yet ... I should chase that."

and here:
https://github.com/systemd/systemd/issues/6115#issuecomment-327728281

Quoting Lennart:

"the deps are all in place already, all services and scopes (including user sessions) should all be shut down properly by the time we umount NFS shares, and that's done before we shut down the network. However, this fails to work properly if:

1. There are services which explicitly exclude themselves from killing, for example via KillMode=none or suchlike. If you have some of those, then it's really their fault, there's little we can do, please file a bug against these services asking them to not do this.

2. If processes already hang on NFS in a non-interruptible sleep, then systemd can't kill them either. This is a limitation of the Linux kernel, and there's nothing systemd can do about them.

3. Some distros don't get the deps on their networking stacks right, i.e. miss Before=network.target in their networking service, so that the networking stack is shut down after network.target goes away, and not before.

4. If people split out /var or /var/log onto NFS they are in trouble, as journald will keep /var/log/journal busy until the very end, and will thus keep these mounts busy for good. This is a limitation of the journal, we should fix eventually (and we would have fixed this already a long time ago if we had useful IPC for the journal, but we don't, as dbus-daemon is a client of the journal, and hence the journal can't use D-Bus IPC since we'd otherwise have a cyclic dep, and deadlocks)

Note that systemd applies a time-out to service stopping, hence an unkillable process due to NFS is actually not a major problem beyond causing a delay at shutdown. Moreover, the umount commands invoked by systemd during the regular shutdown phase also have a time-out applied as well, and if they don't complete within 90s systemd won't wait for them, and continue with the shutdown. However, in the second shutdown phase (i.e. where all units are already stopped, and we transitioned into the systemd-shutdown destruction loop) we will try to umount everything left-over again, and these umount() syscalls do not have any userspace time-out applied currently, but this is being worked on [now merged] in #6598[1]. As soon as we have that we should be reasonably safe regarding hanging NFS (modulo some bugs). That is of course unless PID 1 itself hangs on NFS for some reason..."

[1] https://github.com/systemd/systemd/pull/6598
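
If it helps to check points 1 and 4 from Lennart's list on your box, here is a rough Python sketch. It is only an illustration, not anything from systemd itself: the function names are made up, the mount table is assumed to be in the usual /proc/mounts layout (device, mountpoint, fstype, ...), and escaped characters in mountpoints (e.g. \040 for a space) are not handled.

```python
def covering_mount(mounts_text, path):
    """Return (mountpoint, fstype) of the mount that contains `path`, or None.

    Assumes /proc/mounts-style lines: device mountpoint fstype options ...
    The longest matching mountpoint wins; on a tie the later line wins,
    since a later mount shadows an earlier one at the same place.
    """
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mountpoint, fstype = fields[1], fields[2]
        prefix = mountpoint.rstrip("/") + "/"
        if (path.rstrip("/") + "/").startswith(prefix):
            if best is None or len(mountpoint) >= len(best[0]):
                best = (mountpoint, fstype)
    return best


def journal_on_nfs(mounts_text, journal_dir="/var/log/journal"):
    """True if the directory journald writes to sits on an NFS mount (point 4)."""
    hit = covering_mount(mounts_text, journal_dir)
    return hit is not None and hit[1].startswith("nfs")


def kill_mode(unit_text):
    """Return the last KillMode= value in a unit file's text, or None (point 1).

    Simplified parser: ignores sections, drop-ins and line continuations.
    """
    mode = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "KillMode":
            mode = value.strip()
    return mode
```

On a live system you would feed it something like open("/proc/mounts").read() and the contents of the unit files you suspect. With an NFS root, journal_on_nfs() will come back True unless /var, /var/log or /var/log/journal is mounted from somewhere else, and any unit where kill_mode() returns "none" is a candidate for keeping the mount busy.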

-c