[clug] Random Thought: support for Hot-code swap in the kernel

Tue Jun 30 01:08:49 GMT 2009

jm <jeffm at ghostgun.com> writes:

> What would it take to do hot-code swapping at the OS level in Linux?

For the kernel?  Installing the ksplice patches, probably, although you can
hand-write the patch code or potentially find other options.

> More fully, I was wondering what it would take to be able to have one
> version of an application start up, inherit its state from another version
> that is already running, have the old version shutdown and the new version
> continue running without a noticeable break in service.

Write it in a real, sane language.

> There are a number of languages which can do this. However, it's a language
> specific feature and it relies on an image or virtual machine. Update the vm
> and you lose the continuity.

Not if the VM supports that.  IIRC, an Erlang application written
appropriately can have multiple instances on a single machine[1], and you can
shut down and restart the individual components.

> Ideally, you'd also be able to migrate processes between machines so that
> you could stop one machine to upgrade the OS. The Xen hypervisor, for
> example, supports the migration of host OSes between machines when the host
> OS's filesystem is mounted via nfs.

OpenVZ also supports "migration" as well as checkpoint/restart of
applications; discussion was still ongoing, last I saw, about how to integrate
that into the upstream kernel.

> Ignoring such migration for the moment, ie limiting the application to one
> host OS instance. What is currently available to make the transfer of OS
> process state possible and what is missing?

You are asking the wrong question: nothing is missing.

> If it wasn't for the process state in kernel space it would just be a matter
> of having a set of functions in the application which would detect the
> presence of the new app, serialise the state, send the state to the new app,
> etc. It's things like sockets, file descriptors, etc which screw this idea
> up.

All of those are inheritable over exec, with appropriate care.  Heck, you
don't even need to ask the kernel for permission: just bind a trivial core and
an ELF dynamic linker in, then call down to that when you want to restart.
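
As a rough illustration of the "inheritable over exec" point, here is a
minimal Python sketch. It is not the bound-in linker approach described above;
it only shows a descriptor surviving an exec once close-on-exec is cleared.
The function name handoff_demo and the message text are made up for the
example, and a pipe stands in for the socket a real service would hold.

```python
import os
import sys


def handoff_demo() -> bytes:
    """Fork a child that execs a fresh interpreter while keeping a pipe open."""
    read_fd, write_fd = os.pipe()  # Python creates fds close-on-exec by default
    pid = os.fork()
    if pid == 0:
        # Child: stands in for the old version replacing its own image.
        try:
            os.close(read_fd)
            # "Appropriate care": clear FD_CLOEXEC so the fd survives exec.
            os.set_inheritable(write_fd, True)
            new_image = (
                "import os, sys; "
                "os.write(int(sys.argv[1]), b'hello from the new image')"
            )
            # The fd number is passed on the command line so the new image
            # knows which inherited descriptor to use.
            os.execv(sys.executable,
                     [sys.executable, "-c", new_image, str(write_fd)])
        finally:
            os._exit(127)  # only reached if exec itself failed
    # Parent: only observes what the re-exec'd child wrote.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:  # EOF once the child has exited
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)


if __name__ == "__main__":
    print(handoff_demo())
```

The kernel-side object (here the pipe) is never closed or reopened; only the
code using it changes. The same mechanism applies to a listening socket.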

You can serialize in memory, unmap all the other code, map in new segments and
dynamically link appropriately, then return to the "application" rather than
the "restart" portion without having to do anything to the kernel process
context.

Then, of course, you need to deal with any application level data structure
changes while you resume. :)
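
One common way to handle that last step is to tag the serialised state with a
version and upgrade it on resume. The following is only a sketch under that
assumption; the field names (peer, peers, requests) and the helper names are
hypothetical, and JSON is used purely for brevity.

```python
import json

CURRENT_VERSION = 2  # layout version understood by the "new" code


def serialize_state(state: dict, version: int) -> bytes:
    """Wrap the state in an envelope recording which layout it uses."""
    return json.dumps({"version": version, "state": state}).encode("utf-8")


def upgrade_v1_to_v2(state: dict) -> dict:
    # Hypothetical change: v1 kept one "peer", v2 keeps a list of "peers".
    upgraded = dict(state)
    upgraded["peers"] = [upgraded.pop("peer")]
    return upgraded


# One upgrade step per old version, applied in order on resume.
UPGRADES = {1: upgrade_v1_to_v2}


def restore_state(blob: bytes) -> dict:
    """Decode the envelope and upgrade the state to CURRENT_VERSION."""
    envelope = json.loads(blob.decode("utf-8"))
    version, state = envelope["version"], envelope["state"]
    if version > CURRENT_VERSION:
        raise ValueError("state is newer than this code understands")
    while version < CURRENT_VERSION:
        state = UPGRADES[version](state)
        version += 1
    return state


if __name__ == "__main__":
    old_blob = serialize_state({"peer": "10.0.0.1", "requests": 7}, version=1)
    print(restore_state(old_blob))
```

Chaining one small upgrade function per version keeps each step simple, at the
cost of running every intermediate step when skipping several versions.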

Anyway, nothing you are asking for is even particularly difficult.  The reason
that it is more common in VM hosted environments is that they almost always
inherently build in the dynamic linker, and probably even the compiler, not
because of any inherent difference in their abilities in this area.

Also notable: environments like Common Lisp, which show many of the same
properties, can frequently be patched entirely at runtime without a restart,
because the entire language is self-hosting.
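
To make the "built-in linker and compiler" point concrete, here is a small
Python sketch of replacing code in a running process while its state stays
put. Python stands in for the hosted environments discussed above; the module
name app, the handle function, and load_version are all hypothetical.

```python
import types


def load_version(module: types.ModuleType, source: str) -> None:
    """Compile `source` and rebind its definitions inside the live module."""
    exec(compile(source, module.__name__, "exec"), module.__dict__)


app = types.ModuleType("app")  # stands in for the running application
app.hits = 0                   # live state that must survive the swap

V1 = "def handle(n):\n    return 'v1 saw %d' % n\n"
V2 = "def handle(n):\n    return 'v2 saw %d' % n\n"

load_version(app, V1)
app.hits += 1
first = app.handle(app.hits)   # served by the old code

load_version(app, V2)          # swap the code; app.hits is untouched
app.hits += 1
second = app.handle(app.hits)  # served by the new code

if __name__ == "__main__":
    print(first)
    print(second)
```

The runtime already carries the compiler and the name binding machinery, so
the swap is a few lines; a native program has to bring its own equivalent.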

Regards,
        Daniel

Footnotes: 
[1]  In fact, it probably wants them, one per core, because Erlang doesn't do
     native threads internally, as far as I know.