[clug] Why virtual x86 machines?

Hugh Fisher hugo.fisher at gmail.com
Thu Aug 20 11:09:11 UTC 2020


Inspired by the questions about KVM, I've been doing some reading on
virtual machines and containers and some of the other new abstraction
& protection mechanisms being used today. I like to write things down
to clarify my thinking, and am posting this to the list in the hope
that people with more knowledge will correct me if I'm wrong. And I do
have questions, at the end.

First up, I'm not including the Java Virtual Machine, or the similar
bytecode-like systems used in .NET, Python, etc. Those are designed
for user-level programs, not OS kernels. And I'm not including
emulation/simulation where machine instructions are interpreted by
another program, because then it's turtles all the way down. A 6502
Apple II running ProDOS can be emulated by a program on an M68030
Macintosh running System 7, which is itself being emulated by a program
running on a PowerPC Macintosh running Mac OS X ...

So, a virtual machine, usually associated with a hypervisor and guest
operating system kernels, executes as many machine instructions as
possible directly on the actual CPU hardware. (Using the old definition
that you can kick hardware, but only swear at software. And just skip
over microcode.)

From my old Andy Tanenbaum textbook, the first virtual machine in
widespread use was VM/370 for IBM mainframes, around 1970. I think the
history is important because of a question I'll bring up later.

A 370 series IBM mainframe, ancestor of the backward-compatible z/OS
mainframes still sold today, could easily cost a million dollars. A
370 mainframe would run an entire bank's financial system, or an entire
airline reservation network. Which was awkward if a new release of the
operating system was due and you wanted to test that all your software
would still work. Shut down everything while you reboot into a beta
OS? Buy another million-dollar mainframe just for testing?

VM/370 was what today we would call a hypervisor: it could run multiple
guest operating systems side by side on a single CPU, providing each
operating system with its own "virtual 370". Now the bank could run
VM/370 on its single mainframe, with, say, 90% of machine resources
allocated to the guest production OS and the rest given to whatever the
developers wanted.

This was a major technical achievement. Then, like now, the operating
system distinguished 'user mode' from 'kernel' or 'privileged' or
'system' mode. User mode machine instructions could not modify virtual
memory page tables, issue DMA instructions to IO hardware, and so on.
Only kernel code could do that. So unlike a regular operating system,
the hypervisor had to cope with guest operating system kernels
executing privileged machine instructions. The guest kernels didn't
know that they were running on a virtual 370, so it was up to the
hypervisor to ensure that if, say, one guest OS disabled interrupts,
this wouldn't shut down every other guest.
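
To make that concrete, here is a toy sketch of the trap-and-emulate
idea. This is my own illustration, nothing to do with how VM/370 or
VT-x actually implement it: the hypervisor keeps a virtual copy of the
privileged machine state for each guest, and when a guest kernel
executes a privileged instruction it traps, and the hypervisor applies
it to that guest's copy only.

    /* Toy illustration only: per-guest virtual state, updated when a
     * privileged instruction traps to the hypervisor. */
    #include <stdbool.h>
    #include <stdio.h>

    enum privileged_op { DISABLE_INTERRUPTS, ENABLE_INTERRUPTS };

    struct guest {
        const char *name;
        bool interrupts_enabled;    /* this guest's *virtual* flag */
    };

    /* What a trap handler conceptually does: apply the instruction to
     * the guest's virtual state, never to the real CPU. */
    static void handle_trap(struct guest *g, enum privileged_op op)
    {
        g->interrupts_enabled = (op == ENABLE_INTERRUPTS);
        printf("%s: virtual interrupts %s\n",
               g->name, g->interrupts_enabled ? "on" : "off");
    }

    int main(void)
    {
        struct guest production = { "production OS", true };
        struct guest testing    = { "test OS", true };

        /* The test guest 'disables interrupts' ... */
        handle_trap(&testing, DISABLE_INTERRUPTS);

        /* ... and the production guest is untouched. */
        printf("%s: virtual interrupts still %s\n",
               production.name,
               production.interrupts_enabled ? "on" : "off");
        return 0;
    }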

Once IBM got VM/370 to work, it was a big hit. It was so popular both
inside and outside IBM that some new instructions and microcode
modifications were added to the 370 machine architecture to make IO
and memory paging within the guest operating systems more efficient.

And IBM supplied CMS, a hypervisor-aware operating system kernel
designed to run only under VM/370. A conventional OS protects multiple
users from affecting each other, whether deliberately or accidentally.
CMS was a single user OS, and VM/370 gave every user their own copy on
their own virtual 370. Even if there was a kernel exploit in the CMS
operating system (not the hypervisor), the only person you could
attack would be yourself. CMS was a smaller and simpler operating
system because it didn't duplicate functions that VM/370 was already
doing.

Now fast forward to the 21st century. If you
    cat /proc/cpuinfo
on an x86 Linux system and you see 'vmx' in the output, you have the
Intel virtual machine hardware extensions. The original x86
architecture had Ring 0 for privileged machine instructions as used by
operating system kernels. The virtualisation extensions add Ring -1
for a hypervisor such as VMware, which can run multiple guest Linux or
MS Win kernels side by side. Each of these thinks it is running with
Ring 0 privilege and can update page tables, issue IO instructions to
PCI slots or disk controllers, and so on.
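
For anyone curious, here's roughly what that looks like from userspace
through the Linux KVM API, since KVM is what started this thread. This
is a minimal sketch with all error handling stripped out, the usual
"tiny 16-bit guest writes to a fake serial port" demo, not anything
VMware specific. The point is the loop at the end: the guest code runs
directly on the real CPU until it does something the hypervisor has to
intercept, an IO instruction or a HLT, and then control comes back to
the host program. Which is essentially the same game VM/370 was playing
fifty years ago.

    /* Minimal sketch of a VM via the Linux KVM API (/dev/kvm ioctls).
     * Error checking omitted for brevity.
     * Build: gcc kvm-hello.c && ./a.out   (needs access to /dev/kvm) */
    #include <fcntl.h>
    #include <linux/kvm.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Guest code, 16-bit real mode: write "Hi\n" to port 0x3f8, halt. */
        const uint8_t code[] = {
            0xba, 0xf8, 0x03,   /* mov dx, 0x3f8                 */
            0xb0, 'H', 0xee,    /* mov al, 'H' ; out dx, al      */
            0xb0, 'i', 0xee,    /* mov al, 'i' ; out dx, al      */
            0xb0, '\n', 0xee,   /* mov al, '\n'; out dx, al      */
            0xf4,               /* hlt                           */
        };

        int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
        int vmfd = ioctl(kvm, KVM_CREATE_VM, 0UL);

        /* One page of 'guest physical memory' at guest address 0x1000. */
        void *mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        memcpy(mem, code, sizeof code);
        struct kvm_userspace_memory_region region = {
            .slot = 0, .guest_phys_addr = 0x1000,
            .memory_size = 0x1000, .userspace_addr = (uint64_t)mem,
        };
        ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &region);

        /* One virtual CPU, started at the beginning of our code. */
        int vcpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0UL);
        int runsz = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, NULL);
        struct kvm_run *run = mmap(NULL, runsz, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, vcpufd, 0);
        struct kvm_sregs sregs;
        ioctl(vcpufd, KVM_GET_SREGS, &sregs);
        sregs.cs.base = 0; sregs.cs.selector = 0;
        ioctl(vcpufd, KVM_SET_SREGS, &sregs);
        struct kvm_regs regs = { .rip = 0x1000, .rflags = 0x2 };
        ioctl(vcpufd, KVM_SET_REGS, &regs);

        /* The trap-and-emulate loop: the guest runs on the real CPU
         * until it does something the hardware can't let it do for
         * real, then control comes back here. */
        for (;;) {
            ioctl(vcpufd, KVM_RUN, NULL);
            switch (run->exit_reason) {
            case KVM_EXIT_IO:       /* the guest's 'out' instruction */
                putchar(*((char *)run + run->io.data_offset));
                break;
            case KVM_EXIT_HLT:      /* guest halted, we're done */
                return 0;
            default:
                fprintf(stderr, "unhandled exit %d\n", run->exit_reason);
                return 1;
            }
        }
    }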

So Intel virtual x86 is just like VM/370. Except ... x86 computers
don't cost a million dollars.

So my most important question, why bother? Just buy another CPU.

I did a quick price comparison on www.mwave.com.au. The cheapest Intel
Xeon is about $4,000 and it's possible to spend $14,000 if you want
to. For those amounts of money you could buy anything from a shoebox to
a shipping container full of Raspberry Pis, complete 64-bit GHz systems
with RAM and ports. Or if you have to stay within the x86 family, Intel
Celerons are at least five times cheaper than Xeons. Looking instead
at power budget, the cheapest Xeon CPU consumes as many watts as five
entire Raspberry Pis.

Looking at these prices I understand why Intel want us to virtualise
x86 CPUs and run multiple guest operating systems. I don't see why
anyone else would want to.

But since datacentres and cloud systems do use hypervisors I must be
missing something. Anyone want to explain?


Second question: are there custom Linux kernels designed to run on
hypervisors? Not a Container OS, which I think is something else, but,
like CMS, designed to be single user or otherwise not to duplicate what
the hypervisor is already doing?


And lastly I'm assuming that there's nothing in virtual x86 design and
implementation that VM/370 didn't already do. Am I wrong? What new and
interesting uses for hypervisors have been thought of?


-- 

        cheers,
        Hugh Fisher


