[clug] Why virtual x86 machines?

steve jenkin sjenkin at canb.auug.org.au
Tue Aug 25 06:45:50 UTC 2020



> On 20 Aug 2020, at 21:09, Hugh Fisher via linux <linux at lists.samba.org> wrote:
> 
> 
> Q.1
> (Raspberry Pi vs Xeon) Looking at these prices I understand why Intel want us to virtualise x86 CPUs and run multiple guest operating systems.
> I don't see why anyone else would want to.
> 
> But since datacentres and cloud systems do use hypervisors I must be missing something.
> Anyone want to explain?
> 
> Q.2
> Second question,
> are there custom Linux kernels designed to run on hypervisors?
> Not a Container OS, which I think is something else, but like CMS designed to be single user or otherwise not duplicate what the hypervisor is already doing?
> 
> Q. 3
> And lastly I'm assuming that there's nothing in virtual x86 design and implementation that VM/370 didn't already do.
> Am I wrong?
> What new and interesting uses for hypervisors have been thought of?
> 
> -- 
> 
>        cheers,
>        Hugh Fisher

Hugh,

On Q3:

You’re right that VM/370, from the 1970s, implemented most of the current functionality.
Large-scale networking across multiple datacentres (e.g. AWS, Azure, Google) is only a 21st-century development.

IBM ‘mainframes’ focus on a different market to x86 servers: large single compute facilities, including storage & DBs. IBM have always emphasised “RAS” - Reliability, Availability, Serviceability - and mostly provided it via hardware. Systems don’t need to be taken off-line very often, even for most hardware maintenance - one reason they’re popular for 24/7 systems (Police, Airline bookings, Credit Cards, Banking, …).

Google were the first Internet-scale operation to address “RAS” and notionally 100% uptime using cheap, imperfect hardware, with software + network providing the High Availability & “RAS” functionality.

In 1990, IBM did clusters with NUMA (Non Uniform Memory Architecture) and called them a “Sysplex”
<https://en.wikipedia.org/wiki/IBM_Parallel_Sysplex>

I don’t follow IBM & z-Series closely, but I know it’s possible to network z-Series machines & connect an x86 cabinet (for Java JVMs) to a Sysplex. The version I saw had to be managed by the z-Series.

VM/370 provided Virtual Machines on a single machine, later a single ‘Sysplex’, later extended with remote facilities (sharing a workload with another datacentre within a ‘Metro’ area (~200km)) - targeted at banks & financial institutions.

VMware & friends commercially, and AWS, Azure & Google with their own proprietary services, allow large fleets of x86 servers, multiple 10-50 MW datacentres and Tbps interconnects.
They have a very different set of Operations, Administration and Management issues to address than a z-Series Sysplex. When you’ve got 100,000 servers, everything has to be automated; nothing can be manual or “Joe (alone) knows how to do that”.

VMware & friends can take snapshots of running instances, live migrate them to another physical host and much more.
AWS etc can manage running instances transparently for clients - physical hosts fail or overload and customer workloads will get moved all the time.
I’ve no experience of the FOSS VM cluster management software.
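
For reference, the basic live-migration primitive in the FOSS stack is exposed through libvirt’s API - roughly what “virsh migrate --live” drives. A minimal Python sketch, assuming two QEMU/KVM hosts reachable over SSH and a running guest called “guest-01” (all the names here are made up):

	# Hedged sketch: live-migrate a running guest between two KVM hosts via libvirt.
	import libvirt

	src = libvirt.open("qemu:///system")                 # local hypervisor
	dst = libvirt.open("qemu+ssh://other-host/system")   # destination host (hypothetical)

	dom = src.lookupByName("guest-01")                   # hypothetical running guest
	dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)   # RAM copied while the guest keeps running

	dst.close()
	src.close()

The commercial suites add the scheduling, load-balancing and failure handling on top of primitives like this.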

None of that functionality was needed by VM/370, even with Sysplexes.
The “Hub & Spoke” compute model never had to scale to even 10k servers.

=========

I wasn’t aware that IBM’s CMS was “single user”, nor that it’s dependent on the hypervisor - that it isn’t a ‘full’ O/S in itself.
	<https://en.wikipedia.org/wiki/VM_(operating_system)>

In the 1970s, I used VM/CMS for work. We only had tens of users, not the thousands others had.
But we were able to share files - it was our source-code editing system and, I think, our build system; I vaguely remember being able to submit batch jobs to the target system, DOS/VS.
I know I could edit, browse and print files across the whole system, but can’t remember the Access Control scheme.

The security boundary of “single user” in CMS is almost unique:
	"Plan 9” circa 1990 created a “local universe” per user, per login. It’s the closest I can think of.
Because of shared filesystems in CMS, a rogue CMS process or malware inside a full O/S image executing alongside might not be able to access another guest’s process / memory space, but there will still be a vector for viruses to move around, via the filesystem.

However…
After the x86 “Spectre & Meltdown” bugs - which only leaked information AFAIK - just how ’secure’ is that security barrier?
	<https://spectrum.ieee.org/computing/hardware/how-the-spectre-and-meltdown-hacks-really-worked>

Joanna Rutkowska’s work & “Qubes” show that making VMs secure (preventing both escapes from their enclosures and information leaks) is very subtle and very hard. Qubes is based on Linux on a Xen hypervisor.

	<https://en.wikipedia.org/wiki/Joanna_Rutkowska>
	<https://en.wikipedia.org/wiki/Qubes_OS>

The notion of a standalone hypervisor is included in Wikipedia’s Comparison page as "No host OS”.
[not full list]
	<https://en.wikipedia.org/wiki/Comparison_of_platform_virtualization_software>
	- Xen (managed from ‘dom0’)
	- Sun/Oracle ‘VM Server’ (not VirtualBox)

VMware ESX is tagged as ’No host OS’ - though I always thought it was a heavily modified early Linux kernel, with a vestigial set of commands. I know the one time I ran it, I could SSH into the hypervisor environment.

Xen and “dom0” are different again - the hypervisor looks as if it’s incorporated into a Linux kernel, but is really separate: Xen boots first and the ‘dom0’ Linux kernel runs on top of it as the privileged control domain.

KVM is a little different again - it’s a set of kernel modules that provide the capability to run hardware-accelerated guests, but a userspace program is needed to actually run guest O/Ss.
QEMU/KVM is a more accurate description.
	<https://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine>

	KVM provides device abstraction but no processor emulation.
	QEMU versions 0.10.1 and later is one such userspace host.
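
A minimal sketch of that split, assuming a Linux host with the KVM modules loaded: the kernel side is just a device node, /dev/kvm, driven with ioctls, and everything else (device models, firmware, disk images) lives in the userspace host. This Python fragment issues KVM_GET_API_VERSION, the sanity check a userspace host does first:

	# Hedged sketch: talk to the KVM kernel module directly via /dev/kvm.
	# KVM_GET_API_VERSION is _IO(0xAE, 0x00) from <linux/kvm.h>; current kernels return 12.
	import fcntl, os

	kvm = os.open("/dev/kvm", os.O_RDWR)
	print("KVM API version:", fcntl.ioctl(kvm, 0xAE00))
	os.close(kvm)

Creating an actual guest continues from there (KVM_CREATE_VM, KVM_CREATE_VCPU, mapping guest memory) - the part QEMU does for you.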

Amazon AWS has relied on the security barriers of hypervisors to keep customer instances isolated.
In the last two years, AWS added their own hardware, “Nitro”, to manage servers and allow them to sell ‘bare metal’, not just O/S instances.
	<http://www.brendangregg.com/blog/2017-11-29/aws-ec2-virtualization-2017.html>

Presumably “Nitro” shares design elements with “Trusted Execution Environments”.
	<https://en.wikipedia.org/wiki/Trusted_execution_environment>

=========

Answer to Q2:

	I was unable to find another execution environment (an O/S equivalent) that relied on services from a hypervisor.
	The L4 / OKL4 / seL4 people might disagree,
		but then what about Rusty’s ‘lguest’? The guest O/S uses the services of the host.

=========

The Wiki comparison page mentions OKL4 (Open Kernel Labs, L4), but not Rusty Russell’s ‘lguest’, which was dropped from the kernel in v4.14. If you’ve never seen it, it was very elegant & small - not general purpose, only creating VMs of the currently running Linux kernel. Ideal for kernel developers :)

	<http://lguest.ozlabs.org>
	<https://en.wikipedia.org/wiki/Lguest>

L4Linux - a kernel that can run virtualised on top of the L4 microkernel.
Does that make L4 a hypervisor?
[The UNSW L4 people strongly hold that view.]
	<https://en.wikipedia.org/wiki/L4Linux>

=========

‘libvirt’ was mentioned in the thread as well.
This was a major addition to Linux virtualisation - not a kernel change, but a single userspace interface for virtualisation functions and management tools.

	<https://libvirt.org>
	<https://en.wikipedia.org/wiki/Libvirt>
	libvirt is an open-source API, daemon and management tool for managing platform virtualization.
	It can be used to manage KVM, Xen, VMware ESXi, QEMU and other virtualization technologies. 
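
A minimal sketch of that single interface, using the libvirt Python bindings (the “libvirt-python” package); the URI “qemu:///system” assumes a local QEMU/KVM host, and the same calls work against Xen, ESXi and the other drivers by changing the URI:

	# Hedged sketch: list domains and their state through libvirt's uniform API.
	import libvirt

	conn = libvirt.open("qemu:///system")   # swap the URI for xen:///, esx://host/, ...
	for dom in conn.listAllDomains():
	    state, _reason = dom.state()
	    print(dom.name(), "running" if state == libvirt.VIR_DOMAIN_RUNNING else "not running")
	conn.close()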

=========

On Q1:

Cheap CPU cycles (R-Pi) vs “Big Iron” (now Xeon, or z-Series for some people)

	 - this question goes back to 1960’s, DEC et al and “mini-computers” vs “mainframes”.

In the 1950’s & 60’s, there were only “computers”.

With the 360-series, IBM grew, by sales volume, to be larger than all the other computer vendors combined.
In the 1960s it was IBM and ’the seven dwarfs’, morphing in the 1970s into IBM and “the BUNCH” - Burroughs, Univac, NCR, CDC, Honeywell.
	<https://en.wikipedia.org/wiki/BUNCH>

DEC with the PDP-8 in the 1960s showed there was a market for cheaper, smaller computers, especially for embedded systems and control tasks. These got named ‘mini-computers’.
	<https://en.wikipedia.org/wiki/Minicomputer>

Intel pioneered the microprocessor, which led to ‘commercial grade’ Personal Computers (PCs) within a decade.
[IBM PC in 1981. Not the first micro, but with IBM’s imprimatur, PCs became ‘proper’ computers for business, not the preserve of hobbyists.]

The background driver is Bell’s Law of Computer Classes
	- the application of Moore’s Law to whole systems.
	- every decade or so, the smallest viable compute platform reduces in cost by a factor of 10-1000.

As parts prices go down and performance increases, manufacturers need to choose between building cheaper machines or faster machines - they are forced to ‘go high’ or ‘go low’. [Full paper + extracts at end; a toy arithmetic sketch follows the links below]

	<https://en.wikipedia.org/wiki/Bell%27s_law_of_computer_classes>

Image from 2007 paper
	<http://2.bp.blogspot.com/_O-yBdm0tpwU/S_WWYHs-TeI/AAAAAAAAAUM/uMxmAKRf8nE/s1600/Bell.JPG>
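
A toy arithmetic sketch of that compounding (the two-year halving period is my assumption, not Bell’s): if cost-per-function halves roughly every two years, a decade gives a factor of 2^5 = 32, comfortably inside the 10-1000x band that lets a new, cheaper class form.

	# Toy illustration only - the 2-year halving period is an assumption.
	cost = 1.0            # relative cost of a "minimal useful computer" today
	halving_years = 2.0
	for years in (10, 20, 30):
	    print(years, "years:", cost / 2 ** (years / halving_years))
	# 10 years -> ~1/32, 20 years -> ~1/1000, 30 years -> ~1/33000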

The Raspberry Pi - an ARM processor in a SoC (System on a Chip) - is a Single Board Computer (SBC).
It compares directly to 2005-era x86 PCs, but uses 1W - 5W.

Is it A Good Idea to lash a few thousand ARM processors together and run a datacentre on them?
Maybe.

ARM processors aren’t “super computers”, though they achieve higher MIPS/Watt than x86, especially compared to high-end Xeons.

Over the last decade there have been quite a few start-ups building exactly that architecture.
The R-Pi was the first ARM computer to sell “at scale”
	 - it was far from the first SBC or microchip embedded system, and not the largest, fastest or cheapest.

It’s ironic that the R-Pi comes from the UK, entirely designed, manufactured and supported there.
The ARM is a licensed CPU design, sold to chip designers for inclusion in their silicon.
Hard disks often include an ARM processor in their silicon.
If you want the definitive “over-spec’d” CPU (99% idle), it’s these.

ARM originally stood for “Acorn RISC Machine” - a UK-designed CPU from Acorn Computers, maker of the BBC Micro, the home and educational microcomputer.
This is the irony of the R-Pi: they’ve reinvented the BBC Micro at a cheaper price point & it went global.
	<https://en.wikipedia.org/wiki/Acorn_Computers#BBC_Micro_and_the_Electron>

ARM SBCs, exemplified by the R-Pi, abound, but many other CPU types can be found [MIPS, ATmega, …].

Embedded Linux Distributions
	<https://lwn.net/Distributions/#embed>
	<https://elinux.org/Embedded_Linux_Distributions>

Hackerboards - no idea if the site is commercial or not
	<https://hackerboards.com/index/>

I’d rephrase your Q1 to:

	- Where do Raspberry Pi’s excel?
	- Where do large x86 systems running VM’s excel?
	- In the overlap region, how to decide which to use?

Engineering Questions start with:

	- how much money do you have to spend?
	- what’s going to make you happy? or, What are you trying to achieve or optimise?
	- how much time & money do you have to keep this running, once built?

For a hobbyist around their own home, DIY embedded ‘appliances' using a favourite SBC + Distro is a great way to ’solve a problem’, including learning new technologies. 

If this is for a work environment where a lot of people will have to depend, for years, on any hardware / software selection, external factors such as maintenance and paid support will likely dominate, due to the wages cost of unplanned outages.

To scale up, the units have to be standardised, so they are replaceable by identical units.

If you end up with more than a few SBCs, they’ll need to be networked to be managed, and very quickly they’ll need a single monitoring & management console - which gets complex and tricky.
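
Even the first, trivial step of such a console - “is each board up?” - is easy to sketch; everything after it (config, updates, metrics, alerting) is where the complexity lives. A minimal Python poller, with hostnames that are purely illustrative:

	# Hedged sketch: poll a (hypothetical) list of SBCs for TCP reachability on the SSH port.
	import socket

	SBC_HOSTS = ["sbc-01.local", "sbc-02.local", "sbc-03.local"]   # made-up names

	def reachable(host, port=22, timeout=2.0):
	    try:
	        with socket.create_connection((host, port), timeout=timeout):
	            return True
	    except OSError:
	        return False

	for host in SBC_HOSTS:
	    print(host, "up" if reachable(host) else "DOWN")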

For serious general purpose compute power, x86 CPU’s are still the benchmark, though apparently AMD is overtaking Intel in bang-for-buck at the moment.

If you’re a large enterprise running a large fleet of x86 physical hosts supporting a plethora of platforms, DBs and licensed software, x86 will work best using VMs and commercial management solutions.
Upgrades, extensions and maintenance can be “safely” performed live during the day.

Using more advanced management tools, multi-site operations are possible, given a sufficiently capable network.

As Brenton noted, he’s got a beefy laptop that he uses to develop across multiple environments.
It’s cheaper, easier and more reliable for him to have a single ‘commodity’ laptop to do that, rather than lashing together a series of SBCs in an ad-hoc fashion.
His interest is writing & testing software, not lashing together hardware in new ways.

The SAMBA developers used standard VM images to run their testing - on modest hardware it was possible to store (not run) a standard image of every version of Microsoft’s SMB-supporting products. That’s not possible with a fleet of SBCs.
[I presume they had a licensing deal, possibly a simple MSDN subscription]
	<https://en.wikipedia.org/wiki/Microsoft_Developer_Network#Software_subscriptions>

Answering Q1.

	What to buy, R-Pi or x86 + VMs, depends on what tasks you need performed,
	what constraints they’ll be under (power, heat, load, …)
	and the Quality of Service to be delivered.

	Notably, “the best” solution now will probably not be “the best” in 3-5 years when the system needs “refreshing”.
	Technology has changed substantially every decade until now and, while the rate is slower, is still changing.

Which isn’t a definitive answer, but I hope provides a framework for decision making.

all  my best
steve jenkin

====================

There are people lashing together hundreds or thousands of ARM processors to create “high performance, low power” systems. I’m sure Google, AWS and Facebook have all taken a look at this.

Parallella & their “Epiphany V”
	2016
<https://www.parallella.org/2016/10/05/epiphany-v-a-1024-core-64-bit-risc-processor/>

Adapteva
<https://en.wikipedia.org/wiki/Adapteva>

====================

2007 - Gordon Bell [PDF]
<https://gordonbell.azurewebsites.net/Bell%27s_Law_MSR-2007-TR-146a.pdf>

	Bell’s Law accounts for the formation, evolution, and death of computer classes.
	for classes to form and evolve, all technologies need to evolve in scale, size, and performance,
		 though at comparable, but their own rates!

	The universal nature of stored program computers is such that
	a computer may be programmed to replicate function from another class. 
	Hence, over time, one class may subsume or kill off another class.

	Market demand for a class and among all classes is fairly elastic.
	In 2010, the number of units sold in a class varies from 10s, for computers costing around $100 million
	 to billions for small form factor devices
	 e.g. cell phones selling for o($100). 
	Costs decline by increasing volume through manufacturing learning curves

	Finally, computing resources including processing, memory, and network are fungible 
	and can be traded off at various levels of a computing hierarchy
	 e.g. data can be held personally or provided globally and held on the web.

	Chart with 4 trajectories:
		a.Supercomputer:
			 “the largest computers of the day”
		b. Constant price, increasing performance.
		c. Sub-class formation (cheaper, constant performance)
		d. New, “minimal priced” computers: 
			smallest, useful computer, new apps,

	1. Computers are born i.e. classes come into existence through intense, competitive, 
		entrepreneurial action over a period of 2-3 years to occupy a price range,

	2. A computer class, determined by a unique price range evolves in functionality 
		and gradually expanding price range of 10 maintains a stable market. 
		This is followed by a similar lower priced sub-class that expands the range another factor of 5 to 10.

	3. Semiconductor density and packaging inherently enable performance increase
		 to support a trajectory of increasing price and function

	4. Approximately every decade a new computer class forms as a new “minimal” computer
		 either through using fewer components or use of a small fractional part of the state-of- the-art chips.

	5. Computer classes die or are overtaken by lower priced, more rapidly evolving general purpose computers
		 as the less expensive alternatives operating alone, 
		combined into multiple shared memory micro-processors, and multiple computer clusters.

====================



--
Steve Jenkin, IT Systems and Design 
0412 786 915 (+61 412 786 915)
PO Box 38, Kippax ACT 2615, AUSTRALIA

mailto:sjenkin at canb.auug.org.au http://members.tip.net.au/~sjenkin



