Hypothetic Computing

From Granizada

THIS ARTICLE HAS NOT YET BEEN FULLY CONVERTED TO WIKI FORMAT (screenshots still not here.)

Original May 2005 v0.65 from the linux.conf.au 2004 winning paper

Updated Jul 2006 v0.7

Whole-system hardware virtualisation techniques arise in many fields of computer science. Such techniques are now commercially important in their own right and the subject of many open source projects, but there is no branch of computing that deals with the subject. There is no agreed way of comparing or even consistently describing the many different approaches to hardware virtualisation. The complexity and significance of whole-system virtualisation justify its recognition as a subject in computer science, and this paper is an exploratory step towards that end. We define the notion of "hypothetic computing" and derive a rule-of-thumb metric that summarises the capabilities of a hypothetic system for further organised study.

Introduction

Hardware virtualisation allows notional hardware to be constructed which is not present in real life, and may not be embodied in silicon anywhere. But a virtualisation system has limits to how much non-real (or emulated, or hypothetical) hardware can be created. It would be handy to have a quick way of seeing roughly how hypothetic different systems are since so many systems are a mixture of real and emulated components.

Hypothetic Computing lumps together technologies which are currently buried in many different budgetary and technical classifications. The techniques involved vary wildly in their technical specifications, and there may be no reason to consider them related except as part of a spectrum of virtualisation.

One interesting result is that products and techniques that have never been thought of as equivalent may in fact offer similar possibilities to users. Another use is in evaluating virtualisation products, where in some cases claimed advantages only exist because the wrong things are being compared.

Assumptions

  • we only deal with complete hardware implementations capable of booting an OS. There are many things with the name virtual machine which do not meet this criterion.
  • Virtual Machines are not necessarily complete virtual hardware. The Parrot Interpreter or the Microsoft Virtual Execution System are virtual machines which cannot be expected to boot an operating system. Even the Java Virtual Machine (a relatively complete example) would require ugly Linux kernel hacking to boot Linux (for a start, the JVM lacks unsigned loads and stores.) Even PicoJava, the Java derivative intended for instantiation in hardware, would be a difficult Linux porting target. This discussion often runs into issues of hardware support for object oriented languages for which many virtual machines are being built — if this were to happen then many of these problems become less serious. But for now 'ability to boot an OS' means 'is a target for low-level stack-oriented languages, especially C'.
  • we are only dealing with virtualisation technology whose maximum speed is at most two orders of magnitude slower than any corresponding real hardware (i.e. under 100x slower than real life). Targets that run faster than the corresponding real hardware (negative orders of magnitude slower) already exist and will become more common over time.
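
The two entry conditions above can be read as a simple admission test. The following Python sketch is illustrative only: the names `orders_of_magnitude_slower` and `is_admissible`, and the convention of expressing speed as a slowdown factor (target run time divided by real-hardware run time), are assumptions made for this example and do not come from the original paper.

```python
import math

# Assumption taken from the text: at most two orders of magnitude slower.
MAX_ORDERS_SLOWER = 2


def orders_of_magnitude_slower(slowdown: float) -> float:
    """Return log10 of the slowdown factor.

    A negative result means the target is faster than the real hardware.
    """
    if slowdown <= 0:
        raise ValueError("slowdown factor must be positive")
    return math.log10(slowdown)


def is_admissible(boots_os: bool, slowdown: float) -> bool:
    """Apply the two tests every hypothetic system is assumed to meet."""
    return boots_os and orders_of_magnitude_slower(slowdown) <= MAX_ORDERS_SLOWER
```

With these definitions a system that boots an OS at 70x slowdown is admitted, one at 1000x is not, and a slowdown factor below 1 gives the "negative orders of magnitude" mentioned above.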

Nomenclature Confusion

The term hypothetic is used because the proposed field of study is new and there are no existing words that can be used without confusion.

The words 'virtualisation', 'simulation', 'imaginary', 'real' and a host of others are already widely used in computer science and in the marketplace, often in multiple conflicting ways. A 'Real Computer' is a type of postulated analogue computer that probably will never exist (see Real computation and weep for the art of plain communication!)

Commercial and Open Source Worlds

Commercially, phrases have been constructed such as "full system simulation" and "supersystems virtualisation", but these are clumsy and have to be explained from scratch every time. In an industry where every player makes up its own terms everyone can claim to be the leader, but this approach won't work as the market matures.

In open source, the many different virtualisation projects have largely ignored the fact that they have some conceptual and often functional commonalities. When this is addressed we may see the traditional cross-pollination effects kick in with common methods for communicating state between different kinds of virtualisation tools.

Hypothetic Properties

A virtualisation framework with "hypothetic" properties is one which has many possible ways to create the virtualised hardware. The more hypothetic properties a system has, the more kinds of different hardware it can support.

Examples:

  • a PC BIOS that can switch multiple CPUs on and off has a minor hypothetic property.
  • the IBM VM operating system has many more hypothetic properties, but is still limited because all target systems are variations on the zSeries architecture and VM is not portable across other hardware.
  • the QEMU instruction set simulator-based virtualisation platform has even more hypothetic properties because it can simulate multiple target architectures and is itself portable.

Basic Definitions and Non-Definitions

V12N
The letter V followed by 12 letters and then an N, or virtualis[z]ation.
Hypothetic system
A system which is capable of creating virtualised hardware. The hypothetic system can be hardware, firmware or software. It is the means by which hardware is instantiated for an OS to recognise.
Host
This is a non-definition. Host should never be used in discussion of hypothetic systems. There are several well-understood meanings for host and using it in the context of V12N often leads to confusion.
Mothership
The real-world hardware on which the hypothetic system runs. Use this instead of host! Occasionally the mothership may itself have been created by a hypothetic system, something which is likely to become more common in years to come.
Target
A virtual machine created by a hypothetic system.


Hypothetic Index

The hypothetic index is a rough rule of thumb that indicates how many V12N degrees of freedom there are in a hypothetic system. It is not a measure of functionality or capability (beyond the two tests all hypothetic systems are assumed to meet) and it does not indicate relative quality.

The hypothetic index is useful for:

  • Classifying V12N systems to see which can be reasonably compared. Useless comparisons are very easy to produce in V12N because often two systems which appear similar have such different design goals that neither will ever be as good as the other for any particular task.
  • Matching tasks to the range of virtual systems suitable for that task. Applying a virtual system with the wrong properties is common and usually gives poor results.

Table: Hypothetic Index

Property Weighting
Portable across mothership hardware architectures 1
Multiple target architectures same mothership 1
Cycle accurate V12N 1
Fast - same order of mag. as real hardware 1
Multiple virtualised components in targets 1
Checkpoint/restore across supported mship arches 1
Distribute system across multiple motherships 1
Do not require OS to be modified to run on target 1
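
Because every property carries a weighting of 1, the index is simply a count of the properties a system has. The sketch below shows that arithmetic in Python. The identifiers (`WEIGHTS`, `hypothetic_index` and the short property keys) are hypothetical shorthand invented for this example, and the sample system is made up rather than a rating of any real product.

```python
from typing import Iterable

# One entry per row of the table above; keys are hypothetical shorthand.
WEIGHTS = {
    "portable_across_motherships": 1,
    "multiple_target_architectures": 1,
    "cycle_accurate": 1,
    "fast": 1,
    "multiple_virtualised_components": 1,
    "mobile_checkpoint_restore": 1,
    "distributed_across_motherships": 1,
    "unmodified_target_os": 1,
}


def hypothetic_index(properties: Iterable[str]) -> int:
    """Sum the weights of the distinct properties a system has."""
    present = set(properties)
    unknown = present - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown properties: {sorted(unknown)}")
    return sum(WEIGHTS[name] for name in present)


# A made-up system used only to show the arithmetic.
example_system = ["fast", "unmodified_target_os", "multiple_virtualised_components"]
print(hypothetic_index(example_system))  # 3
```

Keeping the weights in a table rather than hard-coding a count makes it easy to experiment with unequal weightings, although the article argues for keeping them equal.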

Detailed Explanation of Index Factors

Portable across mothership hardware architectures
Does the hypothetic system run on more than one distinctly different hardware architecture?

It is often assumed that such portability implies a strictly user-space solution; however, there are some systems where this is not the case.

Multiple target architectures same mothership
Does the same V12N framework allow multiple simultaneous targets to be running which are different hardware architectures, eg IA-64 and MIPS?

Some systems trade this feature for performance and size, and only allow targets with an architecture related to the mothership architecture, even though they do support multiple mothership architectures. Others lack this feature for the less interesting reason that they are tied to one mothership architecture.

Cycle accurate V12N
Allows the user to run software which interacts intimately with the hardware and enables monitoring of low-level items such as processor cache hits and misses.

This is often mutually exclusive with high execution speed because of the computational overhead involved in keeping all clocked components in sync with their particular clock. But this need not be the case if target architectures are inherently slow, or very efficient new software techniques are used, or there is firmware assistance for the task.

Fast - same order of mag. as real hardware
This property is judged in the context of the assumption that all hypothetic systems are 'usefully fast', i.e. not more than two orders of magnitude slower than corresponding real hardware.

Some of the fastest V12N systems get within a few percentage points of the native speed of the real hardware the target represents. Others are 60-80 times slower, while still others are many times faster than the real hardware ever will be because it is no longer produced.
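
One way to make this property concrete is to express speed as a slowdown factor and apply a threshold. The sketch below assumes that "same order of magnitude" means less than 10x slower than the real hardware. Both that reading and the function name `scores_fast` are assumptions for illustration, not definitions from the paper.

```python
def scores_fast(slowdown: float) -> int:
    """Return 1 if the target is within one order of magnitude of real hardware.

    slowdown is target run time divided by real-hardware run time, so values
    below 1 describe targets that are faster than the real hardware.
    """
    if slowdown <= 0:
        raise ValueError("slowdown factor must be positive")
    # Assumed threshold: strictly less than 10x slower counts as 'fast'.
    return 1 if slowdown < 10 else 0
```

Under this reading, a target within a few percent of native speed (a slowdown of about 1.03) scores 1, one that is 60-80 times slower scores 0, and one that outruns the original hardware scores 1.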

This especially affects hypothetic systems as used with legacy software. The lifespan of software systems is increasing faster than that of the hardware they run on (which is staying relatively constant at around three years, often for non-technical reasons). This means that many more hypothetic systems are likely to be used over time to keep legacy systems running in what they think is the hardware environment they were designed for. For this application of hypothetic systems the 'fast' property will increasingly hold. It already holds for targets implementing low-power embedded CPUs.

Multiple virtualised components in targets
Ability to select from a wide range of devices, memory types and capacities, perhaps busses, CPU families etc. The purpose of this item is to highlight those systems that offer only minimally customised target environments, for example by inheriting most of the hardware characteristics from the mothership.

Checkpoint/restore across supported mship arches
This covers two related factors in one.
  1. the ability to snapshot a running system and then restore it with all state intact
  2. ability to do this across supported mothership architectures.

For example VMware only supports a single hardware architecture but it does have full checkpointing, so it scores 1. However Simics supports many mothership architectures, so in order to get 1 Simics must meet the more demanding task of taking a checkpoint image from a mothership running on one architecture (such as Sparc/Solaris) and restoring it on another (perhaps Linux on i386.)
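
The scoring rule in the example above can be sketched as a check over every pair of supported mothership architectures. This is a minimal illustration under stated assumptions: `scores_mobile_checkpoint` and the `can_restore` callback are hypothetical names, and the architecture strings in the usage lines are placeholders rather than a description of any real product's support matrix.

```python
from typing import Callable, Iterable


def scores_mobile_checkpoint(
    mothership_arches: Iterable[str],
    can_restore: Callable[[str, str], bool],
) -> int:
    """Return 1 if a checkpoint taken on any supported mothership
    architecture can be restored on every supported architecture.

    can_restore(src, dst) is a caller-supplied predicate (hypothetical).
    """
    arches = list(mothership_arches)
    if not arches:
        return 0
    ok = all(can_restore(src, dst) for src in arches for dst in arches)
    return 1 if ok else 0


# Single-architecture system with full checkpointing: only one pair to check.
print(scores_mobile_checkpoint(["i386"], lambda src, dst: True))  # 1

# Multi-architecture system that can only restore where the image was taken.
print(scores_mobile_checkpoint(["sparc", "i386"], lambda src, dst: src == dst))  # 0
```

The single-architecture case passes trivially, which is why a system supporting many mothership architectures faces the more demanding test.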

Distribute system across multiple motherships
A few hypothetic systems have the ability to aggregate more than one physical mothership when instantiating a target.

Currently the most advanced example is probably the Charon range of products, unless the assorted distributed single-system image Linux clusters can be classified as hypothetic (this is still a matter of debate!)

Do not require OS to be modified to run on target
There are some significant speed and control advantages if a target OS can be slightly tuned for a particular hypothetic system, usually by running special device drivers.

The best-known commercial example is VMware, but MacOnLinux and Xen also take this approach. The corner case is often an operating system installer, and if a target system can successfully run an installer the hypothetic system has done a very good job of earning its 1 for this attribute.

Comments on the Index

Equal weighting for these diverse factors seems to yield good results. This could be changed, but the danger is that a subjective bias might be introduced — putting a premium on how hard factor X is to implement, or its value to a certain group of users, or how rare it currently is in hypothetic systems.

Each of the 8 properties given is a major barrier in implementing a hypothetic system. Some pairs of these properties are regarded as mutually exclusive; however, the art of V12N is advancing very rapidly.

All hypothetic systems can implement some properties as features, but no current system can implement all properties at marginal cost. This is because of the near mutual exclusions built into the properties selected. Which properties are easy to implement depends on the V12N technology. The least obvious case is perhaps mobile checkpointing (checkpoint/restore across supported mship arches), which can usually be tacked on to a hypothetic system as a feature but is a challenge for hypothetic systems implemented in hardware or firmware. For each property there is at least one class of hypothetic system for which that property is very difficult to achieve or even currently impossible.

There are some approximate conclusions (in the year 2004) that can be drawn from the hypothetic index.

  • a higher number indicates more generality and often more maturity
  • anyone claiming an index of 8 probably doesn't understand V12N
  • an index of 4 and above probably indicates a system built with general V12N in mind, as opposed to one where the V12N aspects were implemented to help achieve some specific goal
  • most systems can increase their index by at least one by implementing additional features
  • an index of 6 or 7 indicates a hypothetic system that is extremely flexible but which has a major obstacle to overcome. Overcoming this obstacle is made considerably harder by its very success in the other points. This is somewhat asymptotic: no matter which factor is missing, it will be very difficult to implement without compromising one of the other factors.
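
The rough conclusions above can be collected into a small lookup. The Python sketch below is only a paraphrase of the list: the function name `interpret_index` and the wording of the returned strings are assumptions for this example, and the bands are rules of thumb rather than firm classifications.

```python
def interpret_index(index: int) -> str:
    """Map a hypothetic index (0-8) to the rough reading given in the list above."""
    if not 0 <= index <= 8:
        raise ValueError("the hypothetic index ranges from 0 to 8")
    if index == 8:
        return "claimed maximum: the claim probably reflects a misunderstanding of V12N"
    if index >= 6:
        return "extremely flexible, with a major obstacle still to overcome"
    if index >= 4:
        return "probably built with general V12N in mind"
    return "V12N probably implemented to help achieve some specific goal"
```

For example, `interpret_index(5)` falls in the "general V12N" band, while `interpret_index(2)` suggests V12N in service of a specific goal.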

Table: Sample Results for Hypothetic Index

Hypothetic system Index
CoWare 2
Hercules 4
Virtio 4
VMware 4
Simics 6
DoubleWide 4
User Mode Linux 4
QEMU 4
Charon 6
Xen 5

(I need to go through each of the products as worked examples, justifying how I reach these conclusions. There are some grey areas and I may have made some hasty decisions…. also there are another 20 or so products to cover)

Virtualization <-- put this here for confused search engines or searchers who spell it with a "z" :-)

Creative Commons License
This content is licensed under the Creative Commons
Attribution ShareAlike License v. 2.5:
http://creativecommons.org/licenses/by-sa/2.5/
GFDL: Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License". (shearer.org uses but does not currently recommend the GFDL.)