Virtualisation is a powerful but complex technology that is valuable for consolidating servers onto new hardware, supporting legacy systems, and numerous other functions. This guidance is for organisations who are considering deploying virtualisation, and is particularly relevant for public sector organisations operating at OFFICIAL within such deployments.
More specifically, this guidance helps risk owners and system managers to:
- manage the risks associated with virtualisation
- make informed decisions about the design, configuration, management and use of virtualisation
Note: This guidance concentrates on virtualisation on the x86 platform (both 32-bit and 64-bit), although it may be applicable to other architectures.
A virtualisation product is a large and complex piece of software. It has a similar level of complexity to an operating system.
Enabling every feature of a virtualisation product can leave a system open to attack. Conversely, disabling too many features will mean it can no longer be used effectively. Security measures should therefore be determined by balancing risk with business benefits.
The following key principles are central to this guidance:
- virtualisation is only a security factor when it is being used as a security barrier
- virtualisation should not be used to separate security classifications
- virtualisation does not replace dedicated administration and configuration control
Virtualisation does not necessarily require extra security controls
The separation provided by virtualisation is only a security factor if it is primarily being used as a security barrier. Applying strict security measures when virtualisation is only being used for performance or ease of administration is a poor use of resources: it will probably add little to security, and will likely prevent the benefits of virtualisation from being realised.
On the other hand, failing to apply security measures when virtualisation is primarily used as a security barrier can result in the barrier being easily breached. Virtualisation is not a magic bullet; like all software, it will contain security vulnerabilities.
Once this distinction is clear, many security decisions become simple to make. For example, many virtualisation products provide ways to directly transfer data between virtual machines. It is easy to waste resources in trying to decide the merits of one data transfer mechanism over another, when the question should really be ‘Should there be a connection between these virtual machines?’
If virtualisation is being used to provide a security barrier, such a connection - however it is made - will breach that barrier. However, if virtualisation is not acting as a security barrier, the data sharing should be of no more concern than, for example, file-sharing between machines.
Virtualisation should not be used to separate security classifications
You should not use virtualisation as a security barrier between different security classifications, because different classifications have different threat models. If unavoidable, seek advice from CESG.
Note: OFFICIAL and OFFICIAL SENSITIVE are not different security classifications.
Virtualisation does not replace dedicated administration and configuration control
Virtual machines should be administered as if they were real machines as far as is sensible, with the same policies for configuration, installation of security software, patching, and so on. A virtualised system is not inherently more secure than one without virtualisation, and a vulnerability in an operating system or application can be exploited as easily in a virtual machine as a physical one.
Virtualisation software can make it easier for the system administrator to employ good security practices, such as golden images and pre-configured security features. This allows virtual machines to be created with a patched and configured ‘known good’ version of the operating system. This reduces the risk of machines being misconfigured or set up in an unpatched (or insecure) state. In the spirit of ‘Secure by Default’, it is better to provision patched, secured, and pre-configured virtual machines than expect the administrators of each virtual machine to patch and configure them.
Basing all the virtual machines off similar images can also make monitoring easier, so that it is easier to spot anomalous behaviour on the network. However, problems will arise over time if the golden image isn’t updated regularly, so it is important to update this as security patches become available.
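As an illustration of the provisioning approach described above, first-boot configuration tooling such as cloud-init can apply a baseline hardening policy whenever a virtual machine is created from the golden image. This is a sketch only; the package choices, account name, and key are illustrative assumptions, not part of this guidance.

```yaml
#cloud-config
# Illustrative first-boot hardening for a VM created from a golden image.
package_update: true
package_upgrade: true        # bring the image up to date with current security patches
packages:
  - unattended-upgrades      # keep patching automatic after provisioning
ssh_pwauth: false            # key-based SSH only
users:
  - name: svc-admin          # hypothetical administrative account
    lock_passwd: true
    ssh_authorized_keys:
      - ssh-ed25519 AAAA...  # placeholder public key
```

Because this runs at first boot, the golden image itself stays generic, while every machine derived from it starts in a patched, pre-configured state.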
Virtualisation can be so transparent to the user that it is easy to forget it is in use, and therefore easy to overlook when considering risk. This section considers general classes of vulnerability introduced by virtualisation.
Virtual machine breakout
A vulnerability in the virtualisation software may allow an attacker with access to a virtual machine to break out of that virtual machine and affect the host. This is usually limited to crashing the host, but a minority of vulnerabilities can lead to code execution on the host.
This allows an attacker to affect every virtual machine running on the same host. It is particularly critical if the virtualisation is being used as a security barrier, or where availability is a high priority.
Security weaknesses within a virtual machine
A vulnerability in the virtualisation software may allow an attacker to perform some sort of unauthorised activity within a virtual machine. This is usually because the virtualisation software fails to accurately interface to or emulate the underlying hardware, and can include privilege escalation or crashing the virtual machine. This is less serious than a virtual machine breakout because an attacker is forced to attack each virtual machine separately. However, because of the attention paid to breakout attacks, it is easy to overlook.
Poor configuration and administration
Since it is considerably easier to bring up a new virtual machine than to get a new PC up and running, virtual machines can seem disposable, with administration and configuration both becoming lax as a result. However, poor configuration or administration means an attacker can exploit the use of virtualisation without needing a vulnerability in the virtualisation software itself. The attacker does not care whether they are attacking a real machine or a virtual one, only whether their attack is successful.
If virtual machines are set up and administered with less care than physical machines, an attacker can make use of this oversight. Once one virtual machine has been compromised, it can be used as a bridgehead to attack other, better-configured machines.
An additional concern, particularly at higher classifications, is that virtual machines could provide weak separation between networks or other accesses that should be strongly separated. It may be fine for Alice to use a virtual machine for one purpose, and Bob to use a different virtual machine for another purpose, but combining them on the same infrastructure would inadvertently increase the risk for both by reducing previously strong separation to just that provided by the virtualisation software.
The admin console
An attacker gaining the ability to administer a virtualisation product will be able to control much of the system, and may have complete access.
Disk encryption bypass
If encryption is used on the virtual disks of a virtual machine, unencrypted data may end up cached on other parts of the underlying physical storage by the virtualisation software.
Security mitigations: making the attacker’s job harder
Operating system security features within a virtual machine do not necessarily block a security problem with the virtualisation, because the virtualisation software is operating at a layer below the virtual machine’s operating system. Protecting a system from these vulnerabilities therefore has to concentrate on denying an attacker the opportunity to exploit the vulnerability, combined with reducing the impact of a successful attack.
There are many stages to a successful attack, and the security of the system can be improved by deploying mitigations to make the attacker’s job harder at as many stages as practicable.
The following mitigations at each stage should not be treated as a list of defences that must be deployed, but rather as a non-exhaustive list of suggestions that can be implemented depending on risk appetite. They can be viewed as ‘attacker filters’, since each stage offers the opportunity of filtering out some potential attackers, and the mitigations at each stage control how effective that filter is. Some of them are not unique to virtualisation.
Deny access to the system
An attacker can be denied the opportunity to exploit a vulnerability by denying them access to a virtual machine. If only legitimate users have access to a virtual machine (and those users have no more privileges than they need), an attacker has a smaller pool of users to attack, and will be unable to use exploits that need higher privileges without an additional privilege escalation exploit.
The residual risk is that the attacker successfully breaks in through a legitimate user, or is a legitimate user themselves.
- protect external connections
- limit which users have access to the system
- avoid having external protections that are vulnerable to the same attack as the system to be protected (for example, if a firewall and a server are virtualised together, a single vulnerability can exploit both)
- to combat theft with mobile computing, or any other situation where a device’s storage needs protecting, use disk encryption on the physical storage rather than relying on encryption of the virtual storage
Deny the opportunity to exploit the vulnerability
An attacker can be denied the opportunity to exploit a known vulnerability by removing it before it can be exploited. This can involve applying a patch against the vulnerability, or configuring the system so that a vulnerable feature is no longer available to be exploited.
The residual risk is that the attacker will attack before the vulnerability has been removed, or that they have an exploit for a new and previously unknown vulnerability.
- keep the virtualisation software patched; an attacker is much more likely to deploy a known exploit against an unpatched system than reveal a previously unknown exploit
- limit access to information about the system, so an attacker risks revealing themselves while deploying a non-working exploit
- detect attempts to import files from unexpected sites
- lock down the configuration of a virtual machine to make it harder for an attacker to obtain their exploit
- use a type 1 hypervisor as opposed to a type 2 (see Appendix A for definitions), limiting an attacker to vulnerabilities in the hypervisor itself rather than the OS it runs on
- use hardware-supported virtualisation as opposed to software-only virtualisation (see Appendix A for definitions), removing one attack surface of the virtualisation software
Prevent the running of the exploit
An attacker who successfully deploys an exploit may still be prevented from actually running it.
- limit what actions users can carry out, such as by whitelisting software, or even forcing users to interact only with a specific piece of software
- lock down the configuration of the virtual machine to reduce an attacker’s ability to increase the range of actions available
- lock down the configuration of the virtualisation system itself to reduce the attack surface, making it more likely the attacker’s exploit requires features that have been disabled
Detect the attacker before the exploit is run
If an attacker is detected before they have run their exploit, they can be removed from the system.
- Monitor the system well enough to detect attempts to exploit it. How to do this, and what should be considered ‘well enough’, are too system-specific to be in the scope of this guidance, but it includes any regular system monitoring of the virtual machine as well as the virtualisation software.
Prevent further damage after the initial exploit
Even if an attacker successfully runs their exploit, they can be frustrated if further exploitation is prevented before they have achieved their aims.
- Monitor the system well enough to detect suspicious behaviour. This won’t help against a one-off denial-of-service attack, but may make it easier to prevent a repeat. As above, how to do this, and what should be considered ‘well enough’ (or, for that matter, ‘suspicious behaviour’), are too system-specific to be in the scope of this guidance.
Reduce damage by considering the attacker’s aims
By understanding what damage would result if an attacker was successful, steps can be taken to reduce that damage. This may also allow inconvenient mitigations to be safely removed, as well as giving the risk owner a clearer idea of exactly what is at risk.
If the worst happens, and the system is exploited, the impact can be reduced by considering the attacker’s motivations.
If they simply want to cause embarrassment through a denial-of-service attack, this can be partially frustrated by restoring the system quickly while preventing the attacker from regaining access.
If they wish to steal data, this can be frustrated wholly or partially by reducing the value of the data to an attacker. This can involve encrypting it, limiting the data available, or replacing sensitive data with references. For example, it may be sufficient to include an anonymous reference to a person that can be used separately to look up their personal information, rather than to include the personal information itself within the data being processed.
The possibility of an attacker exploiting the virtualisation (either to access data, perform an onward attack, or simply perform a denial-of-service attack against the system) should be included in any disaster recovery planning.
- if unnecessary data is not present, an attacker cannot steal it
- encrypting or obfuscating data can also frustrate attempts to steal it
- being able to rapidly block an attack and recover will frustrate an attacker whose intention is to cause disruption rather than to steal data
- removing unnecessary network connections may prevent an attacker using the exploited system as a stepping stone to others
- avoid virtualising a highly-protected system with a system that’s less well protected (perhaps because it’s perceived as less attractive to an attacker), to stop an attacker using the virtualisation as a route to the attractive system
Assurance is about the confidence behind the security of a product. This is subtly different to the security itself; an unassured product is not necessarily insecure, any more than an assured product is necessarily secure.
A virtualisation product can be put through the CESG Foundation Grade scheme, which is a basic check that the product is well designed and has a sound approach to security.
It is important to understand the limits of software assurance. Virtualisation software is highly complex, of similar complexity to an operating system. Foundation Grade assurance will confirm that the overall design is without obvious flaws, and provide security operating procedures to reinforce the software’s security, but cannot confirm the software is without vulnerabilities. Virtualisation software is simply too complex for this to be achievable in a sensible period of time, if it is possible at all.
A virtualisation product that has been formally assured to Foundation Grade should add only a minimal increase in the risk of data compromise compared to a system based on physically separate computers, provided:
- it is used and administered in accordance with its security operating procedures
- it is only confronted with a threat appropriate to the grade of assurance
- it has no unpatched security vulnerabilities
If a virtualisation product lacks formal CESG software assurance, you can gain some assurance by asking the following questions:
- Who makes it, and how often do they patch it? It is easier to have confidence in a mainstream product that is widely used and supported by a major company or organisation, than in a niche product from a small group whose support is ad hoc at best.
- How well do they handle vulnerabilities? A new feature is a potential source of new vulnerabilities, so patches that only fix vulnerabilities are better than patches containing a mixture of security fixes and new features. Similarly, there is more confidence in the security of a product that is patched quickly when a vulnerability is found, compared to one that is patched late or never.
- Has the product been through any form of non-CESG assurance? Other assurance schemes provide some insight into the security of a product, provided attention is paid to any caveats about that assurance.
The risk owner must decide what level of assurance is appropriate.
A specific deployment of a virtualisation product can also be examined through the CESG Tailored Assurance Scheme (CTAS), though this is only recommended for exceptional cases where the risk owner needs software assurance and Foundation Grade is not considered sufficient.
Appendix A: What is virtualisation?
Virtualisation is a technology that allows one physical computer to act like several computers ('Virtual Machines', or VMs). Virtualisation is not a new concept and has been used in mainframe systems for many years, but the maturing of products for the x86 platform has increased the demand. It is the basis of many cloud services, especially for IaaS (Infrastructure as a Service).
Traditionally, there are two forms of virtualisation: 'type 1' and 'type 2'. Type 1 is also known as 'bare metal' virtualisation (the virtualisation software runs directly on the hardware, the 'metal', taking the place of the usual operating system), while type 2 is also known as 'hosted' virtualisation (the virtualisation software runs as a program under a normal operating system). This isn't as clear-cut as it once was, with some products having qualities of both type 1 and type 2. Software containers, also known as ‘operating system virtualisation’, are similar to type 2 virtualisation, but with the operating system creating isolated user-space instances that share the host kernel rather than full virtual machines. They typically require less overhead than full virtualisation, but at the cost of weaker isolation.
The core of the virtualisation software is called the 'hypervisor', also known as the ‘Virtual Machine Monitor’.
Prior to the introduction of hardware support for virtualisation, virtualisation software had to handle the separation of the virtual machine from the virtualisation software. This was complicated, because the x86 processor was not originally designed for virtualisation, and required on-the-fly patching of code to handle some of the x86 instructions. An alternative was to have an operating system in the virtual machine that could co-operate with the virtualisation software (called paravirtualisation). This could be much more efficient, but required a special version of the operating system.
In 2006, Intel and AMD introduced hardware support for x86 virtualisation, initially called VT-x (Intel) and AMD-V (AMD), and later extended by Intel with Extended Page Tables (EPT) and by AMD with Rapid Virtualisation Indexing (RVI). This made it simpler for virtualisation software to maintain the separation. From a security perspective, this is much better; although processors can have security vulnerabilities, in general simpler is better for security. The features have gradually been added to most processors produced by Intel and AMD, though unfortunately it is necessary to check the description of a particular model of processor to confirm this.
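On a Linux host, the check described above can be performed by looking for the `vmx` (Intel VT-x) or `svm` (AMD-V) flags in `/proc/cpuinfo`. The following sketch parses that file's text; note that presence of the flag shows the processor supports hardware virtualisation, though it may still be disabled in firmware.

```python
def hw_virt_flags(cpuinfo_text):
    """Return which x86 hardware-virtualisation flags appear in /proc/cpuinfo text.

    'vmx' indicates Intel VT-x; 'svm' indicates AMD-V.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return sorted(flags & {"vmx", "svm"})

# On a live Linux system this would be applied to the real file:
#   with open("/proc/cpuinfo") as f:
#       print(hw_virt_flags(f.read()))
sample = "processor : 0\nflags : fpu vme de pse tsc msr vmx ept\n"
```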
Appendix B: Network Virtualisation
Network virtualisation goes by a number of different names, including Software-Defined Networking (SDN) and Network Function Virtualisation (NFV), with each variation offering a different approach and different capabilities. What they all have in common is a layer of abstraction away from the physical network.
This can be very flexible and powerful. The risk is that, like virtualisation of the computer, virtualisation of the network may have exploitable security vulnerabilities. The flexibility and power also means greater complexity, increasing the likelihood of security vulnerabilities. Network virtualisation has not been studied as thoroughly as computer virtualisation, and there could be categories of vulnerabilities that haven’t been considered.
Many products only provide a similar level of separation to a VLAN (Virtual Local Area Network), in that they route traffic by tagging data streams and the separation relies on the software working correctly. This may be sufficient for some uses, though an error in the software or configuration could allow traffic to be misrouted.
The three principles from Virtualisation Principles are applicable to network virtualisation as well:
- network virtualisation is only a security factor when it is being used as a security barrier
- network virtualisation should not be used to separate security classifications
- network virtualisation does not replace dedicated administration and configuration control
It is up to the risk owner to determine what level of protection is needed. However, given that the risk is difficult to quantify at the moment, it would be prudent to treat these technologies more as administrative tools, and not to rely solely on them to provide a security barrier. For example, relying on them to separate an internal network from an external one means the system is only one security vulnerability away from the internal network becoming accessible externally.
A greater level of separation is possible by using VPNs (Virtual Private Networks) on top of a virtualised network. These maintain separation using encryption, so misrouted data is still protected; an attacker would need to break the encryption. However, encryption also brings possible performance issues, as well as increased administration if keys have to be managed.
It is possible to mix-and-match, so only the data streams that need the most protection use VPNs.
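As an illustration of layering a VPN over a virtualised network, a point-to-point tunnel such as WireGuard could carry just the sensitive streams. The addresses, hostname, and keys below are placeholders, not recommendations.

```ini
# /etc/wireguard/wg0.conf - one end of a tunnel carrying a sensitive stream.
# Keys are placeholders; generate real ones with 'wg genkey' / 'wg pubkey'.
[Interface]
PrivateKey = <this host's private key>
Address = 10.10.0.1/24
ListenPort = 51820

[Peer]
PublicKey = <peer's public key>
AllowedIPs = 10.10.0.2/32
Endpoint = peer.example.internal:51820
```

Even if the virtualised network misroutes this traffic, an eavesdropper sees only ciphertext and would need to break the encryption to recover the data.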
Appendix C: Entropy weaknesses in virtual machines
In this context, entropy refers to truly unpredictable data that is used for cryptography and similar security-critical applications. Ideally, this is taken from a dedicated hardware entropy source, but this is not always available. It is common for these sources to feed an entropy ‘pool’, which removes the direct link between the source of the entropy and its use, and allows entropy to be collected continuously so that it’s ready when the system needs it.
In the absence of a dedicated entropy source (such as is found on some Intel processors), entropy generation on an ordinary PC relies on the unpredictability of physical processes, such as by taking measurements of disk accesses using a high resolution timer, and only using the lowest few bits of each measurement. In a physical machine, these lowest bits are dominated by the unpredictable movement of the hard disk components, and so they are effectively random. This is a poor source of entropy if the disk is a solid-state disk (SSD), as these lack the unpredictable physical processes this technique relies upon.
Virtualisation removes the connection between the software and the physical device being relied upon to produce the entropy. In some cases, there will be a direct-enough connection that timings using high-resolution timers will still work; in other cases, however, the timings may be artificially synchronised, or there may no longer be any actual hardware. The problem is not so much that the entropy generation is known to be bad, but that it is difficult to have confidence that it is good.
If the entropy pool doesn’t produce the entropy expected of it, the result can be weak cryptographic keys. Weak keys can allow an attacker to gain access to the system or its data, so this needs to be considered among the risks added by virtualisation.
- If more confidence in the entropy is needed, a hardware entropy source should be used. Because entropy is additive, the hardware entropy generator can be used as an input to the existing entropy pool rather than used to create keys directly.
- An alternative approach that is appropriate in some scenarios is to move the entropy generation off the virtualised platform and onto dedicated hardware. For example, a TLS concentrator can be used to terminate a TLS connection prior to passing the data on to virtual machines to handle it.
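The additive property mentioned in the first point can be illustrated with a simple hash-based pool: mixing in a second source can only add unpredictability, even if one source turns out to be weak. This is a toy sketch for illustration only, not a production random number generator.

```python
import hashlib
import os

class EntropyPool:
    """Toy hash-based entropy pool: mixing in any source never reduces
    the unpredictability contributed by the others."""

    def __init__(self):
        self._state = b"\x00" * 32

    def mix(self, data):
        # Fold new input into the pool state with a hash.
        self._state = hashlib.sha256(self._state + data).digest()

    def read(self, n):
        # Derive output from the state (simplified; real pools also
        # re-key their internal state after extraction).
        out = b""
        counter = 0
        while len(out) < n:
            out += hashlib.sha256(self._state + counter.to_bytes(4, "big")).digest()
            counter += 1
        return out[:n]

pool = EntropyPool()
pool.mix(b"timings-from-virtual-disk")  # possibly weak source inside a VM
pool.mix(os.urandom(32))                # hardware/OS source mixed in as well
key_material = pool.read(32)
```

Because the hardware source is folded into the existing pool rather than used to create keys directly, the output is at least as unpredictable as the strongest source mixed in.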