Introduction
At HAProxy Technologies, we develop and sell a hardware and virtual load balancer called ALOHA (which stands for Application Layer Optimisation and High Availability). A few months ago, we made it run on the most common hypervisors available:
- VMware (ESX, vSphere)
- Citrix XenServer
- HyperV
- Xen OpenSource
- KVM
<ADVERTISEMENT>So whatever your hypervisor is, you can run an ALOHA on top of it 🙂</ADVERTISEMENT>
Since a Load-Balancer appliance is Network IO intensive, we thought it was a good opportunity to bench each Hypervisor from a virtual network performance point of view.
Well, more and more companies use Virtualization in their infrastructures, so we guessed that a lot of people would be interested in the results of this bench, which is why we decided to publish them on our blog.
Things to bear in mind about virtualization
One of the interesting features of Virtualization is the ability to consolidate several servers onto a single piece of hardware.
As a consequence, the resources (CPU, memory, disk and network IOs) are shared between several virtual machines.
Another issue to take into account is that the Hypervisor is like a new “layer” between the hardware and the OS inside the VM, which means that it may have an impact on performance.
Purpose of benchmarking Hypervisors
First of all: WE ARE TOTALLY NEUTRAL AND HAVE NO INTEREST IN SAYING GOOD OR BAD THINGS ABOUT ANY HYPERVISOR.
Our main goal here is to check if each Hypervisor performs well enough to allow us to sell our Virtual Appliance on top of it.
From the tests we’ll run, we want to be able to measure the impact of the Hypervisor on the Virtual Machine’s performance.
Benchmark platform and procedure
To run these tests, we use the same server for all Hypervisors, just swapping the hard drive so that each hypervisor runs independently.
The Hypervisor hardware is summarized below:
- CPU quad core i7 @3.4GHz
- 16G of memory
- Network card 1G copper e1000e
NOTE: we benched some other network cards and got UGLY results (see the conclusion).
NOTE: there is a single VM running on the hypervisor: The Aloha.
The Aloha Virtual Appliance used is the Aloha VA 4.2.5 with 1G of memory and 2 vCPUs.
The client and WWW servers are physical machines plugged into the same LAN as the Hypervisor.
The client tool is inject, and the web server behind the Aloha VA is httpterm.
So basically, the only thing that will change during these tests is the Hypervisor.
The Aloha is configured in reverse-proxy mode (using HAProxy) between the client and the server, load-balancing and analyzing HTTP requests.
We focused mainly on virtual networking performance: the number of HTTP connections per second and associated bandwidth.
We ran the benchmark with different object sizes: 0, 1K, 2K, 4K, 8K, 16K, 32K, 48K, 64K.
NOTE: by “HTTP connection”, we mean a single HTTP request with its response over a single TCP connection, like in HTTP/1.0.
Basically, the 0K object test is used to get the number of connections per second the VA can do and the 64K object is used to measure the maximum bandwidth.
NOTE: the maximum bandwidth will be 1G anyway since we’re limited by the physical NIC.
We are going to bench Network IO only, since this is the most intensive work a load-balancer does.
We won’t bench disk IOs…
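To make the “connections per second” figure concrete, here is a minimal sketch of the kind of measurement described above: one HTTP request and its response per TCP connection (HTTP/1.0 style), counted over a fixed time window. This is not the inject tool we actually used; the address and object path below are placeholders.

# Minimal connections-per-second sketch (illustration only, not the real inject tool).
# HOST, PORT and PATH are placeholders for the load-balancer VIP and the test object.
import socket
import time

HOST, PORT = "192.0.2.10", 80   # placeholder VIP of the load-balancer
PATH = "/"                      # placeholder test object (0K, 1K, ... on the real server)
DURATION = 10                   # measurement window in seconds

def one_connection():
    """One TCP connection carrying a single HTTP/1.0 request and its response."""
    with socket.create_connection((HOST, PORT), timeout=5) as s:
        s.sendall(f"GET {PATH} HTTP/1.0\r\nHost: {HOST}\r\n\r\n".encode())
        while s.recv(65536):    # drain the response until the server closes
            pass

start = time.time()
count = 0
while time.time() - start < DURATION:
    one_connection()
    count += 1

print(f"{count / DURATION:.0f} connections/s")

A single-threaded loop like this obviously cannot saturate the appliance; the real benchmark uses many clients in parallel.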
Tested Hypervisors
We benched a native Aloha against the Aloha VA embedded in each of the Hypervisors listed below:
- HyperV
- RHEV (KVM based)
- vSphere 5.0
- Xen 4.1 on Ubuntu 11.10
- XenServer 6.0
Benchmark results
Raw server performance (native tests, without any hypervisor)
For the first test, we run the Aloha on the server itself without any Hypervisor.
That way, we’ll have some figures on the capacity of the server itself. We’ll use those numbers later in the article to compare the impact of each Hypervisor on performance.
Microsoft HyperV
We tested HyperV on a Windows Server 2008 R2 server.
For this hypervisor, two types of network card are available:
- Legacy network adapter: emulates the network layer through the tulip driver.
==> With this driver, we got around 1.5K requests per second, which is really poor…
- Network adapter: requires the hv_netvsc driver, supplied by Microsoft as open source since Linux kernel 2.6.32.
==> This is the driver we used for the tests.
RHEV 3.0 Beta (KVM based)
RHEV is Red Hat’s Hypervisor, based on KVM.
For the Virtualization of the Network Layer, RHEV uses the virtio drivers.
Note that RHEV was still in Beta when we ran this test.
VMware vSphere 5
There are three types of network card available for vSphere 5.0:
1. Intel e1000: e1000 driver, emulates network layer into the VM.
2. VMxNET 2: allows network layer virtualization
3. VMxNET 3: allows network layer virtualization
The best results were obtained with the vmxnet2 driver.
Note: we have tested neither vSphere 4 nor ESX 3.5.
Xen OpenSource 4.1 on Ubuntu 11.10
Since CentOS 6.0 does not provide Xen OpenSource in its official repositories, we decided to use the latest (Oneiric Ocelot) Ubuntu server distribution, with Xen 4.1 on top of it.
Xen provides two network interfaces:
- an emulated one, based on the 8139too driver
- the virtualized network layer, xen-vnif
Of course, the results are much better with xen-vnif, so we’re going to use this driver for the test.
Citrix XenServer 6.0
The network driver used for XenServer is the same as for Xen OpenSource: xen-vnif.
Hypervisors comparison
HTTP connections per second
The graph below summarizes the HTTP connections per second capacity for each Hypervisor.
It shows us the Hypervisor overhead by comparing the light blue line, which represents the server capacity without any Hypervisor, to each hypervisor’s line.
Bandwidth usage
The graph below summarizes the bandwidth achieved with each Hypervisor.
It shows us the Hypervisor overhead by comparing the light blue line, which represents the server capacity without any Hypervisor, to each hypervisor’s line.
Performance loss
Well, comparing Hypervisors to each other is nice, but remember, we wanted to know how much performance was lost in the hypervisor layer.
The graph below shows, in percentage, the loss generated by each hypervisor when compared to the native Aloha.
The higher the percentage, the worse for the hypervisor…
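For clarity, the loss is simply the relative drop against the native Aloha figure. A tiny sketch of the computation, with made-up numbers:

# Performance loss relative to the native (no hypervisor) figure.
def loss_percent(native, virtualized):
    return (native - virtualized) / native * 100.0

print(loss_percent(native=40000, virtualized=30000))  # -> 25.0, i.e. a 25% loss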
Conclusion
- The Hypervisor layer has a non-negligible impact on the networking performance of a Virtualized Load-Balancer running in reverse-proxy mode. But I guess it would be the same for any VM which is network-IO intensive.
- The shorter the connections, the bigger the impact. For very long connections like TSE, IMAP, etc., virtualization might make sense.
- vSphere seems ahead of its competitors from a performance point of view.
- HyperV and Citrix XenServer deliver interesting performance.
- RHEV (KVM) and Xen OpenSource can still improve performance, unless this is related to our procedure.
- Even if the hardware layer is no longer accessed directly by the VM, it still has a huge impact on performance.
For example, on vSphere, we could not go higher than 20K connections per second with a Realtek NIC in the server…
With the Intel e1000e driver, we got up to 55K connections per second…
So, even when you use virtualization, hardware counts!
Comments
Interesting results. I’m wondering if the difference in results between XenServer 6.0 and Ubuntu Xen could be because of the virtual network layer. XenServer 6.0 will use openvswitch by default; Ubuntu will probably use Linux bridging; I suspect RHEV will probably use Linux bridging as well. Have you tried switching XS 6.0 to Linux bridging, or switching Ubuntu Xen to openvswitch?
Well, we did not change any settings on the OS installed.
Our purpose was not to get the most requests from each Hypervisor, but to have an idea of the capacity of each of them.
So we didn’t spend too much time trying to tune each of them.
Concerning Xen/Ubuntu, we just stopped iptables and removed the associated conntrack modules; concerning XenServer, we just stopped the ovs-vswitchd daemon, which took all the resources and gave very bad performance for this Hypervisor.
We’re open to any tips which can improve performance on any of those Hypervisors, so I’ll run the test you propose and give you feedback soon.
If some tuning can drastically improve performance, I’ll share it on the blog too.
Interesting bench and results.
In the Xen part, you mention two cards, one being emulated. I take it your VM is using HVM? HVM is a “known” slow virtualisation. It exists to be able to run anything unmodified, but it does so by emulating a whole system.
If it really is HVM, could you also try to boot your OS using paravirt?
I’m not sure if it would change much, or if you are already using paravirt, but it would really be nice to be sure, and have both XEN/HVM and XEN/paravirt results.
Hi ze,
thanks for the information.
I’ll do the test and report any improvement, or not 🙂
cheers
Hi,
I just ran the test with openvswitch, and it’s worse than simple bridging.
The ovs-vswitchd process takes all the resources of one core, and the netback process takes “only” 60% of another core.
The Virtual Appliance runs at 25% CPU (mostly in userland for HAProxy, very little in system interrupts) and, with empty objects, I can only reach 8000 req/s.
cheers
HVM doesn’t necessarily mean “slow”. Xen HVM is actually faster than PV for some workloads (for example 64bit guests spawning a lot of new processes). The key for good HVM performance is to use Xen PVHVM drivers, so disk/net IO is using the Xen fastpath bypassing all emulation, and also new enough CPU with HAP (Hardware Assisted Paging, Intel EPT or AMD NPT).
Xen HVM is faster than PV in the following scenarios:
– 64bit VM and the application in the VM is doing a lot of syscalls. The x86_64 architecture (as defined by AMD back in the days) is limited and doesn’t have enough available ring levels, so the kernel/userspace protection in the VM causes a trap to the hypervisor in 64 bit PV domUs. HVM doesn’t have this limitation, so HVM is faster with 64bit VMs. If the guest VM is 32bit, then PV domUs are as fast, because the 32bit x86 architecture *does* have enough ring levels for native kernel/userspace protection inside the VM.
– If the application in the VM is creating (forking) a lot of new processes, then HVM will be faster than PV. HVM guests don’t have to trap to hypervisor for page table validation for the new processes. PV is good and fast for long-running processes, but when creating new processes the new process page tables need to be verified by the hypervisor, and this has some overhead, which will be visible for example in “benchmarks” like kernel compilation where a lot of new gcc processes will be started all the time.
I’m aware of at least those two scenarios, and it always depends on the workload. For more information, some Xen PV vs. Xen HVM vs. KVM benchmarks are available in these XenSummit 2011 slides: http://xen.org/files/xensummit_santaclara11/aug3/6_StefanoS_PVHVM.pdf .
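As a rough, hypothetical illustration of the fork-heavy workload described in the second scenario above (not part of the original benchmark), a loop like the one below measures how many short-lived processes can be created per second; the per-fork page-table validation is exactly what makes this kind of loop slower in a PV domU.

# Hypothetical fork-rate micro-benchmark (Linux/POSIX only), illustrating the
# process-creation workload discussed above.
import os
import time

DURATION = 5  # seconds
start = time.time()
forks = 0
while time.time() - start < DURATION:
    pid = os.fork()
    if pid == 0:
        os._exit(0)        # the child exits immediately
    os.waitpid(pid, 0)     # the parent reaps the child
    forks += 1

print(f"{forks / DURATION:.0f} forks/s")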
Hi ze,
So basically, we can only run in HVM.
We can’t provide a kernel per hypervisor; we build an appliance.
Note that we’re benching network IOs and we already use the para-virtualized network driver in the appliance. The results were very ugly in a fully emulated environment.
Today, I ran the test again and saw that all the resources are taken by a netback process (when using a bridge between the VM and the dom0), while the appliance is at less than 50% CPU, which means we can probably go much higher on Xen.
I’m going to test openvswitch and see if it brings any improvement.
Cheers
You happen to get really poor performance with KVM, although other benchmarks (the official SPECvirt, for instance) reveal it has top performance. What type of OS do you use for your guest? Are all devices PV (virtio-net and virtio-blk)? What CPU model have you chosen? Please provide more info.
I am surprised by the KVM findings. I would be happy to help dig into the issues with you to get them resolved.
Hi,
I just sent you a mail.
Let’s speak about the results directly.
cheers
Some comments about the Ubuntu 11.10 Xen 4.1 benchmark results:
– Did you try using the “dom0_max_vcpus=x dom0_vcpus_pin” Xen hypervisor cmdline arguments to pin/dedicate certain cores to dom0 only, together with setting up vcpu pinning for the VM as well so it uses different pcpus than dom0?
– Did you set up Xen credit scheduler domain weights? It’s usually a good idea to give dom0 more weight than the VMs, so dom0 is guaranteed to always have enough CPU time to serve the disk/net backends for the VMs properly. http://wiki.xen.org/wiki/Xen_Best_Practices .
– Ubuntu 11.10 (dom0) kernel doesn’t include the Xen ACPI power management and cpufreq patches. Those patches are merged to upstream Linux 3.4, but they’re not yet in the kernel used in Ubuntu 11.10. The patches aren’t in Ubuntu 12.04 GA kernel either, but they might (should) end up in later updated packages. These patches are needed to be able to utilize the better performing CPU states, and also to minimize power usage on idle systems.
– Also, the Linux upstream kernel doesn’t yet have the optimized xen-netback dom0 network backend driver. Work is in progress to merge the optimization patches in upcoming Linux versions. Currently, if one wants to use an “up-to-date” distro with an optimized Xen dom0 kernel, it’s best to use SLES11 or OpenSuse, because those distros are using the “traditional” out-of-tree optimized Xenlinux patches. Citrix XenServer 6 is also using these optimized Xenlinux patches, and that (partly) explains the performance difference compared to Ubuntu 11.10 Xen 4.1 in the benchmark above. Upstream Linux is getting there, but it takes some time to get “everything” merged in small steps, as required by the Linux kernel development model.
And thanks for the benchmark! It’s interesting.
Hi Pasi,
Thanks a lot for both comments. They’re very useful and helpful.
When I prepare round 2, I’ll introduce OpenSuse as dom0 for Xen.
Cheers
Great! In addition to the “existing” netback optimizations being ported and merged to upstream Linux, there’s also completely new work going on to make the Xen netback/netfront networking much more scalable and utilizing multiple cores more efficiently. Patches have been posted to xen-devel, but it’s still a work-in-progress.
In the land of virtualization there’s always something happening to keep things interesting 🙂
Good comparison,
It clearly shows the problems with network performance on hypervisors. Did you try with an SR-IOV compatible card and a supported hypervisor using VT-d?
It would be interesting to see how it performs on Amazon EC2, using the VM import feature. I bet they will give nearly native performance. They use Xen in the background, but I’m not sure what they do to overcome the network performance bottleneck.
Hi,
Unfortunately, we did not have such hardware to run the benchmark.
Note we may do a second round soon with a newer kernel in the ALOHA, including Hyper-V 2012 and more recent KVM/Xen versions as well.
Stay tuned.
Baptiste