The author makes this point, but in my opinion not strongly enough.
While it’s tempting to say that the way to achieve this is to run everything on VMs, I’m not convinced that adding another layer of abstraction (as well as more processes running on the host OS) is going to lead to more consistent results. Because of this, dedicated hardware is best. Failing that, just run all of the tests you can in one session, and make it clear that comparisons between different sessions don’t work.
As someone with pretty in-depth knowledge of how a VMM (virtual machine monitor) works at a very low level, using a VM is a truly awful idea. VM exit incidence is pretty much completely unpredictable and can have a huge impact on performance. Not only that, but some cases in hardware VMM layers can cause pretty much complete TLB wipes, destroying TLB and cache locality and defeating any memory optimizations the web server takes advantage of.
Now, if your goal is measuring performance of web servers on VMMs, go ahead, but be aware that performance consistency is not a plus.
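One practical way to see this kind of environment noise, rather than reason about it abstractly, is to time the same fixed workload several times and look at the spread between runs. This is a minimal sketch (the workload body and run count are placeholders, not anything from the thread); on quiet dedicated hardware the relative spread is usually small, while a VM sharing a busy host tends to show much larger run-to-run variation:

```python
import time

def time_workload(fn, runs=5):
    """Time repeated runs of a fixed workload; the spread between
    runs is a rough proxy for how noisy the environment is."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    best = min(durations)
    worst = max(durations)
    # Relative spread: near zero in a quiet environment, often
    # much larger on a busy or virtualized host.
    return best, worst, (worst - best) / best

def workload():
    # Fixed CPU-bound stand-in; replace with your benchmark body.
    total = 0
    for i in range(200_000):
        total += i * i
    return total

best, worst, spread = time_workload(workload)
```

If the spread is large, comparing numbers across sessions (or across machines) tells you more about the scheduler than about the web server.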
> Not only that, but some cases in hardware VMM layers can cause pretty much complete TLB wipes, destroying TLB and cache locality and defeating any memory optimizations the web server takes advantage of.
It is certainly true that VMs can have an impact on webserver performance, but the biggest effects are on performance under saturation, rather than peak performance. In my experience, the mean latency of serving typical dynamic applications from a VM isn't significantly higher than on similarly powered native hardware. This picture changes at high percentiles (the latency of the slowest 0.1% of requests, for example), but is certainly not catastrophic for the vast majority of workloads.
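To make the mean-versus-tail distinction concrete, here is a minimal sketch (function and variable names are my own, not from any benchmark tool mentioned here) that summarizes a latency sample with both the mean and nearest-rank percentiles; the sample is shaped the way VM scheduling jitter tends to look, with most requests fast and a few stalls:

```python
def summarize_latencies(samples_ms):
    """Return mean and selected nearest-rank percentiles (in ms)."""
    ordered = sorted(samples_ms)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank: the value at rank round(p/100 * n), clamped.
        k = max(0, min(n - 1, int(p / 100.0 * n + 0.5) - 1))
        return ordered[k]

    return {
        "mean": sum(ordered) / n,
        "p50": percentile(50),
        "p99": percentile(99),
        "p99.9": percentile(99.9),
    }

# Mostly-fast workload with a small number of stalled requests:
latencies = [5.0] * 989 + [50.0] * 10 + [500.0]
stats = summarize_latencies(latencies)
```

Here the mean stays close to the typical 5 ms request, while the high percentiles surface the stalls; that's why tail metrics, not the mean, are where virtualization overhead shows up.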
Of course, if you are running on a soft-scheduled VM on a busy box, or a very small hard-scheduled slice of hardware, then you are going to get bad performance.
But what concrete difference does this make to the price/performance ratio? It's fairly clear from the experiences of several big web companies that the cost of running web servers on a VM is well worth paying in exchange for the lowered costs of maintenance and system management. A good virtualization infrastructure can significantly reduce whole-system TCO for many workloads.
As for the testing infrastructure, the ability to spin up and tear down test clusters on services like Amazon EC2 is a huge win. Instance sizes (slices/VMs/etc.) should be chosen to minimize the effect of virtualization on the client, and in most cases you don't have to go far up the size range to make that measurement effect negligible.
I'm sure he meant that testing the performance *of* a VM is tricky, but in fact both are true: using a VM for the test client is just as susceptible to the issues mentioned as using one for the server. As the article suggests: know your load generator client. That's difficult if it isn't consistent.