Network Optimizations
Network-wise, the optimizations possible are generally pretty straightforward and depend a lot on the capabilities of the NICs that are used. To get the most out of ESX, VMware recommends using a network adapter that supports the following technologies:
- Checksum offload, TCP segmentation offload, and Jumbo Frames (we'll go into this a bit deeper below): These can be checked in Windows through the NIC's properties, and in Linux with ethtool.
- Capability to handle high memory DMA (64-bit DMA addresses)
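On Linux, the offload capabilities in the list above can be checked from the command line with ethtool. A quick sketch (eth0 is a placeholder for your adapter's name):

```shell
# Show which offload features the NIC driver exposes (eth0 is an assumed name)
ethtool -k eth0

# Typical lines to look for in the output:
#   tx-checksumming: on
#   tcp-segmentation-offload: on

# If TCP segmentation offload is supported but switched off, enable it
ethtool -K eth0 tso on
```

If a feature shows up as "off [fixed]", the hardware or driver doesn't support it, and a different NIC is needed to benefit from that optimization.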
As in any clustered system, provide separate networks where possible to avoid contention between different kinds of traffic: inter-VM communication, management traffic, storage traffic, and so on.
Why would you want to use Jumbo Frames?
Rather than speeding up a network, Jumbo Frames are actually used to reduce the load on the CPU by reducing the number of interrupts generated by the network interface during continuous traffic. Instead of the standard size of 1500 bytes, a Jumbo Frame can contain 9000 bytes. Because more data can be sent in a single frame, the overall handling cost of traffic goes down, and time is freed on the CPU to spend on more exciting tasks.
Configuring a network for the use of Jumbo Frames is a different story altogether, however, because it deviates from the standard settings of a network and requires changes on every single layer of the network in question. In Linux, this is achieved on both the sender and the receiver by issuing the command: "ifconfig eth# mtu 9000" (# stands for the network adapter's number). On Windows, it needs to be set in the advanced properties of the network adapter (Device Manager > Network adapters > VMware PCI Ethernet Adapter > MTU to 9000).
In ESX's own VirtualSwitch, this setting can be changed by typing the following command in the service console: "esxcfg-vswitch -m 9000 vSwitchName". After that, it of course needs to be set on every router or switch on the way to the receiving machine before you're able to take advantage of this functionality.
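The steps above can be collected into a quick checklist; the interface and switch names below (eth0, vSwitch0, receiver-host) are placeholders for your own environment:

```shell
# On each Linux sender and receiver: raise the MTU to 9000
ifconfig eth0 mtu 9000

# On the ESX service console: raise the vSwitch MTU to match
esxcfg-vswitch -m 9000 vSwitch0

# Verify the path end to end: send a ping whose payload fills a jumbo frame
# (9000-byte MTU minus 20-byte IP header minus 8-byte ICMP header = 8972)
# and forbid fragmentation, so any hop still at 1500 shows up as a failure.
ping -M do -s 8972 receiver-host
```

If the ping comes back with "Frag needed" or simply times out, at least one switch or router along the path hasn't been configured for Jumbo Frames yet.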
We must note here that we've found a frame size of 4000 to be optimal for iSCSI, because this allows the blocks to be sent through without being spread over separate frames. Additionally, vSphere adds support for Jumbo Frames with NFS and iSCSI on both 1Gb and 10Gb NICs, which allows for a large performance jump when opting for the latter. According to VMware, it should be possible to get 10 times as much I/O throughput with a 10Gb NIC.
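The same non-fragmenting ping trick can be used to confirm a 4000-byte path toward an iSCSI target; the payload figure is just the MTU minus the 28 bytes of IP and ICMP headers. The hostname below is a placeholder:

```shell
# Assumed values: 4000-byte MTU, "iscsi-target" as a placeholder hostname
MTU=4000
PAYLOAD=$((MTU - 20 - 8))   # subtract the 20-byte IP and 8-byte ICMP headers
echo "$PAYLOAD"             # 3972

# Forbid fragmentation; if any hop's MTU is below 4000, the ping fails
ping -M do -s "$PAYLOAD" iscsi-target
```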
What else is coming up?
Coming up in a couple of days, we will be adding the second part of this article, diving into aspects such as storage, software configuration, and scheduling mechanics. We are excited to be able to share our experiences with the platform and hope this article allows AnandTech-reading IT administrators to get the edge on their colleagues by bringing out the very best ESX has to offer. Now it's time to get back to work on putting together the second part; expect it in the course of this week.
10 Comments
najames - Tuesday, June 23, 2009 - link
This is perfect timing for stuff going on at work. I'd like to see part 2 of this article.
- Wednesday, June 17, 2009 - link
I am very curious how vmware affects timings during the logging of streaming data. Is there a chance that some light could be shed on this topic? I would like to use vmware to create a clean platform in which to collect data. I am, however, very skeptical about how this is going to change the processing of the data (especially in regards to timings).
Thanks for any help in advance.
KMaCjapan - Wednesday, June 17, 2009 - link
Hello. First off I wanted to say I enjoyed this write up. For those out there looking for further information on this subject, VMware recently released approximately 30 sessions from VMworld 2008 and VMworld Europe 2009 to the public free of charge; you just need to sign up for an account to access the information. The following website lists all of the available sessions
http://vsphere-land.com/news/select-vmworld-sessio...
and the next site is the direct link to VMworld 2008 ESX Server Best Practices and Performance. It is approximately a 1 hour session.
http://www.vmworld.com/docs/DOC-2380
Enjoy.
Cheers
K-MaC
yknott - Tuesday, June 16, 2009 - link
Great writeup Liz. There's one more major setup issue that I've run into numerous times during my ESX installations. It has to do with IRQ sharing causing numerous interrupts on CPU0. Basically, ESX handles all interrupts (network, storage, etc.) on CPU0 instead of spreading them out to all CPUs. If there is IRQ sharing, this can peg CPU0 and cause major performance issues. I've seen 20% performance degradation due to this issue. For me, the way to solve this has been to disable the usb-uhci driver in the Console OS.
You can find out more about this issue here: http://www.tuxyturvy.com/blog/index.php?/archives/...
and http://kb.vmware.com/selfservice/microsites/search...
This may not be an issue on "homebuilt" servers, but it's definitely cropped up for me on all HP servers and a number of IBM x series servers as well.
LizVD - Wednesday, June 17, 2009 - link
Thanks for that tip, yknott, I'll look into including that in the article after researching it a bit more!
badnews - Tuesday, June 16, 2009 - link
Nice article, but can we get some open-source love too? :-) For instance, I would love to see an article that compares the performance of, say, ESX vs open-source technologies like Xen and KVM! Also, how about para-virtualised guests? If you are targeting performance (as I think most AT readers are), I would be interested in what sort of platforms are best placed to handle them.
And how about some I/O comparisons? Alright the CPU makes a difference, but how about RAID-10 SATA/SAS vs RAID-1 SSD on multiple VMs?
LizVD - Wednesday, June 17, 2009 - link
We are actually working on a completely Open-Source version of our vApus Mark bench, and to give it a proper test drive, we're using it to compare OpenVZ and Xen performance, which my next article will be about (after part 2 of this one comes out). I realize we've been "neglecting" the open source side of the story a bit, so that is the first thing I am looking into now. Hopefully I can include KVM in that equation as well.
Thanks a lot for your feedback!
Gasaraki88 - Tuesday, June 16, 2009 - link
Thanks for this article. As an ESX admin, this is very informative.
mlambert - Tuesday, June 16, 2009 - link
"We must note here that we've found a frame size of 4000 to be optimal for iSCSI, because this allows the blocks to be sent through without being spread over separate frames."
Can you post testing & results for this? Also would be interesting to know if 9000 was optimal for NFS datastores (as NFS is what most smart shops are using anyway...).
Lord 666 - Tuesday, June 16, 2009 - link
Excellent write up.