Below are some of the technical reasons why oversizing a VM could adversely affect performance…
CPU Ready Time: This vSphere metric records the amount of time a virtual machine was ready to run but could not be scheduled because the physical CPU resources on the host were busy. Unlike on physical servers, assigning too many vCPUs can adversely affect the performance of the VM and its applications: because VMs share physical hosts and every vCPU has to be scheduled against a physical CPU, a wide VM must wait for enough physical CPUs to become available, which means longer CPU “ready time” waits. Large VMs are not only bad for themselves; they also impact other VMs on the host, large or small, because the ESXi scheduler has to find available cores for all of their vCPUs even when those vCPUs are idle (even if the application does not use every vCPU, the guest OS still expects the hypervisor to provide all of them).
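To turn the raw counter into something actionable, the CPU Ready summation value (reported in milliseconds) is usually converted to a percentage of the sampling interval. Below is a minimal Python sketch of that conversion; the 20-second interval matches vSphere’s real-time charts, while the per-vCPU normalization and the sample numbers are illustrative assumptions rather than values from this article.

```python
# Convert a vSphere "CPU Ready" summation value (milliseconds) into a
# percentage of the sampling interval, so it can be compared against
# whatever alert threshold you use. Real-time charts sample every 20 s;
# past-day and past-week charts use longer intervals.

def cpu_ready_percent(ready_ms: float, interval_s: int = 20, vcpus: int = 1) -> float:
    """Percentage of the interval the VM spent waiting for physical CPU.

    ready_ms   -- CPU Ready summation counter for the interval (ms)
    interval_s -- length of the sampling interval in seconds
    vcpus      -- number of vCPUs, to express ready time per vCPU
    """
    return (ready_ms / (interval_s * 1000 * vcpus)) * 100


if __name__ == "__main__":
    # Example (made-up numbers): a 20 s real-time sample reporting
    # 4,000 ms of ready time on an 8-vCPU VM -> 2.5 % ready per vCPU.
    print(f"{cpu_ready_percent(4000, interval_s=20, vcpus=8):.1f}% ready per vCPU")
```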
vMotion: Enables the live migration of running virtual machines from one physical server to another. vMotions can be performed manually or automatically; DRS dynamically moves VMs between the hosts of a DRS cluster to balance resource usage. VMs with large memory footprints are more prone to cause issues or delays during vMotion operations, because the entire memory content has to be transferred to the new host. The hypervisor repeatedly copies memory deltas (the pages modified during the previous copy pass) to the target until the remaining set of changes is small enough to: 1) stop the vCPUs of the source VM, 2) copy the last memory modifications to the target VM, 3) discontinue disk access on the source and start it on the target, and finally 4) start the vCPUs on the target VM. Also, the more vCPUs a VM has, the longer it takes for all of those CPUs to become available on the target host, which sometimes results in short outages. So the bigger the VM, the longer a vMotion takes. This matters because a VMware feature called “Stun During Page Send” (SDPS) intentionally slows down the vCPUs during vMotion operations to keep the memory modification rate below the copy transfer rate and thus prevent possible vMotion failures.
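The iterative pre-copy behaviour described above can be illustrated with a small, self-contained model. The sketch below is not how ESXi actually implements vMotion; it simply shows why a larger memory footprint, or a page-dirtying rate close to the network transfer rate, means more copy passes and a longer migration, which is exactly what SDPS is there to keep convergent. All figures (link speed, dirty rate, stun threshold) are made-up examples.

```python
# Toy model of vMotion's iterative memory pre-copy: copy everything once,
# then keep copying the pages dirtied during the previous pass until the
# remaining delta is small enough to stun the source VM and switch over.

def precopy_iterations(memory_gb: float,
                       dirty_rate_gb_s: float,
                       link_rate_gb_s: float,
                       stun_threshold_gb: float = 0.5,
                       max_iterations: int = 50):
    """Return (iterations, seconds) before the remaining dirty memory is
    small enough for the final stun-and-switchover step."""
    remaining = memory_gb          # first pass copies the whole memory
    elapsed = 0.0
    for i in range(1, max_iterations + 1):
        copy_time = remaining / link_rate_gb_s
        elapsed += copy_time
        # Pages dirtied while copying must be re-sent in the next pass.
        remaining = dirty_rate_gb_s * copy_time
        if remaining <= stun_threshold_gb:
            return i, elapsed
    # If the dirty rate keeps pace with the link, pre-copy never converges;
    # this is the scenario SDPS avoids by slowing the vCPUs down.
    return None, elapsed


if __name__ == "__main__":
    # A 64 GB VM converges in far fewer seconds than a 512 GB VM at the
    # same (illustrative) dirty rate and ~10 GbE transfer rate.
    print(precopy_iterations(memory_gb=64,  dirty_rate_gb_s=0.2, link_rate_gb_s=1.25))
    print(precopy_iterations(memory_gb=512, dirty_rate_gb_s=0.2, link_rate_gb_s=1.25))
```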
Snapshots: Large VMs take longer to snapshot, especially when the memory state is included. We’ve seen cases with large VMs where creating or deleting a snapshot takes hours or even days to complete (especially when snapshots are kept for several days).
vNUMA – Unlike the legacy UMA architecture, where all processors share the same physical memory uniformly, NUMA (Non-Uniform Memory Access) links memory directly to a processor to create a “NUMA node”. All processors can still access all memory, but that access is not uniform or equal: each processor has direct access to its own memory (local memory) and can also reach memory assigned to the other processors (remote memory). Access to remote memory is significantly slower than to local memory, hence the memory access is non-uniform. VMware introduced vNUMA (Virtual Non-Uniform Memory Access) in vSphere 5; it exposes the underlying NUMA architecture of the hypervisor to the VMs running on it. VMware recommends sizing VMs so they align with physical NUMA boundaries. For example, if the physical host has 8 cores per socket, the NUMA node is assumed to contain 8 cores. A VM sized so that it does not match the underlying NUMA topology will suffer reduced performance. In our example, where the hypervisor has 8 cores per socket, a VM with 10 vCPUs would breach the NUMA boundary: 8 of its vCPUs would come from the assigned NUMA node and 2 from another node, forcing applications to access remote memory and incur a performance hit.
Some recommendations for NUMA optimization:
• Assign each VM a number of vCPUs less than or equal to the number of physical cores in a single CPU socket (stay within one NUMA node). Don't count hyperthreading!
• Check that your virtual infrastructure as a whole stays within the physical NUMA node limits of your servers. Watch out for monster VMs!
• Avoid a single VM consuming more vCPUs than a single NUMA node provides, or it may be scheduled across multiple NUMA nodes, degrading memory access.
• Avoid VMs consuming more RAM than a single NUMA node holds, because the VMkernel will place a portion of the memory content in a remote NUMA node, resulting in reduced performance.
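As a rough way to apply these recommendations, the following sketch checks whether a VM definition fits inside a single NUMA node. The host topology values (8 cores per socket, 128 GB per node) and the VM names are assumptions for illustration; substitute the figures for your own hardware.

```python
# Minimal sizing check against an assumed host topology. It flags VMs
# whose vCPU count or memory allocation exceeds a single physical NUMA
# node, per the recommendations above.

CORES_PER_SOCKET = 8      # physical cores per socket (exclude hyperthreads)
MEMORY_PER_NODE_GB = 128  # RAM attached to each NUMA node

def check_vm_numa_fit(name: str, vcpus: int, memory_gb: int) -> None:
    if vcpus > CORES_PER_SOCKET:
        print(f"{name}: {vcpus} vCPUs spans NUMA nodes "
              f"(only {CORES_PER_SOCKET} cores per node) -> remote memory access")
    if memory_gb > MEMORY_PER_NODE_GB:
        print(f"{name}: {memory_gb} GB exceeds the {MEMORY_PER_NODE_GB} GB "
              f"in one NUMA node -> part of its memory will be remote")
    if vcpus <= CORES_PER_SOCKET and memory_gb <= MEMORY_PER_NODE_GB:
        print(f"{name}: fits within a single NUMA node")

if __name__ == "__main__":
    check_vm_numa_fit("app01", vcpus=10, memory_gb=96)   # breaches the 8-core node
    check_vm_numa_fit("db01",  vcpus=8,  memory_gb=160)  # memory spills to a remote node
    check_vm_numa_fit("web01", vcpus=4,  memory_gb=32)   # fits
```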
In summary, over-provisioning virtual machines is a common malpractice. P2V (physical-to-virtual) migration is a frequent cause, since the VM is simply sized to match the original physical server. Another cause is conservative sizing by vendors or developers. The recommendation is always to start with the minimum amount of resources, increase as needed, and compare the results.