My approach to best practices is
that they are useful guides to getting to where you want to go (as are
books). You should always know the technology well enough to consider
whether the best practice still applies or if you need to make
adjustments. As an example, consider running vCenter as a virtual machine. In the history of VMware virtual infrastructure, running Virtual Center/vCenter as a virtual machine was a hotly contested topic (it may still be). VMware does consider it a best practice to run vCenter as a VM, but not on hosts it is managing. I have worked with a few customers who insist that the management function be deployed on a physical machine. Rather than relying on a best practice, you should understand the benefits and risks of each approach and adjust your design accordingly.
As consultants, we are often asked to put
forth a design that meets the business requirements and provides the
level of performance, availability, and scale required. The ideal
approach to developing a design is to perform a capacity planning
exercise to ensure that the hardware and software can be properly
estimated to run the virtual desktop workload. Capacity planning is
quite common in server virtualization environments but not as common in
virtual desktop planning, although it is recommended. A number of tools
are specifically designed for virtual desktop analysis, such as Lakeside Software’s SysTrack VP. SysTrack VP provides information to help you plan your virtual desktop environment, such as an inventory of the software in your existing desktop environment. It is agent based and allows you to take the collected data and model the configuration of the hosts by adjusting CPU and memory values to determine how many virtual desktop images you are likely to need. You can find additional information on the product at http://www.lakesidesoftware.com.
Understanding the configuration of the hosts and the number of images allows you to calculate the cost of the solution and is a key input in developing the return on investment (ROI) and total cost of ownership (TCO). To calculate ROI, you simply take the gain of an investment, subtract the cost of the investment, and divide the total by the cost of the investment. Or
ROI = (Gains – Cost)/Cost
Because it is important to understand the ROI when presenting the business case for virtual desktops, it is a good idea to calculate the ROI even if you need to estimate the gains. Keep in mind that gains can include hard cost savings, such as the price difference between thin clients and physical desktops, and soft cost savings, such as a reduced cost of desktop support.
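As a quick illustration, here is a minimal sketch of that ROI arithmetic in Python; the gain and cost figures are hypothetical and stand in for whatever your own business case produces.

```python
def roi(gains, cost):
    """Return on investment: (gains - cost) / cost."""
    return (gains - cost) / cost

# Hypothetical figures: $1.2M in hard and soft savings over the life of
# the project against an $850K investment in the virtual desktop solution.
gains = 1_200_000
cost = 850_000

print(f"ROI = {roi(gains, cost):.1%}")  # ROI = 41.2%
```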
Infrastructure Introduction
When you are considering a large deployment
of VMware View, it is best to follow all the steps in developing valid
hardware estimates. These steps are as follows:
1. Develop a baseline of current utilization in the desktop environment. The physical desktop baseline should be viewed as a starting point because additional technologies such as View Composer often provide higher consolidation (of images, for example) in the virtual desktop environment than in the physical one.
Initially, virtual desktop assessments were performed with the same set of tools used in server consolidation exercises. Over time, better tools were developed that now provide not just capacity planning information but also application inventory and license compliance. These tools can also assess whether an application is a candidate to be virtualized with ThinApp. It’s ironic that, with the exception of the application virtualization piece, this is exactly the information you would need if you were planning a physical desktop migration in a large environment.
2. Estimate the hardware required to build a limited-scale or proof-of-concept (PoC) environment to validate the features you will use in the VMware View platform and your hardware specifications (this should include not just servers but also storage space and throughput information). The PoC should also consider user segmentation, or the types of users in an organization, such as knowledge workers and administrative users. To provide a viable reference for the production deployment, the PoC should include a proper variety of user segments.
3. Develop a production architecture and migration plan.
Although this approach is ideal, it is not the only one. Often virtual desktop engagements begin with a limited-scale proof-of-concept environment rather than a capacity planning exercise. A PoC, if properly designed, can be a great way of gathering information on what the “real” or representative workload will be for your production virtual desktop environment. By looking at the performance utilization within the PoC, you can extrapolate what is required to build out the production environment. You should baseline the information related to CPU, memory, and storage, including I/O.
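As a sketch of that extrapolation, the following Python snippet scales hypothetical per-desktop PoC baselines up to a production desktop count; all of the per-desktop figures are illustrative assumptions, not measurements.

```python
# Hypothetical per-desktop baselines observed in a PoC (illustrative values).
poc_baseline = {
    "vcpu": 1,         # average vCPUs per desktop
    "memory_gb": 2,    # average active memory per desktop
    "storage_gb": 30,  # OS image size
    "iops": 12,        # average steady-state I/Os per second per desktop
}

production_desktops = 500

# Naive linear extrapolation from the PoC to the production environment.
totals = {metric: value * production_desktops for metric, value in poc_baseline.items()}
print(totals)
# {'vcpu': 500, 'memory_gb': 1000, 'storage_gb': 15000, 'iops': 6000}
```

In practice, you would adjust these linear totals for usage types and for consolidation technologies such as View Composer, as discussed later in this section.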
The storage I/O information is very important and can be difficult to get a handle on. If you are dealing with a storage vendor that makes a distinction between virtual desktop environments and server virtualization environments, the vendor often has general sizing numbers for developing throughput specifications for Virtual Desktop Infrastructure (VDI) environments. What is unusual about virtual desktop environments is that two very different disk I/O conditions exist: burst and operational I/O. Burst I/O is more common in VDI environments because operational requirements necessitate mass reboots of desktop operating systems that are not typical in virtual server environments. Operational I/O can also be problematic if activities such as virus scanning are synchronized by time rather than randomized to reduce the performance hit on the VMs. Even if you are careful in randomizing the activity, antivirus scans often follow very specific patterns. In a physical desktop world, this impact is minimal; in a virtualized environment, it can be substantial.
Some storage vendors have a very utilitarian view of storage services; they do not view virtual desktop workloads as any more unique than other virtual workloads. The limitation with SAN vendors who do not differentiate between server and desktop virtualization environments is that, to guarantee good throughput, you may have to consider their enterprise-class storage systems.
Other storage vendors provide midtier solutions and solid-state drives (SSDs) to deal with burst I/O. Although this approach is better, it still requires you to adjust your design so that high I/O requirements are segregated onto volumes made up of SSDs. This leads to a very static design in which
you may or may not make good use of high-performance drives. A growing
number of options are available for I/O offload, such as cache cards
(for example, Fusion IO; http://www.fusionio.com) or memory-based virtual appliance proxies for consolidating and dealing with I/O (such as Atlantis Computing’s ILIO product; http://www.atlantiscomputing.com).
In addition, storage vendors have designed solutions for virtualization
consolidation and more specifically around the high I/O of virtual
desktop workloads.
Most recently, storage vendors have started to build midtier storage systems that have some of the features of enterprise-class systems, such as dynamic tiering. Dynamic tiering is the capability to move hot data, or data that is in demand, to high-performance drives so that the SAN delivers great performance. This activity can typically be done on the fly or scheduled to happen periodically during the day. These solutions are ideal for virtual desktop environments because they do not require the premium of enterprise-class storage systems but still deliver the features. EMC has clearly targeted the VNX line to provide features that make it ideally suited for virtual workloads. Of course, companies such as NetApp have been using Performance Acceleration Module (PAM) cards for years to deal with burst I/O.
Whichever solution you select, here are a few general considerations for putting together your design.
The difference between SAN solutions designed specifically for virtualization consolidation and some of the I/O offload products is in their application, although they can be used to complement each other. If you are building a large virtual desktop environment and you have the option of architecting a dedicated SAN, you can plan for high I/O conditions. If you are integrating into a SAN framework shared across the entire organization, you may have to offload or boost the I/O provided.
Each SAN vendor has very different numbers when estimating I/O for virtual server and virtual desktop workloads. It is best to have your own reference numbers based on internal testing. Use these numbers to make sure the estimates provided meet your requirements.
Burst I/O and operational I/O are treated distinctly by most storage vendors. For example, if your numbers estimate that your environment may generate 15,000 burst I/Os and require 4 TB of storage, the vendor may suggest six SSD drives (6 × 2,500 I/Os each = 15,000 burst I/Os, excluding RAID considerations) and approximately 12 of the 450 GB SAS drives to meet your operational I/O and total storage capacity. In this way, I/O and storage capacity are treated distinctly by the configuration.
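The following Python sketch reproduces that vendor-style arithmetic. The per-drive figures (2,500 I/Os per SSD, 450 GB per SAS drive) come from the example above; RAID overhead and operational I/O are deliberately left out, which is why the capacity-only SAS count comes in below the vendor's suggested 12 drives.

```python
import math

# Requirements from the example above.
burst_io_required = 15_000
capacity_required_tb = 4

# Per-drive assumptions from the text (RAID overhead excluded).
ssd_io_each = 2_500
sas_drive_gb = 450

# SSDs are sized for burst I/O; SAS drives are sized here for capacity only.
ssd_count = math.ceil(burst_io_required / ssd_io_each)
sas_count = math.ceil(capacity_required_tb * 1_000 / sas_drive_gb)

print(f"SSDs for burst I/O: {ssd_count}")             # 6
print(f"SAS drives for capacity alone: {sas_count}")  # 9; the vendor's 12 also covers operational I/O and RAID
```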
Ensure that your virtual desktop design
incorporates the SAN environment. A good design should provide
consistent performance over the lifetime of the solution (typically
three years). Achieving this result is not possible if you build a
great VDI design that does not set specific requirements for storage.
Although your VDI environment may run great during the first year, you
may see high SAN utilization lead to problems over time.
Separate your expected read and write I/Os. Multiply the number of writes by 4 to allow for an I/O penalty on writes. For example, if you expect 2,000 reads and 2,000 writes, multiply the writes by 4 for a total of 10,000 expected I/Os (2,000 read I/O + 8,000 write I/O).
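A minimal sketch of that write-penalty calculation, using the numbers from the example above:

```python
# Read/write split from the example above; the factor of 4 is the
# write penalty described in the text.
expected_reads = 2_000
expected_writes = 2_000
write_penalty = 4

total_io = expected_reads + expected_writes * write_penalty
print(total_io)  # 10000 (2000 read I/O + 8000 write I/O)
```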
One of the unique features of ESXi is the capability to use local SSDs. If you incorporate local SSDs in your design, you can offset a large portion of your I/O requirement for shared storage. Doing so requires a little more consideration because local SSD drive partitions are not shared between ESXi hosts as SAN storage is. Because these virtual desktops would be localized, you would have to ensure that any data is nonpersistent in nature.
The design of VMware View can change
dramatically because of the support of SSD drives in vSphere. Where
before you spent a lot of time ensuring that the storage provided
adequate throughput, now you have the option of also designing
nonpersistent or floating VMs on localized SSD drives.
By factoring in both local and SAN options, you can reduce the overall price per desktop. This reduction can be considerable depending on the percentage of persistent versus nonpersistent or floating desktops. SSDs change the framework considerably because they can provide incredible read I/O performance and impressive write performance. Although different benchmarks produce a variety of results, it is not uncommon for SSD drives to deliver 25,000–30,000 read I/Os and 4,000–5,000 write I/Os. The only drawback with SSD drives is that they are still relatively expensive and still have a limited amount of storage space, although this situation gets better every year. As of the time of this writing, an SSD with 600 GB of space is available.
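To see how those drive figures translate into desktop density, here is a rough, hypothetical estimate of how many nonpersistent desktops a single local SSD could support from an I/O standpoint; the per-desktop I/O rate and read/write split are assumptions, and the drive numbers are the conservative end of the ranges quoted above.

```python
# Conservative end of the SSD performance ranges quoted above.
ssd_read_io = 25_000
ssd_write_io = 4_000

# Hypothetical per-desktop workload: 12 I/Os per second, 60% writes.
per_desktop_io = 12
write_ratio = 0.6

read_limit = ssd_read_io / (per_desktop_io * (1 - write_ratio))
write_limit = ssd_write_io / (per_desktop_io * write_ratio)

# Write I/O is the constraint here: roughly 555 desktops per drive,
# before capacity, CPU, and memory limits are considered.
print(int(min(read_limit, write_limit)))
```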
VMware provides a reference architecture for stateless virtual desktops that uses SSD drives. It is not possible to apply this reference architecture as is to production, however, because most environments consist of both stateful and stateless virtual desktops. Using local SSDs is an option in vSphere 5 but does require some additional planning in your View architecture because you will have components of the virtual desktop environment configured on local SSDs, as shown in Figure 1.
Figure 1. Using local SSDs is possible in vSphere 5.
You would use local SSD drives for stateless or nonpersistent desktops and fan out the number of desktops to reduce the overall risk in a production deployment. Persistent (stateful) desktops and any critical components would reside on the SAN, and the local SSDs would be used for low-storage, high-I/O desktops like those provided through View Composer. This design, while possible, is not all that common because most SAN solutions now incorporate SSDs. The trade-off, however, is that at a certain scale one approach is likely to be more cost effective than the other.
Even with the best underlying measurements, you should always factor in the usage types of the users consuming the virtual desktop environment. Generally speaking, usage types fall into three broad categories: low-, medium-, and high-end users. The point of planning for these broad categories of users is to make allowances in the hardware specifications. For example, say that from your PoC environment you identify that most virtual desktop sessions are using about 2 GB of memory and a single vCPU with a 30 GB OS image. Rather than plan on the average, you should adjust the average to reflect the usage types mentioned.
Taking an example, say that the production
environment will service 500 desktops. As the IT architect for the
company, you know that a large percentage of these desktops will go to
engineers and designers, so out of the 500 seats you expect that 40% of
those will be high-end users. The next largest portion of users has an
average usage requirement and makes up another 40% of the population.
The remaining 20% are extremely light users of the system. Your
expected high-end desktop requirement is 2 vCPUs and 6 GB of memory,
and your low-end user requirement is 1 GB and 1 vCPU. If you break this
out, the planning starts to look like Table 1.
Table 1. User Segmentation
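A short sketch makes the arithmetic behind this segmentation explicit; the shares and per-segment specifications come from the example above, with the medium users assigned the PoC averages.

```python
# User segmentation from the example: 500 desktops split across three
# usage types. Medium users get the PoC averages (1 vCPU, 2 GB).
total_desktops = 500

segments = {
    #           (share, vCPUs, memory GB)
    "high-end": (0.40, 2, 6),
    "medium":   (0.40, 1, 2),
    "low-end":  (0.20, 1, 1),
}

total_vcpus = total_memory_gb = 0
for name, (share, vcpus, mem_gb) in segments.items():
    desktops = round(total_desktops * share)
    total_vcpus += desktops * vcpus
    total_memory_gb += desktops * mem_gb
    print(f"{name}: {desktops} desktops, {desktops * vcpus} vCPUs, {desktops * mem_gb} GB")

print(f"Total: {total_vcpus} vCPUs, {total_memory_gb} GB of memory")
# high-end: 200 desktops, 400 vCPUs, 1200 GB
# medium: 200 desktops, 200 vCPUs, 400 GB
# low-end: 100 desktops, 100 vCPUs, 100 GB
# Total: 700 vCPUs, 1700 GB of memory
```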
You can use the usage types to further refine your design and ensure your hardware estimates are accurate. You can then take the information gathered through either the capacity planning analysis or the PoC and adjust it to factor in these usage types. This step is necessary because both capacity planning and PoC environments tend to provide a snapshot of usage rather than the full picture. It is very difficult to ensure that you have captured data that represents exactly what you will see in production. There is really no single tool to do this, so you must combine what you know about the environment with your metrics to develop your hardware requirements. You can automate a good portion of this process by using tools such as the ones available from www.liquidwarelabs.com and www.lakesidesoftware.com.
If you are engaged in a desktop replacement strategy in which you must be able to justify costs versus risks versus benefits, you might need to oversubscribe resources in your design. Justifying your design based on the cost per desktop and return on investment is a typical activity when you build a business case for VDI. With the focus on austerity and the general move to reduce overall costs, it is important that you be able to speak to the cost per desktop. You may need to run the environment at a higher rate of utilization to get a better price per VM or View desktop. For example, if you develop a conservative specification of 50% utilization, the hardware required to scale the environment may be cost prohibitive. You might need to oversubscribe the underlying physical resources to ensure the solution is both scalable and cost effective.
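As a purely hypothetical illustration of why oversubscription matters to the business case, the following sketch compares cost per desktop at a conservative utilization target versus an oversubscribed one; the host cost and desktop densities are assumptions, not recommendations.

```python
# Hypothetical fully loaded cost of one ESXi host and two sizing options.
host_cost = 12_000
desktops_conservative = 60     # sized to a conservative 50% utilization target
desktops_oversubscribed = 100  # higher utilization through oversubscription

print(f"Cost per desktop (conservative):   ${host_cost / desktops_conservative:,.0f}")
print(f"Cost per desktop (oversubscribed): ${host_cost / desktops_oversubscribed:,.0f}")
# Cost per desktop (conservative):   $200
# Cost per desktop (oversubscribed): $120
```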
When you build scaled-out VDI environments, it is important to develop your specifications in blocks, or logical groupings of servers, storage, and software. For example, if you are scaling your solution to 10,000 desktops, you should know that a block of 5,000 desktops requires 50 servers (an average of 100 desktops per server), 14 TB of storage, and 100 licenses of vSphere Enterprise. Designing your solution to scale in building blocks that equal a certain number of virtual desktops with a fixed amount of resources makes the solution much easier to grow. In this way, your capital costs can remain consistent during your desktop replacement strategy.
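A small sketch of that block-based scaling, using the figures from the example above (a 5,000-desktop block of 50 servers, 14 TB of storage, and 100 vSphere Enterprise licenses):

```python
import math

# One building block, using the figures from the example above.
block = {
    "desktops": 5_000,
    "servers": 50,  # an average of 100 desktops per server
    "storage_tb": 14,
    "vsphere_enterprise_licenses": 100,
}

target_desktops = 10_000
blocks_needed = math.ceil(target_desktops / block["desktops"])

totals = {resource: amount * blocks_needed for resource, amount in block.items()}
print(f"{blocks_needed} blocks -> {totals}")
# 2 blocks -> {'desktops': 10000, 'servers': 100, 'storage_tb': 28,
#              'vsphere_enterprise_licenses': 200}
```

Scaling in fixed blocks like this is what keeps the capital cost per desktop consistent as the deployment grows.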