Many administrators and IT professionals immediately think of storage designs when they hear the word availability.
While storage is a critical part of ensuring the overall service availability of an Exchange organization, storage design affects far more than availability; it directly influences performance, reliability, and scalability.
1. An Overview of Exchange Storage
In medium-sized and large organizations, the
Exchange administrator is usually not also responsible for storage.
Many medium-sized and large organizations use specialized storage area
networks (SANs) that require additional training to master. Storage is
a massive topic, but we feel it is important that you at least be able
to speak the language of storage.
From the very beginning, messaging systems have had
a give-and-take relationship with the underlying storage system. Even
on systems that aren't designed to offer long-term storage for email
(such as ISP systems that offer only POP3 access), email creates
demands on storage:
- The transport (MTA) components must have space to queue messages that cannot be immediately transmitted to the remote system.
- The MDA component must be able to store incoming messages that have been delivered to a mailbox until users can retrieve them.
- The message store, in systems like Exchange, permits users to keep a copy of their mailbox data on central servers.
- As the server accepts, transmits, and processes email, it keeps logs with varying levels of detail so administrators can troubleshoot and audit activities.
Direct attached storage is the most common type of
storage in general. DAS disks are usually internal disks or directly
attached via cable. Just about every server, except for some high-end
varieties such as blade systems running on boot-over-SAN, uses DAS at
some level; typically, at least the boot and operating system volumes
are on some DAS configuration. DAS, however, has drawbacks for use with
Exchange storage: it doesn't necessarily scale as well for either
capacity or performance. Further, organizations that have invested
significant amounts of money in their SANs may still require that
Exchange use the SAN instead of DAS.
To solve these problems, people looked at NAS
devices as one of the potential solutions. These machines — giant file
servers — sit on the network and share their disk storage. They range
in price and configuration from small plug-in devices with fixed
capacity to large installations with more configuration options than
most luxury cars (and a price tag to match). Companies that bought
these were using them to replace file servers, web server storage, SQL
Server storage — why not Exchange?
For many years, Exchange Server wasn't compatible
with NAS devices; Microsoft didn't support moving Exchange storage to
NAS, and vociferously argued against the idea. But ultimately Microsoft
supported NAS devices for Exchange 2003.
Apparently, despite all the people asking for NAS
support in Exchange 2003, it didn't turn out to be a popular option,
because NAS devices were no longer supported for Exchange Server 2007
and beyond. Instead, the push switched to reducing the overall I/O
requirements so that DAS configurations become practical for small to
midsized organizations. Exchange 2007 moved to a 64-bit architecture to
remove memory management bottlenecks in the 32-bit Windows kernel,
allowing the Exchange Information Store to use more memory for
intelligent mailbox data caching and reduce disk I/O. Exchange 2010 in
turn makes aggressive changes to the on-disk mailbox database
structures, such as moving to a new database schema that allows pages
to be sequentially written to the end of the database file rather than
randomly throughout the file. The schema updates improve indexing and
client performance, allowing common tasks such as updating folder views
to happen more quickly while requiring fewer disk reads and writes.
These changes help improve efficiency and continue to drive mailbox I/O
down.
The premise behind SAN is to move disks to dedicated
storage units that can handle all the advanced features you need —
high-end RAID configurations, hot-swap replacement, on-the-fly
reconfiguration, rapid disk snapshots, tight integration with backup
and restore solutions, and more. This helps consolidate the overhead of managing storage, often spread across dozens of servers and applications (and their associated staff), under a single team. Then, dedicated network links connect these storage silos
with the appropriate application servers. Yet this consolidation of
storage can also be a serious pitfall since Exchange is usually not the
only application placed on the SAN. Applications such as SharePoint,
SQL, archiving, and file services may all be sharing the same
aggregated set of spindles and cause disk contention.
2. Direct Attached Storage
When early versions of Exchange Server came on the
market, DAS was just the way you did things. As used for legacy
Exchange storage, DAS historically exhibited two main problems:
performance and capacity. As mailbox databases got larger and traffic
levels rose, pretty soon people wanted to look for alternatives; DAS
storage under Exchange 2000 and Exchange 2003 required a lot of disks,
because Exchange's I/O profile was constrained by the 32-bit
architecture that Windows provided at the time. Quite simply, with a
fixed amount of RAM available for caching, the more simultaneous users
there were on an Exchange 2003 server, the less cache per user was
available.
To get more scalability on logical disks that
support Exchange databases, you can always try adding more disks to the
server. This gives you a configuration known as Just a Bunch of Disks
(JBOD).
Although JBOD can usually give you the raw disk
storage capacity you need, it has three flaws that render it unsuitable
for all but the smallest legacy Exchange deployments:
JBOD forces you to partition your data
Because each disk has a finite capacity, you
can't store data on that disk if it is larger than the capacity. For
example, if you have four 250 GB drives, even though you have
approximately one terabyte of storage in total, you have to break that
up into separate 250 GB partitions. Historically, this has caused some
interesting design decisions in messaging systems that rely on file
system–based storage.
JBOD offers no performance benefits
Each disk is responsible for only one chunk of
storage, so if that disk is already in use, subsequent I/O requests
will have to wait for it to free up before they can go through. A
single disk can thus become a bottleneck for the system, which can slow
down mail for all your users (not just those whose mailboxes are stored
on the affected disk).
JBOD offers no redundancy
If one of your disks dies, you're out of luck
unless you can restore that data from backup. True, you haven't lost
all your data, but the one-quarter of your users who have just lost
their email are not likely to be comforted by that observation.
Several of the Exchange 2010 design goals have
focused on building in the necessary features to work around these
issues and make a DAS JBOD deployment a realistic option for more
organizations. However, legacy versions of Exchange contain no
mechanisms to work around these issues. Luckily, some bright people
came up with a great generic answer to JBOD that also works well for
legacy Exchange: the Redundant Array of Inexpensive Disks (RAID).
The basic premise behind RAID is to group the JBOD
disks together in various configurations with a dedicated disk
controller to handle the specific disk operations, allowing the
computer (and applications) to see the entire collection of drives and
controller as one very large disk device. These collections of disks
are known as arrays; the arrays are presented to the operating system,
partitioned, and formatted as if they were just regular disks. The
common types of RAID configurations are shown in Table 1.
Table 1. RAID Configurations
| RAID Level | Name | Description |
|---|---|---|
| None | Concatenated drives | Two or more disks are joined together in a contiguous data space. As one disk in the array is filled up, the data is carried over to the next disk. Though this solves the capacity problem and is easy to implement, it offers no performance or redundancy whatsoever, and makes it more likely that you're going to lose all your data, not less, through a single disk failure. These arrays are not suitable for use with legacy Exchange servers. |
| RAID 0 | Striped drives | Two or more disks have data split among them evenly. If you write a 1 MB file to a two-disk RAID 0 array, half the data will be on one disk, half on the other. Each disk in the array can be written to (or read from) simultaneously, giving you a noticeable performance boost. However, if you lose one disk in the array, you lose all your data. These arrays are typically used for fast, large, temporary files, such as those in video editing. These arrays are not suitable for use with Exchange; while they give excellent performance, the risk of data loss is typically unacceptable. |
| RAID 1 | Mirrored drives | Typically done with two disks (although some vendors allow more), each disk receives a copy of all the data in the array. If you lose one disk, you've still got a copy of your data on the remaining disk; you can either move the data or plug in a replacement disk and rebuild the mirror. RAID 1 also gives a performance benefit; reads can be performed by either disk, because only writes need to be mirrored. However, RAID 1 can be one of the more costly configurations; to store 500 GB of data, you'd need to buy two 500 GB drives. These arrays are suitable for use with legacy Exchange volumes, depending on the type of data and the performance of the array. |
| RAID 5 | Parity drive | Three or more disks have data split among them. However, one disk's worth of capacity is reserved for parity checksum data; this is a special calculated value that allows the RAID system to rebuild the missing data if one drive in the array fails. The parity data is spread across all the disks in the array. If you had a four-disk 250 GB RAID 5 array, you'd only have 750 GB of usable space. RAID 5 arrays offer better performance than JBOD, but worse performance than other RAID configurations, especially on write requests; the checksum must be calculated and the data plus parity written to the disks in the array. Also, if you lose one disk, the array goes into degraded mode, which means that even read operations will need to be recalculated and will be slower than normal. These arrays are suitable for use with legacy Exchange mailbox database volumes on smaller servers, depending on the type of data and the performance of the array. Due to their write performance characteristics, they are usually not well matched for transaction log volumes. |
| RAID 6 | Double parity drive | This RAID variant has become common only recently, and is designed to provide RAID 5 arrays with the ability to survive the loss of two disks. Other than offering two-disk resiliency, base RAID 6 implementations offer mostly the same benefits and drawbacks as RAID 5. Some vendors have built custom implementations that attempt to solve the performance issues. These arrays are suitable for use with Exchange, depending on the type of data and the performance of the array. |
| RAID 10 (RAID 0+1, RAID 1+0) | Mirroring plus striping | A RAID 10 array is the most costly variant to implement because it uses mirroring. However, it also uses striping to aggregate spindles and deliver blistering performance, which makes it a great choice for high-end arrays that have to sustain a high level of I/O. As a side bonus, it also increases your chances of surviving the loss of multiple disks in the array. There are two basic variants. RAID 0+1 takes two big stripe arrays and mirrors them together; RAID 1+0 takes a number of mirror pairs and stripes them together. Both variants have essentially the same performance numbers, but 1+0 is preferred because it can be rebuilt more quickly (you only have to regenerate a single disk) and has far higher chances of surviving the loss of multiple disks (you can lose one disk in each mirror pair). These arrays have traditionally been used for high-end, highly loaded legacy Exchange mailbox database volumes. |
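If you want to sanity-check the capacity arithmetic in Table 1, the following Python sketch may help. It is purely illustrative (it is not part of Exchange or any vendor tooling) and simply encodes the usable-capacity rules described above for a hypothetical array of equally sized disks.

```python
# Illustrative only: encodes the usable-capacity rules from Table 1 for an
# array of equally sized disks. Not part of Exchange or any vendor tooling.

def usable_capacity_gb(raid_level: str, disk_count: int, disk_size_gb: int) -> int:
    """Usable space in GB for the RAID levels described in Table 1."""
    if raid_level in ("none", "raid0"):   # concatenation or striping: all raw space, no redundancy
        return disk_count * disk_size_gb
    if raid_level == "raid1":             # mirroring: every disk holds a full copy of the data
        return disk_size_gb
    if raid_level == "raid5":             # one disk's worth of capacity goes to parity
        return (disk_count - 1) * disk_size_gb
    if raid_level == "raid6":             # two disks' worth of capacity go to parity
        return (disk_count - 2) * disk_size_gb
    if raid_level == "raid10":            # striped mirror pairs: half the raw space
        return (disk_count // 2) * disk_size_gb
    raise ValueError(f"unknown RAID level: {raid_level}")

if __name__ == "__main__":
    # The four-disk, 250 GB example from the RAID 5 row: 4 x 250 GB yields 750 GB usable.
    for level in ("raid0", "raid1", "raid5", "raid6", "raid10"):
        print(f"{level}: {usable_capacity_gb(level, 4, 250)} GB usable")
```

Running it against the four-disk, 250 GB example from the RAID 5 row reproduces the 750 GB figure and shows how much raw space RAID 6 and RAID 10 trade away for their extra resiliency.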
Note that several of these types of RAID arrays may
be suitable for your Exchange server. Which one should you use? The
answer to that question depends entirely on how many mailboxes your
servers are holding, how they're used, and other types of business
needs. Beware of anyone who tries to give hard-and-fast answers like,
"Always use RAID 5 for Exchange database volumes." To determine the
true answer, you need to go through a proper storage sizing process,
find out what your I/O and capacity requirements are really going to
be, think about your data recovery needs and service level agreements
(SLAs), and then decide what storage configuration will meet those
needs for you in a fashion you can afford. There are no magic bullets.
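To make the idea of a storage sizing process slightly more concrete, here is a deliberately simplified Python sketch. Every input value (IOPS per mailbox, read/write mix, IOPS per disk) is a hypothetical assumption chosen for illustration, and the write penalties are the commonly cited rule-of-thumb values for each RAID level; a real design should rely on measured data and the sizing guidance published for your Exchange version and hardware.

```python
# A rough, illustrative sizing sketch, not a substitute for a proper storage
# sizing exercise. All numbers below are assumptions you must replace with
# figures measured or published for your own environment and hardware.

from math import ceil

# Commonly cited rule-of-thumb write penalties: disk I/Os per logical write.
WRITE_PENALTY = {"raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}

def spindles_needed(mailboxes: int, iops_per_mailbox: float, read_ratio: float,
                    raid_level: str, iops_per_disk: float) -> int:
    """Estimate how many physical disks are needed to satisfy the I/O load."""
    total_iops = mailboxes * iops_per_mailbox
    read_iops = total_iops * read_ratio
    write_iops = total_iops - read_iops
    # Writes are amplified by the RAID level's write penalty; reads are not.
    backend_iops = read_iops + write_iops * WRITE_PENALTY[raid_level]
    return ceil(backend_iops / iops_per_disk)

if __name__ == "__main__":
    # Hypothetical example: 2,000 mailboxes at 0.5 IOPS each, a 60/40 read/write
    # mix, on 10K RPM disks assumed to sustain roughly 130 random IOPS apiece.
    for level in ("raid10", "raid5"):
        print(level, spindles_needed(2000, 0.5, 0.6, level, 130), "spindles for I/O alone")
```

The exact spindle counts matter less than the shape of the calculation: the RAID write penalty and the random IOPS each disk can sustain, not raw capacity, usually determine how many disks you need.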
In every case, the RAID controller you use — the
piece of hardware, plus drivers, that aggregates the individual disk
volumes for you into a single pseudo-device that is presented to
Windows — plays a key role. You can't just take a collection of disks,
toss them into slots in your server, and go to town with RAID. You need
to install extra drivers and management software, you need to take
extra steps to configure your arrays before you can even use them in
Windows, and you may even need to update your disaster recovery
procedures to ensure that you can always recover data from drives in a
RAID array. Generally, you'll need to test whether you can move drives
in one array between two controllers, even those from the same
manufacturer; not all controllers support all options. After your server has melted down and your SLA deadline is fast approaching is not a good time to discover that you should have kept a spare controller on hand.
If you choose the DAS route (whether JBOD or RAID),
you'll need to think about how you're going to house the physical
disks. Modern server cases don't leave a lot of extra room for disks;
this is especially true of rack-mounted systems. Usually, this means
you'll need some sort of external enclosure that hooks back into a
physical bus on your server, such as SAS or eSATA. Make sure to
give these enclosures suitable power and cooling; hard drives pull a
lot of power and return it all eventually as heat.
Also make sure that your drive backplanes (the
physical connection point) and enclosures support hot-swap capability,
where you can easily pull the drive and replace it without powering the
system down. Keep a couple of spare drives and drive sleds on hand,
too. You don't want to have to schedule an outage of your Exchange
server in order to replace a failed drive in a RAID 5 array, letting
all your users enjoy the performance hit of a thrashing RAID volume
because the array is in degraded mode until the replacement drives
arrive.
Beware! Not all kinds of RAID are created equal.
Before you spend a lot of time trying to figure out which configuration
to choose, first think about your RAID controller. There are three
kinds of them, and unlike RAID configurations, it's pretty easy to
determine which kind you need for Exchange:
Software RAID
Software RAID avoids the whole problem of having
a RAID controller by performing all the magic in the operating system
software. If you convert your disks to dynamic volumes, you can do RAID 0, RAID 1, and RAID 5 natively in Windows Server 2008 without any extra
hardware. However, Microsoft strongly recommends that you not do this
with Exchange, and the Exchange community echoes that recommendation.
It takes extra memory and processing power, and inevitably slows your
disks down from what you could get with a simple investment in good
hardware. You will also not be able to support higher levels of I/O
load with this configuration, in our experience.
BIOS RAID
BIOS RAID attempts to provide "cheap" RAID by
putting some code for RAID in the RAID chipset, which is then placed
either directly on the motherboard (common in workstation-grade and
low-end server configurations) or on an inexpensive add-in card. The
dirty little secret is that the RAID chipset isn't really doing the RAID
operations in hardware; again, it's all happening in software, this time
in the associated Windows driver (which is written by the vendor)
rather than in an official Windows subsystem. If you're about to purchase
a RAID controller card for a price that seems too good to be true, it's
probably one of these cards. These RAID controllers tend to have a
smaller number of ports, which limits their overall utility. Although
you can get Exchange to work with them, you can do so only with very
low numbers of users. Otherwise, you'll quickly hit the limits these
cards have and stress your storage system. Just avoid them; the time
you save will more than make up for the up-front price savings.
Hardware RAID
This is the only kind of RAID you should even be
thinking about for your Exchange servers. This means good-quality,
high-end cards that come from reputable manufacturers that have taken
the time to get the product on the Windows Hardware Compatibility List
(HCL). These cards do a lot of the work for your system, removing the
CPU overhead of parity calculations from the main processors, and they
are worth every penny you pay for them. Better yet, they'll be able to
handle the load your Exchange servers and users throw at them.
If you can't tell whether a given controller you're
eyeing is BIOS or true hardware RAID, get help. Lots of forums and
websites on the Internet will help you sort out which hardware to get
and which to avoid. And while you're at it, spend a few extra bucks on good, reliable disks. We cannot stress enough the importance of not cutting corners on your Exchange storage system; while Exchange 2010 gives you a lot more room for designing storage and brings back options you may not have had before, you still need to buy the best components you can to build the storage system you've designed. The time and
long-term costs you save will be your own.
3. Storage Area Networks
Initial SAN solutions used fiber-optic connections
to provide the necessary bandwidth for storage operations. As a result,
these systems were incredibly expensive and were used only by
organizations with deep pockets. The advent of Gigabit Ethernet over
copper and new storage bus technologies such as SATA and SAS, however,
has moved the cost of SANs down into the realm where midsized companies
can now afford both the sticker price and the resource training to
become competent with these new technologies.
Over time, many vendors have begun to offer SAN
solutions that are affordable even for small companies. The main reason
they've been able to do so is the iSCSI protocol: block-level storage access routed over TCP/IP connections. Combine iSCSI with ubiquitous
Gigabit Ethernet hardware, and SAN deployments have become a lot more
common.
Clustering and high availability concerns are the
other factors in the growth of Exchange/SAN deployments. Exchange 2003
supported clustered configurations but required the cluster nodes to
have a shared storage solution. As a result, any organization that
wanted to deploy an Exchange cluster needed some sort of SAN solution
(apart from the handful of people who stuck with shared SCSI
configurations). A SAN has a certain elegance to it; you simply create
a virtual slice of drive space for Exchange (called a LUN, or logical
unit number), use Fibre Channel or iSCSI (and corresponding drivers) to
present it to the Exchange server, and away you go. Even with Exchange
2007 — which was reengineered with an eye toward making DAS a
supportable choice for Exchange storage in specific CCR and SCR
configurations — many organizations still found that using SAN for
Exchange storage was the best answer for their various business
requirements. By this time, management had seen the benefits of
centralized storage management and wanted to ensure that Exchange
deployments were part of the big plan.
However, SAN solutions don't fix all problems, even
with (usually because of) their price tag. Often, SANs make your
environment even more complex and difficult to support. Because SANs
cost so much, there is often a strong drive to use the SAN for all
storage and make full use of every last free block of space. The cost
per GB of storage for a SAN can be between three and ten times as
expensive as DAS disks. Unfortunately, Exchange's I/O characteristics
are very different than those of just about any other application, and
few dedicated SAN administrators really know how to properly allocate
disk space for Exchange:
SAN administrators do not usually understand
that total disk space is only one component of Exchange performance.
For day-to-day operations, it is far more important to ensure enough
I/O capacity. Traditionally, this is delivered by using lots of
physical disks (commonly referred to as "spindles") to increase the
amount of simultaneous read/write operations supported. It is important
to make sure the SAN solution provides enough I/O capacity, not just
free disk space, or Exchange will crawl, as the sketch following these points illustrates.
Even
if you can convince them to configure LUNs spread across enough disks,
SAN administrators immediately want to reclaim what they see as wasted space. As a
result, you end up sharing the same spindles between Exchange and some
other application with its own performance curve, and then suddenly you
have extremely noticeable but hard-to-diagnose performance issues with
your Exchange servers. Shared spindles will kill Exchange.
Although
some SAN vendors have put a lot of time and effort into understanding
Exchange and its I/O needs so that their salespeople and certified
consultants can help you deploy Exchange on their products properly,
not everyone does the same. Many vendors will shrug off performance
concerns by telling you about their extensive write caching and how
good write caching will smooth out any performance issues. Their
argument is true ... up to a point. A cache can help isolate Exchange
from effects of transient I/O events, but it won't help you come Monday
morning when all your users are logging in and the SQL Server databases
that share your spindles are churning through extra operations.
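To put rough numbers on the first two points, consider the hypothetical check below; the per-disk IOPS figure and the competing SQL Server load are invented purely for illustration.

```python
# A back-of-the-envelope check for the points above: a LUN's free space says
# nothing about its I/O headroom, and any other application sharing the same
# spindles consumes that headroom first. All numbers here are illustrative.

def exchange_iops_headroom(spindles: int, iops_per_spindle: float,
                           other_app_iops: float) -> float:
    """Back-end IOPS left over for Exchange on a shared set of spindles."""
    return spindles * iops_per_spindle - other_app_iops

if __name__ == "__main__":
    # Hypothetical: a LUN carved from eight 15K RPM disks (assumed ~180 random
    # IOPS each) looks roomy on paper, but if a SQL Server workload on the same
    # disks averages 1,100 IOPS, only about 340 IOPS are left for Exchange,
    # regardless of how much free space the LUN reports.
    print(exchange_iops_headroom(8, 180, 1100), "IOPS of headroom for Exchange")
```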
The moral of the story is simple: don't believe that
you need to have a SAN. This is especially true with Exchange 2010;
there have been a lot of under-the-hood changes to the mailbox database
storage to ensure that more companies can deploy a 7200 RPM SATA JBOD
configuration and be able to get good performance and reliability from
that system.
If you do find that a SAN provides the best
value for your organization, get the best one you can afford. Make sure
that your vendors know Exchange storage inside and out; if possible,
get them to put you in contact with their on-staff Exchange
specialists. Have them work with your SAN administrators to come up
with a storage configuration that meets your real Exchange needs.