5. Disk Requirements
When calculating disk requirements for some
applications, it is easy to decide that a single 500 GB hard disk will
solve your storage needs. You might be tempted to think the same thing
about Exchange Server.
With earlier versions of Exchange, getting the disk requirements sized correctly could be a bit tricky, and it can still be tricky with Exchange Server 2010. This is because sizing a disk is not just a matter of figuring out how much storage you need. Physical storage requirements are a big part of the sizing, of course, because if you don't get large enough disks to support your users, you will be going back to the boss for more money to buy more disks.
But asking the boss to buy more physical disk drives because users' mailboxes are full is at least something tangible you can ask for. The other side of the sizing requirement is ensuring that the disk I/O capacity will keep up with the database engine. The more users on the Exchange server, the greater the disk I/O capacity required of the disk subsystem. Try explaining to your boss that the disks have plenty of storage available but can't keep up with the database load.
The disk subsystem that you choose has to be able to support not only the amount
of storage required but also the I/O load that the users will place on
the disk subsystem. Therefore, understanding the I/O profile as well as
the amount of storage required is important.
5.1. Improved Caching and Reduced I/O Profiles
By and large, Client Access and Hub Transport servers require far less disk I/O capacity than Mailbox servers, though Hub Transport servers in very large messaging environments may need more I/O capacity than is typical. The information in this section applies to servers that are hosting the Mailbox server role.
If you are coming from the Exchange Server 2000/2003
world, you already know that even on a server with only a few hundred
mailboxes, Exchange Server 2000/2003 quickly reaches the maximum amount
of RAM available for caching (1.2 GB maximum). As more and more users
vie for the same physical memory for caching, Exchange Server quickly
becomes constrained by the amount of I/O operations that the Exchange
server's disk subsystem can support.
Hundreds of pages have been written on optimizing Exchange Server performance by improving I/O performance, and we certainly can't do the topic justice in just a few paragraphs, but understanding the basic input/output operations per second (IOPS) requirements of users is helpful. Microsoft and hardware vendors have done much research on I/O requirements based on mailbox size and the average load that each user places on the server.
Remember the user profile table shown previously in Table 1? Well, Table 6
takes that and includes the estimated IOPS given a user type and an
estimated mailbox size for Exchange 2003. We are including this
information because we want you to see the database performance
improvements since Exchange Server 2003. IOPS requirements climb as the
number of messages sent and received increases and as the mailbox size
increases.
Table 6. User Type, Database IOPS, Messages Sent and Received, and Mailbox Size Estimates for Exchange 2003
User Type | Database Volume IOPS | Messages Sent/Received per Day | Mailbox Size |
---|---|---|---|
Light | 0.5 | 20 sent/50 received | 50 MB |
Average | 0.75 | 30 sent/75 received | 100 MB |
Heavy | 1.0 | 40 sent/100 received | 200 MB |
Large | 1.5 | 60 sent/150 received | 500 MB |
For an Exchange 2003 server that is supporting 3,000
heavy mailbox users, the disk subsystem would have to support at least
3,000 IOPS. A typical SCSI or SAS disk drive supports between 100 and
150 IOPS (depending on the disk drive model). To meet this requirement, the disk subsystem may require more disks (from an I/O capacity perspective) than are required from a disk space perspective; thus, the disk subsystem may end up with far more disk space than the data actually needs. Failure to plan for sufficient IOPS capacity on the disk subsystem will significantly hurt performance.
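As a rough illustration of that spindle arithmetic, the sketch below divides the required database IOPS by a per-disk figure and rounds up. It assumes the 100 to 150 IOPS per-disk range quoted above and deliberately ignores RAID write penalties and the read/write mix, both of which push the real disk count higher; the function name is hypothetical.

```python
import math


def spindles_for_iops(users: int, iops_per_user: float, iops_per_disk: int) -> int:
    """Return the minimum number of disks needed to meet a database IOPS target.

    Simplification: ignores RAID write penalties and the read/write mix.
    """
    required_iops = users * iops_per_user
    return math.ceil(required_iops / iops_per_disk)


# Exchange 2003 "heavy" profile from Table 6: 1.0 IOPS per mailbox.
for per_disk in (100, 150):
    disks = spindles_for_iops(3000, 1.0, per_disk)
    print(f"{per_disk} IOPS per disk -> {disks} disks")
```

For 3,000 heavy users this works out to between 20 and 30 disks, no matter how much capacity those disks happen to provide.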
When Exchange Server 2007 entered the market, the
64-bit architectural improvements allowed the operating system and
Exchange Server 2007 to access more physical memory. With additional
physical memory available for caching, disk I/O is significantly
reduced. Microsoft estimates that I/O requirements are reduced by
approximately 70 percent provided the Exchange 2007 server has the
recommended amount of RAM. Table 7
shows the estimated IOPS requirements for Exchange 2007 Mailbox
servers. Please keep in mind that these are estimates and may change over time. These numbers also assume that the Mailbox server is configured with more than the recommended amount of RAM.
With this significant improvement in caching
Exchange data, the Extensible Storage Engine (ESE) database engine
needs to read and write from the disk less frequently and thus reduces
the IOPS requirements. When the IOPS requirements are reduced, fewer
disks are required to support the I/O load. Notice that an Exchange Server 2007 "heavy" user requires only 0.32 IOPS, compared with 1.0 IOPS for an Exchange 2003 "heavy" user.
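The per-user figures can be compared directly. The sketch below computes the relative reduction for the "heavy" profile using only the values from Table 6 and Table 7; keep in mind that the two tables define "heavy" with different message volumes, so the result is approximate. The helper name is hypothetical.

```python
def iops_reduction(old_iops: float, new_iops: float) -> float:
    """Fractional reduction in per-user database IOPS between two versions."""
    return 1 - new_iops / old_iops


HEAVY_2003 = 1.0   # Table 6, Exchange 2003 "Heavy"
HEAVY_2007 = 0.32  # Table 7, Exchange 2007 "Heavy"

print(f"{iops_reduction(HEAVY_2003, HEAVY_2007):.0%}")  # prints 68%
```

A reduction of roughly 68 percent is in line with Microsoft's estimate of approximately 70 percent.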
The Exchange database team has been hard at work further improving the I/O performance of Exchange Server 2010 Mailbox servers. A key goal for Exchange Server 2010 was to reduce I/O requirements enough that most types of affordable disk drives (such as SATA, SAS, or SCSI) can be used. The team did this by further optimizing the use of cache memory, increasing database page sizes, changing the database schema, and optimizing how the database arranges data to be written to the disk.
The resulting improvements to the Exchange Server
2010 database engine further reduce the I/O requirements for the
standard usage profiles. Table 8 shows the disk I/O recommendations based on usage profiles for Exchange Server 2010. Note that the estimates in Table 8 are based on the release-to-manufacturing version of Exchange Server 2010 and Microsoft may refine these further in the future.
Table 7. User Type, Database Volume IOPS, and Messages Sent and Received Per Day for Exchange 2007
User Type | Database Volume IOPS | Messages Sent/Received per Day |
---|---|---|
Light | 0.11 | 5 sent/20 received |
Average | 0.18 | 10 sent/40 received |
Heavy | 0.32 | 20 sent/80 received |
Very Heavy | 0.48 | 30 sent/120 received |
Extra Heavy | 0.64 | 40 sent/160 received |
Table 8. User Type, Database Volume IOPS, and Messages Sent and Received per Day For Exchange 2010
User Type | Database Volume IOPS | Messages Sent/Received per Day |
---|---|---|
Light | 0.10 | 5 sent/20 received |
Average | 0.14 | 10 sent/40 received |
Heavy | 0.20 | 20 sent/80 received |
Large | 0.29 | 30 sent/120 received |
The I/O requirements, of course, are just estimates,
but they generally provide a pretty good guideline for the IOPS
requirements for the disks that will host Exchange databases. The disks
that will host the Exchange transaction logs will require approximately
10 to 20 percent of the IOPS requirements for their corresponding
database.
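To turn Table 8 into a per-server target, multiply the user count by the per-user estimate and then apply the 10 to 20 percent rule for the log volumes. This is a minimal sketch assuming a uniform user population of a single profile; the dictionary and function names are hypothetical, and real populations usually mix profiles.

```python
# Per-mailbox database IOPS estimates for Exchange 2010 (Table 8).
TABLE_8_IOPS = {"Light": 0.10, "Average": 0.14, "Heavy": 0.20, "Large": 0.29}


def exchange_2010_iops(users: int, profile: str) -> tuple[float, float, float]:
    """Return (database IOPS, low log IOPS, high log IOPS) for a user population.

    Log IOPS use the 10 to 20 percent rule of thumb from the text.
    """
    database = users * TABLE_8_IOPS[profile]
    return database, database * 0.10, database * 0.20


db, log_low, log_high = exchange_2010_iops(1000, "Heavy")
print(f"database: {db:.0f} IOPS, logs: {log_low:.0f}-{log_high:.0f} IOPS")
```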
In many environments with more than a few hundred
mailboxes, the storage subsystem becomes the most expensive part of the
Exchange infrastructure.
Company XYZ had a single Exchange 2003 server that supported approximately 1,500 users. Using Performance Monitor, they estimated that the average IOPS requirement was approximately 0.75 IOPS per user. The disk subsystem that held the database therefore had to support approximately 1,125 IOPS. To give themselves some room to grow and to accommodate unusual spurts in activity, the company used an estimated value of 1,500 IOPS.
Based on the architecture of the physical server
they were using for their Exchange 2003 Mailbox server role, the
company could not achieve this IOPS requirement using direct attached
storage (DAS). Therefore, they had to use a Fibre Channel storage area network (SAN) to accommodate the Exchange data. The cost per gigabyte for the SAN storage was approximately $38.
During their planning for Exchange Server 2010, the
company estimated that the typical user was somewhere between an
average user and a heavy user. They further estimated that the IOPS
requirement for each user would be approximately 0.20 IOPS per user, or
a total of 300 IOPS. This represented a significant drop in the IOPS
requirements from Exchange Server 2003. With their proposed server
architecture, they could accommodate this IOPS requirement with DAS at a cost of approximately $5 per gigabyte.
Granted, a SAN can often provide more features than just raw storage (scalability, snapshots, replication, and so forth), but this company had to weigh the costs of those additional features against their relative value to the company. In this case, they chose to use DAS instead of the SAN and saved a considerable amount of money.
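The Company XYZ numbers can be reproduced with simple multiplication, as sketched below. The per-user IOPS values and per-gigabyte prices come from the case study; the 1,000 GB capacity used for the cost line is a hypothetical figure chosen only to make the price difference concrete.

```python
def required_iops(users: int, iops_per_user: float) -> float:
    """Total database IOPS for a population with a uniform per-user load."""
    return users * iops_per_user


USERS = 1500
iops_2003 = required_iops(USERS, 0.75)  # measured with Performance Monitor
iops_2010 = required_iops(USERS, 0.20)  # planning estimate for Exchange 2010
planned_2003 = 1500                     # value the company actually planned for

print(f"2003: {iops_2003:.0f} IOPS (planned {planned_2003}, "
      f"{planned_2003 / iops_2003 - 1:.0%} headroom)")
print(f"2010: {iops_2010:.0f} IOPS")

# Hypothetical capacity, not from the case study; prices are per gigabyte.
CAPACITY_GB = 1000
print(f"SAN: ${38 * CAPACITY_GB:,}  DAS: ${5 * CAPACITY_GB:,}")
```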
5.2. Mailbox Storage
Exchange servers holding the Mailbox server role consume the most disk space. Exchange system designers often fall short in their designs by not allowing sufficient disk space for mail storage, transaction logs, and the extra free space needed for maintenance and recovery. Often the disk space is not partitioned correctly, either. Here are some important points to keep in mind when planning your disk space requirements:
Transaction log files should be on a
separate set of physical disks (spindles) from their corresponding
Exchange database files if you are only deploying a single database
copy. RAID 1 or RAID 0+1 arrays provide better performance for
transaction logs.
Allow for at least 7 to 10 days' worth of transaction logs to be stored for each database. The volume of transaction logs will vary dramatically from one organization to another, but a good starting point is about 4 GB of transaction logs per day per 1,000 mailboxes. This is just one estimate of a specific usage profile, though, and your actual mileage may vary. Tools like the Exchange Storage Calculator can help you estimate disk space requirements.
If you frequently move mailboxes from one mailbox database to another, take the extra space into consideration. When a mailbox is moved in Exchange 2010, the mailbox's dumpster is moved with the mailbox, so the target database must accommodate the dumpster contents as well.
Allow
for whitespace estimates in the maximum size of each of your database
files. (The whitespace is the empty space that is found in the database
at any given time.) The size of the whitespace in the database can be
approximated by the amount of mail sent and received by the users with
mailboxes in that database. For example, if you have one hundred 2 GB
mailboxes (a total of 200 GB) in a database where users send and
receive an average of 10 MB of mail per day, the whitespace is
approximately 1 GB (100 mailboxes × 10 MB per mailbox).
Factor
in 5 to 10 percent additional disk space for the content index
databases. You will have one content index database for each production
database.
Allocate enough free space on
the disk so that you can always make a backup copy of your largest
database and still have some free disk space. A good way to calculate
this is to take 110 percent of the largest database you will support
because that also allows you to defragment the database using Eseutil
if necessary.
Consider additional disk
space for message tracking, message transport, RPC Client Access, HTTP
protocol, POP3 protocol, and IMAP4 protocol log files if you have
combined function servers.
Always have recovery in mind and make sure you have enough disk space to be able to restore a database to a recovery database.
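The rules of thumb in the list above can be turned into a few small helpers. This is a sketch assuming decimal units (1 GB = 1,000 MB) and the figures quoted in the list: about 4 GB of logs per day per 1,000 mailboxes, 7 to 10 days of retained logs, 5 to 10 percent for the content index, and 110 percent of the largest database as free space. All function names are hypothetical, and a tool such as the Exchange Storage Calculator remains the better choice for a real design.

```python
LOG_GB_PER_DAY_PER_1000 = 4  # starting-point estimate from the text


def log_space_gb(mailboxes: int, days: int) -> float:
    """Disk space for `days` worth of transaction logs."""
    return mailboxes / 1000 * LOG_GB_PER_DAY_PER_1000 * days


def whitespace_gb(mailboxes: int, daily_mail_mb: float) -> float:
    """Approximate database whitespace: one day of mail per mailbox."""
    return mailboxes * daily_mail_mb / 1000


def content_index_gb(database_gb: float) -> tuple[float, float]:
    """Content index allowance: 5 to 10 percent of the database size."""
    return database_gb * 0.05, database_gb * 0.10


def maintenance_free_space_gb(largest_db_gb: float) -> float:
    """Free space to copy or defragment (Eseutil) the largest database."""
    return largest_db_gb * 1.10


print(log_space_gb(1000, 7), log_space_gb(1000, 10))  # 28.0 40.0
print(whitespace_gb(100, 10))  # the one hundred 2 GB mailboxes example
print(content_index_gb(200))
print(round(maintenance_free_space_gb(100), 1))
```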
Microsoft has a number of excellent guidelines for
estimating disk space requirements and database sizing, including the
Storage Calculator.
Let's move on to an example of a server that will support 1,000 mailboxes. We are estimating that we will provide the typical user with a Prohibit Send limit of 500 MB and a Prohibit Send And Receive limit of 600 MB. In any organization of 1,000 users, some users will qualify as VIPs who are allowed more mail storage than a typical user; in this case, let's assume that 10 percent (100 users) are VIPs and allow them a Prohibit Send And Receive limit of 2 GB.
These calculations result in 540 GB of mail storage (600 MB × 900 mailboxes) for the first 900 users plus another 200 GB (2 GB × 100 mailboxes) for the VIP users, for a maximum of 740 GB of mail storage. However, this estimate does not include deleted items in users' mailboxes or deleted mailboxes, so we want to add an overhead factor of about 15 percent, or about 111 GB, plus another 15 percent (another 111 GB) for database whitespace.
So at any given time, for these 1,000 mailboxes we can expect mail database storage (valid email content, deleted data, and empty database space) to consume approximately 962 GB, but because we like round numbers, we'll round that up to 1,000 GB, or 1 TB.
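The storage arithmetic in this example is sketched below. It works in megabytes with decimal units (1 GB = 1,000 MB), as the text does, uses integers to avoid floating-point noise, and assumes that "round numbers" means rounding up to the next 100 GB; all function names are hypothetical.

```python
MB_PER_GB = 1000  # decimal units, matching the text's arithmetic


def mail_storage_mb(groups: list[tuple[int, int]]) -> int:
    """Sum Prohibit Send And Receive limits: groups of (mailboxes, limit_mb)."""
    return sum(count * limit_mb for count, limit_mb in groups)


def with_overhead_mb(mail_mb: int, deleted_pct: int = 15, whitespace_pct: int = 15) -> int:
    """Add deleted-item and whitespace overhead, each a percentage of mail data."""
    return mail_mb + mail_mb * deleted_pct // 100 + mail_mb * whitespace_pct // 100


def round_up(value: int, step: int) -> int:
    """Round value up to the next multiple of step."""
    return -(-value // step) * step


mail = mail_storage_mb([(900, 600), (100, 2 * MB_PER_GB)])
total = with_overhead_mb(mail)
print(mail // MB_PER_GB, total // MB_PER_GB, round_up(total, 100 * MB_PER_GB) // MB_PER_GB)
# prints: 740 962 1000
```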
In this example, let's say that we have decided the
maximum database size we want to be able to back up or restore is 100
GB. This means that we need to split the users' mailboxes across 10
mailbox databases.
For the transaction logs, we estimate that we will generate approximately 5 GB of transaction logs per day. To hold 10 days' worth of logs, we should plan for at least 50 GB of available disk space on the transaction log disk.
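The database count and log disk size follow from a division and a multiplication. The sketch assumes 10 days of log retention (the upper end of the 7 to 10 day guideline) and rounds the database count up so that no database exceeds the chosen limit; the function names are hypothetical.

```python
import math


def database_count(total_gb: int, max_db_gb: int) -> int:
    """Number of databases needed so none exceeds the backup/restore limit."""
    return math.ceil(total_gb / max_db_gb)


def log_disk_gb(daily_log_gb: float, days: int) -> float:
    """Space to retain `days` worth of transaction logs."""
    return daily_log_gb * days


print(database_count(1000, 100))  # 10 databases
print(log_disk_gb(5, 10))         # 50 GB
```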
Next, because full-text indexing is enabled by
default, we should allow enough disk space for the full-text index
files. In this case, we will estimate that the full-text index files
will consume a maximum of about 10 percent of the total size of the
mail data, or approximately 100 GB. If we combine the full-text index files on the same disk drive as the database files, we will need about 1.1 TB of disk space (1 TB for the databases plus 100 GB for the index files).
Anytime you are not sure how much disk space you should include, it is a good idea to plan for more rather than less. Although disk space is reasonably inexpensive, unless you have sophisticated storage systems, adding disk space later can be time-consuming and costly in terms of effort and downtime.
5.3. Planning for Mail Growth
Growth? You may be saying to yourself, "I just gave the typical user a maximum mailbox size of 600 MB and the VIPs a maximum size of 2 GB! How can my users possibly need more mailbox space?" Predicting the amount of growth you may need in the future is a difficult task. You may not be able to foresee new organizational requirements or future laws that require specific data retention periods.
In our experience, though, mailbox limits, regardless of how rigid you plan to be, are managed by exception and by need. In the preceding example, we calculated that we would need about 1.1 TB of disk space for our 1,000 mailboxes. Would we partition or create a disk of exactly that size? Probably not.
Instead of carving out exactly the amount of disk space you anticipate needing, add a "fluff factor" to your calculations. We recommend adding approximately 20 to 25 percent additional capacity to the amount of storage you think you will require, but this is a rule of thumb rather than a precise figure. In this example, adding 25 percent to our expected requirements would mean planning for about 1.4 TB of disk space. Here are some factors that you may want to consider when deciding how much growth you should expect for your mailbox servers:
Average annual growth in the number of employees
Acquisitions, mergers, or consolidations that are planned for the foreseeable future
Addition of new mail-enabled applications such as Unified Messaging features or electronic forms routing
Government regulations that require some types of corporate records (including email) to be retained for a number of years
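Putting the index allowance and the growth "fluff factor" together, the sketch below starts from 1 TB of database storage, adds 10 percent for the full-text index, and then applies a 20 or 25 percent growth allowance, which lands at roughly 1.3 to 1.4 TB. The function name is hypothetical, and the percentages are the rules of thumb from the text rather than measured values.

```python
def planned_capacity_tb(database_tb: float, index_pct: float, growth_pct: float) -> float:
    """Database plus content index, then scaled by a growth ("fluff") allowance."""
    base = database_tb * (1 + index_pct / 100)
    return base * (1 + growth_pct / 100)


for growth in (20, 25):
    print(f"{growth}% growth -> {planned_capacity_tb(1.0, 10, growth):.2f} TB")
```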
Conversely, potential events in your future could
reduce the amount of mailbox storage you require. Many organizations
are now including message archival and long-term retention systems in
their messaging systems. These systems archive older content from a
user's mailbox and move it to some type of external storage such as
disk, storage area network (SAN), network-attached storage (NAS),
optical, or tape storage.
5.4. Email Archiving and Mail Storage
Email has emerged as the predominant form of
business communications. Sales, marketing, ordering, human resources,
legal, financial and all other types of information are now
disseminated via email.
An emerging trend in the email business is email
archiving. As of this writing, more than 60 companies provide archiving
solutions for email systems. Some of these companies provide in-house solutions, whereas others provide hosted solutions. There are just about as many reasons to implement an email archive system as there are archive vendors. Some of the reasons to implement email archiving include:
Reduces the size of mailbox databases and
mailboxes (smaller databases and smaller mailboxes improve disaster
recovery response times and improve performance)
Provides long-term retention of email data
Provides users with a searchable index of their historical email data
Allows
for eDiscovery of email (message content, attachments, as well as email
metadata) that often must be indexed for legal proceedings
Eliminates the use of Outlook personal folder (PST) files
Third-party archive systems are great for
organizations that must retain much of the information in their
mailboxes but want to move it to external storage. However, depending on the system, you probably don't want to archive everything older than five days, because that may prevent users from accessing it via Outlook Web Access or mobile devices. Further, once content is archived and no longer resides in the user's mailbox, it will no longer be accessible from a desktop search engine such as Google Desktop or Windows Desktop Search. So keeping a certain amount of content in the user's mailbox will always make sense.
Microsoft has introduced an email archive system for
Exchange Server 2010. Microsoft's approach in this version is to
establish an extra archive mailbox for each user who requires
archiving. The email archive mailbox must reside on the same mailbox
database as the user's mailbox. This approach does serve the goal of
reducing the size of the user's primary mailbox, but it does not reduce
the size of the database.
If you are planning to use the Exchange
Server 2010 mailbox archive feature, you will need to take this into
account and plan for additional storage as needed.