Exchange Server 2010 : Mailbox Storage - Determining the Number of Databases, Allocating Disk Drives

12/25/2013 1:41:45 AM

When estimating mailbox database size for a given configuration, as a worst-case scenario we once estimated that a single database could grow to 1.3 TB in size. Although Exchange Server can technically support a database that large, it would take forever to back up, and worse, it would take forever to restore. (Okay, maybe not "forever," but longer than what would make operational sense.) Even if you are using snapshot technologies, if the snapshot backup software performs database verification, the verification would take far too long. So a database size of 1.3 TB is just not practical in organizations that have not yet implemented a DAG with continuous replication.

1. Maximum Database Sizes

Microsoft recommends that you keep each mailbox database under about 200 GB if you are not using any type of replication technology. If you are using a DAG and maintaining at least two copies of each database, you can consider allowing a maximum database size of 2 TB.

These numbers are based on some simple principles. Consider that if you don't use replication for your mailbox databases, you have to account for the time necessary to restore a database and the impact of a restore operation. A smaller database, in the case of loss or hardware failure, can be restored quickly, ensuring minimal impact on users. When database replication is put in place, the replica of the database in essence acts as a backup and, depending on the number of mailbox database copies, may never be used in a restore operation. In that case, a large database is more efficient, as it simplifies administration by reducing the number of databases in the organization.

We urge you to consider your existing environment when you think about these maximum sizes. Ultimately, you need to consider how much time it will take to restore one of these databases from a tape backup; if the absolute longest time you can take to restore a database from your backup media (for example, a tape) is two hours, and your tape system restores at a rate of 30 GB per hour, then the largest database size you should consider supporting is 60 GB. A company's Recovery Time Objective/Recovery Point Objective (RTO/RPO) will most likely dictate recovery time and therefore will help you in calculating what your maximum database sizes should be.
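The restore-window arithmetic above can be sketched as a small calculation. This is only an illustration: the function names are hypothetical, the restore rate is whatever your own backup system actually sustains, and 1.3 TB is treated as 1,300 GB for simplicity.

```python
def max_database_size_gb(restore_rate_gb_per_hour: float, rto_hours: float) -> float:
    """Largest database (in GB) that can be restored within the allowed window."""
    if restore_rate_gb_per_hour <= 0 or rto_hours <= 0:
        raise ValueError("restore rate and restore window must be positive")
    return restore_rate_gb_per_hour * rto_hours


def restore_time_hours(database_size_gb: float, restore_rate_gb_per_hour: float) -> float:
    """Hours needed to restore a database of the given size at the given rate."""
    if restore_rate_gb_per_hour <= 0:
        raise ValueError("restore rate must be positive")
    return database_size_gb / restore_rate_gb_per_hour


# The example from the text: a two-hour window at 30 GB per hour.
print(max_database_size_gb(30.0, 2.0))            # 60.0

# The same tape system restoring a 1.3 TB (about 1,300 GB) database.
print(round(restore_time_hours(1300.0, 30.0), 1))  # 43.3
```

Running the numbers in both directions is useful: the first call gives the size ceiling for a given restore window, and the second shows how far outside that window an oversized database would fall.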

Replication technologies in Exchange Server 2010 provide options for quicker access to a mailbox database, in the event of a server or disk failure. Naturally, this requires a proper implementation and configuration of a DAG.

2. Determining the Number of Databases

A common way to improve the scalability of Mailbox servers is to add mailbox databases. Though this might not improve overall server performance or a user's perceived response time, it allows you to break up the data you are storing and place it across multiple smaller mailbox databases. In turn, this enables you to support larger mailboxes. Keep in mind that as you increase the number of mailboxes each Mailbox server supports, increasing the amount of RAM will help improve performance and reduce the disk I/O profile.

Some administrators may want to create multiple mailbox databases to gain underlying performance benefits. By default, each mailbox database that is not replicated is configured with a 20 MB checkpoint depth. This means that 20 MB of outstanding transactions can be written to the logs but not immediately committed to the database. For databases that are replicated, the default checkpoint depth is 100 MB. Note that in previous versions of Exchange Server, the recommendation was to create multiple storage groups that each had a single mailbox store, rather than a single storage group with multiple mailbox stores. This recommendation was in place to ensure that the checkpoint depth, which was unique to the log stream of the storage group, would not have to be shared by multiple mailbox stores in the same storage group. In Exchange Server 2010, this issue is no longer relevant, since every mailbox database maintains its own checkpoint depth and log stream.

When creating additional mailbox databases that do not use database replication with a DAG, you should plan to place each database's transaction logs on separate disk spindles from the database files. This can help improve performance (due to the nature of the I/O differences), though it mainly improves recoverability. If you are using a DAG and have two copies or more, you can safely place the transaction logs and the database files on the same spindles/disks.

Planning for Mailbox Databases

A company named ABC is planning to migrate their existing messaging infrastructure to Exchange Server 2010. ABC has 1,200 users who connect to a server farm in the company's main office. During their planning process, administrators are attempting to determine the number of databases that will support their requirements.

They have identified the following requirements:

Minimize the time necessary to perform a restore in the event of a single disk failure.
Minimize the time necessary to perform an offline operation on the database files.
Provide all users with at least 1 GB of storage, but also support much larger mailboxes.

When looking at each requirement, ABC has determined that they should design the following storage solution:

Create Multiple Mailbox Databases: By having multiple mailbox databases, ABC feels that they will be able to split the 1,200 users across the databases and therefore keep the database files smaller. With smaller database files, database restore and offline database operation times are minimized.
Configure Mailbox Size Limits: To ensure that a user or a group of users does not overrun the available disk space, ABC has decided to implement mailbox size limits on the mailbox databases. Hard disk drives have been purchased to support up to 5 GB of storage for each user. For now, administrators plan to configure users to receive a warning message when their storage reaches 4 GB.

Though a single Mailbox server can support the company's users, ABC has also determined that they should plan for mailbox resiliency by using a DAG and database replication across multiple Mailbox servers.

Note that this scenario did not take into consideration the performance requirement of the mailbox databases. You must also analyze the backup/restore needs, service level agreements, and user profiles, and then recommend a storage configuration that will meet the I/O and performance requirements.
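As a rough illustration of the capacity side of ABC's planning, the sketch below estimates how many databases are needed to hold 1,200 mailboxes at the purchased 5 GB per user, for each of the maximum database sizes discussed earlier. The function name and the `overhead_factor` parameter are hypothetical; the calculation ignores database overhead unless you supply a factor, treats 2 TB as 2,000 GB, and says nothing about I/O.

```python
import math


def databases_needed(users: int, quota_gb: float, max_db_size_gb: float,
                     overhead_factor: float = 1.0) -> int:
    """Minimum number of databases so that none exceeds max_db_size_gb.

    overhead_factor is a hypothetical multiplier for space beyond the mailbox
    quota; the default of 1.0 assumes no overhead at all.
    """
    if users <= 0 or quota_gb <= 0 or max_db_size_gb <= 0:
        raise ValueError("users, quota, and maximum size must be positive")
    total_gb = users * quota_gb * overhead_factor
    return math.ceil(total_gb / max_db_size_gb)


users, quota_gb = 1200, 5

# Without replication: keep each database under about 200 GB.
standalone = databases_needed(users, quota_gb, 200)
print(standalone, math.ceil(users / standalone))   # 30 databases, 40 users each

# With a DAG and at least two copies: up to 2 TB (taken here as 2,000 GB).
replicated = databases_needed(users, quota_gb, 2000)
print(replicated, math.ceil(users / replicated))   # 3 databases, 400 users each
```

The gap between the two results shows why the replication decision comes first: it changes the database count by an order of magnitude before performance is even considered.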

3. Allocating Disk Drives

The traditional logic for Exchange Server design was to place databases on a set of physical disk drives separate from the transaction log files. As Exchange 2000/2003 servers scaled upward to support thousands of mailboxes, administrators placed the transaction log files for each storage group on separate spindles (or physical disks) and placed the database files for each group on a different set of spindles.

Although placing different files on separate disks is pretty good advice, today many of us use Fibre Channel or iSCSI SANs to store our Exchange data. The SAN is usually some aggregation of a large number of disks in a RAID 5, RAID 1+0, or other redundant configuration. The person who manages the SAN (hereafter known as one of the SAN people) carves up the amount of storage you request from that large aggregation of disk space and assigns it to you as a logical unit of disk space, identified by a logical unit number (LUN). You then configure your Windows server to connect to those LUNs across the iSCSI or Fibre Channel network (or fabric).

We were skeptical at first of putting Exchange databases on a networked storage device, but we have come to see the advantages for many medium and large organizations. The ability to combine large numbers of disks into very large volumes and then allocate pieces of that large volume to the applications (such as Exchange) that need disk space can help reduce your storage costs and allow you to take advantage of technologies such as snapshot backups and improved recoverability features. Further, because the storage is not physically part of the server, a disaster that befalls the server hardware may not affect the storage system.

If you are a SAN user, you should ask your SAN people for two LUNs for each mailbox database: one sized to hold the database's transaction log files and the other sized to hold the database file. (This applies to the Mailbox server role and does not account for backup requirements.) By giving each database file its own LUN and each database's transaction logs their own LUN, you ensure that the granularity of snapshot solutions is per database. Dedicating LUNs to specific tasks also helps you isolate I/O for those tasks; avoid placing data for other applications on those LUNs, because it would compete for the same I/O.

When allocating disk drives, we need to look at both capacity requirements and performance needs. When evaluating performance needs, you must look at the worst-case scenario for your Exchange environment, which typically means peak usage periods and maximum user load. We discussed earlier that users can be categorized by profile, defined by the number of messages they send and receive. However, you may also want to look at other factors that can affect disk performance and overall server load, such as posts to public folders, third-party archival, and BlackBerry server interaction. By analyzing your disk I/O capacity and reviewing your I/O requirements, you can arrive at a disk solution that supports your existing environment and allows for growth.
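A minimal sketch of the performance side of that analysis is shown below, assuming you already have a per-user I/O figure for your user profile at peak load. Every number here is a hypothetical placeholder, not a measured or vendor-published value, and the sketch ignores RAID write penalties, log I/O, and background maintenance; a real design should rely on measured data from your own environment.

```python
import math


def required_iops(users: int, iops_per_user: float, headroom: float = 0.2) -> float:
    """Peak database IOPS for a user population, plus a growth/safety margin.

    headroom is a fraction (0.2 means 20 percent extra); all inputs are
    assumptions supplied by the planner.
    """
    if users < 0 or iops_per_user < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    return users * iops_per_user * (1 + headroom)


def disks_needed(total_iops: float, iops_per_disk: float) -> int:
    """Number of disks needed to serve total_iops, ignoring RAID penalties."""
    if iops_per_disk <= 0:
        raise ValueError("per-disk IOPS must be positive")
    return math.ceil(total_iops / iops_per_disk)


# Hypothetical inputs: 1,200 users, 0.1 IOPS per user at peak, 20% headroom,
# and disks assumed to sustain 75 IOPS each.
peak = required_iops(1200, 0.1, headroom=0.2)
print(round(peak, 1))
print(disks_needed(peak, 75))
```

Comparing this disk count with the number of disks needed purely for capacity shows which constraint actually drives the design; with large, slower disks it is often I/O rather than space.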

A lot of the work done in Exchange Server 2010 went into optimizing storage for lower-cost disk solutions. A storage configuration that has no built-in redundancy (RAID-less) and uses mid-range SATA disks is now realistic. Microsoft talks about JBOD (Just a Bunch of Disks, a pretty self-explanatory term) configurations, which can dramatically increase storage capacity while keeping storage costs very low. (A caveat in this design is that it depends entirely on a high availability solution that uses a DAG for database replication.)

For heavily used Hub Transport server roles, you might also want to put the Hub Transport server database and log files on a SAN or separate disk storage; the transport database and the log files should each go on their own LUN.

Those of you who think about disks and disk performance may be wondering about all of those LUNs being carved out of the same logical disk. If your SAN is improperly sized and does not have enough spindles, performance can be a problem. A properly engineered SAN solution should provide enough total I/O capacity for all the LUNs and the applications that will use those LUNs to function correctly.
