3. Exchange Hybrid Deployment
Exchange Online carries its own set of SLAs, but it interests us here chiefly in terms of its interactions with on-premises Exchange. Assuming that your organization is running in Hybrid mode, there will be three on-premises points of interaction with Exchange Online. Specifically, these interaction points are:
- Exchange CAS servers
- Directory synchronization
- Active Directory Federation Services 2.x
None of these is highly available by default, because each is deployed on a single server. The only defensible candidate for remaining on a single server may be directory synchronization, since by default it is built as a no-touch software appliance; the alternative is to deploy it using the full-featured Forefront Identity Manager backed by a highly available SQL instance.
The Exchange Client Access Servers providing Exchange Hybrid mode integration may be a subset of the total number of Client Access Servers in your organization. If more than one exists, they will sit behind some form of load balancer. Client Access Servers facilitating Exchange Hybrid mode handle the interaction between Exchange Online and on-premises Exchange, and they directly provide the features that make the on-premises system and Office 365 appear as a single organization. With this in mind, ensure that sufficient redundancy exists to guarantee availability during a server outage, as well as during periods of high server load.
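Given the role these servers play, it is worth probing each published Hybrid endpoint from outside the load balancer whenever a node is taken out of rotation. The sketch below is a minimal example of such a probe; the host names and virtual-directory paths are hypothetical stand-ins for whatever your organization actually publishes.

```python
import urllib.request, urllib.error

# Hypothetical Hybrid endpoints; substitute the names your organization publishes.
ENDPOINTS = [
    "https://mail.contoso.com/ews/exchange.asmx",
    "https://autodiscover.contoso.com/autodiscover/autodiscover.xml",
]

def probe(url: str, timeout: int = 10) -> bool:
    """Return True if the endpoint answers at all; an HTTP error such as a
    401 authentication challenge still proves the service is listening."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True
    except OSError:
        return False

for url in ENDPOINTS:
    print(url, "up" if probe(url) else "DOWN")
```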
Active Directory Federation Services (ADFS)
enables external authentication to an on-premises Active Directory by
validating credentials against Active Directory and returning a token
that is consumed by Office 365, thereby allowing one set of Active
Directory credentials to be used for both on-premises services and
Office 365. ADFS servers may have a DMZ-based component (ADFS
Proxy servers) alongside the LAN-based ADFS server. ADFS Proxy servers are a version of ADFS specifically designed to be deployed in the DMZ,
a secured network location separated from the production network by
additional layers of firewalls. Since all that these Proxy servers do
is intercept credentials securely and pass them on to LAN-based ADFS
instances, they may not be required if an equivalent service is
available via Microsoft TMG/UAG or similar.
These servers are excellent
virtualization targets because of their light load. You will require a
minimum of two ADFS servers and two ADFS Proxy servers; the actual
number above that minimum depends on load.
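To make the token flow concrete, here is a deliberately simplified sketch of the federation pattern: validate the credentials against the directory, then return a signed set of claims that the relying party can verify without ever seeing the password. This is conceptual only; real ADFS speaks WS-Federation and SAML and signs tokens with its token-signing certificate, and `directory_validate` here is a toy stand-in for the Active Directory bind.

```python
import base64, hashlib, hmac, json, time
from typing import Optional

SIGNING_KEY = b"demo-signing-key"  # stand-in for the ADFS token-signing certificate

def directory_validate(username: str, password: str) -> bool:
    """Toy stand-in for validating credentials against Active Directory."""
    return (username, password) == ("alice", "secret")

def issue_token(username: str, password: str) -> Optional[str]:
    """Validate against the directory, then return signed claims; the relying
    party (Office 365 in the ADFS case) verifies the signature and never
    handles the user's password."""
    if not directory_validate(username, password):
        return None
    now = int(time.time())
    claims = {"sub": username, "iat": now, "exp": now + 3600}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    signature = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{signature}"

print(issue_token("alice", "secret"))   # signed token
print(issue_token("alice", "wrong"))    # None
```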
Your availability concerns for Exchange Online/Hybrid mode include the following:
- Internet connectivity.
- Sufficient ADFS servers.
- Sufficient ADFS Proxy servers (if required).
- Networking (Reverse Proxy, firewall).
- Load balancer.
- Validity of certificates: the server certificate, issued by a
third-party certificate provider, must not be expired. The certificate
may be a SAN certificate or a wildcard certificate (a monitoring sketch
follows this list).
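Certificate expiry in particular lends itself to automated checking. The following minimal sketch connects to each endpoint and reports the days remaining on the certificate it presents; the host names are hypothetical placeholders.

```python
import socket, ssl
from datetime import datetime, timezone

def days_remaining(host: str, port: int = 443) -> int:
    """Connect, fetch the presented certificate, and return days until expiry."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(cert["notAfter"]),
                                     tz=timezone.utc)
    return (expires - datetime.now(timezone.utc)).days

# Hypothetical endpoint names; substitute the names your organization publishes.
for host in ("mail.contoso.com", "sts.contoso.com"):
    print(host, days_remaining(host), "days until certificate expiry")
```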
4. Database Availability Group Planning
Database availability group (DAG) planning
requires you to balance a number of factors. Most of these are
interdependent and require significant thought and planning.
DATABASE SIZING
The maximum database size you plan for should not
be based purely on the maximum database size supported by Exchange
2013. Large databases require longer backup/restore and reseed times,
especially once they pass the 1 TB mark. Databases of 1 TB and upward
are impractical to back up, and they should be considered only if
enough database copies exist, specifically three or more, to remove the
need for a traditional backup. You need to strike a balance between
fewer nodes with larger databases and more nodes with smaller
databases.
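To see why the 1 TB mark matters, a back-of-the-envelope calculation of reseed time helps: size divided by the copy throughput you can sustain to the target server. The throughput figure below is an assumption for illustration, not a measured value.

```python
def reseed_hours(db_size_tb: float, throughput_mb_s: float) -> float:
    """Rough reseed time: database size divided by sustained copy throughput."""
    db_size_mb = db_size_tb * 1024 * 1024
    return db_size_mb / throughput_mb_s / 3600

# A 2 TB database over a link sustaining ~50 MB/s needs roughly half a day.
print(f"{reseed_hours(2, 50):.1f} hours")   # ~11.7 hours
```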
DATABASE COPIES
The number of database copies required to meet
availability targets is a relatively simple determination.
Early on, we discussed the number of disks or databases required in
order to calculate a specific availability. Given a stated availability
target of 99.99 percent, a single database copy will not achieve it;
four copies within a datacenter is the minimum number required for a
99.99 percent availability target. Bear in mind that the number of
database copies is just one of the factors in the overall availability
calculation.
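The arithmetic behind that copy count treats each copy as an independent failure domain: service is lost only if every copy is lost at once. With an illustrative single-copy availability of 90 percent (an assumed figure, chosen only to show the shape of the curve), four copies are what it takes to reach 99.99 percent:

```python
def combined_availability(single_copy: float, copies: int) -> float:
    """Probability that at least one of n independent copies survives."""
    return 1 - (1 - single_copy) ** copies

for n in range(1, 5):
    print(f"{n} copies: {combined_availability(0.90, n):.4%}")
# 1 copies: 90.0000%, 2: 99.0000%, 3: 99.9000%, 4: 99.9900%
```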
In multi-datacenter scenarios, datacenter
activation is a manual step, as opposed to the automatic failover
provided by high availability. A switchover therefore requires more
time and incurs more downtime than an automatic failover. While
Exchange 2013 is able to automate a switchover event, we would argue
that the business, via the administrator initiating the event, should
wield that level of control, so that the state of Exchange is always
known and understood.
When the second datacenter uses RAID to
protect volumes on a single server, as opposed to individual servers
with isolated storage, the availability of each individual volume
increases slightly, and overall availability increases slightly with
it. In the case of three or more database copies, however, the
additional gain will hardly justify the additional cost of doubling the
disk spindles (depending on the RAID model) and adding RAID
controllers. Applying the principle of failure domains, it may be
cheaper to deploy extra servers with isolated storage than to deploy
the extra disks and RAID controller per volume required to achieve the
higher availability.
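The same independence arithmetic shows why the RAID gain is marginal at three or more copies. Assuming an illustrative 99 percent availability per isolated volume (a made-up figure), mirroring each volume doubles the spindles but only pushes an already extreme availability figure further out:

```python
def survives_one_of(n: int, volume_availability: float) -> float:
    """Availability of n independent copies: service lost only if all are lost."""
    return 1 - (1 - volume_availability) ** n

disk = 0.99                             # illustrative single-volume figure
mirrored = survives_one_of(2, disk)     # a RAID 1 pair under each copy
print(f"3 isolated copies: {survives_one_of(3, disk):.6%}")
print(f"3 mirrored copies: {1 - (1 - mirrored) ** 3:.10%}")
```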
DATABASE AVAILABILITY GROUP NODES
The number of DAG nodes is driven not only by the
number of copies required but also by how many nodes are required in a
database availability group in order to maintain quorum. Quorum is the
number of votes the cluster must hold in order to stay up or to make a
voting decision, such as mounting databases; a majority is calculated
as (number of voters / 2) + 1, using whole-number division. A
three-node cluster can therefore suffer a single failure and still
maintain quorum. Odd-numbered node sets maintain this mathematical
relationship easily; even-numbered node sets, however, require the
addition of a file share witness to provide the tie-breaking vote.
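A short sketch makes the majority math explicit, including the extra vote a file share witness contributes to even-numbered node sets:

```python
def votes_needed(voters: int) -> int:
    """Majority quorum: whole-number division, more than half the votes."""
    return voters // 2 + 1

for nodes in (2, 3, 4, 5):
    witness = 1 if nodes % 2 == 0 else 0   # even node sets add a file share witness
    total = nodes + witness
    needed = votes_needed(total)
    print(f"{nodes} nodes + {witness} witness: "
          f"{needed} of {total} votes needed, {total - needed} failure(s) tolerated")
```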
FILE SHARE WITNESS LOCATION
The file share witness is an empty file
share on a nominated server that acts as an extra vote in establishing
cluster quorum. The datacenter in which the file share witness is
located may be considered the primary datacenter. In Exchange 2013, the
file share witness may be placed in a third datacenter, separate from
both the primary and secondary locations, thereby eliminating the risk
of split brain, a condition that occurs when the WAN link between the
primary and secondary datacenters breaks and both datacenters become
active for the same database. Changes are then written to different
instances of the same database, which requires considerable effort to
undo. This placement, while not recommended, is now supported, and
Exchange 2013 is the first version of Exchange to support separating
the file share witness into a third datacenter.
DATABASE DISTRIBUTION
The distribution of databases on database
availability group nodes has a direct impact on performance and
availability. In order to demonstrate this concept, consider a
four-node DAG with four database copies and with all databases active
on Server 1, as shown in Figure 4.
FIGURE 4 Uneven database distribution
Server 1 will serve all of the required
client interactions, while Servers 2, 3, and 4 remain idle, with the
exception of log replay activity. Should Server 1 fail, all active
copies fail with it and, depending on the health of the remaining
copies, may all activate on Server 2. This is a highly inefficient
distribution.
Figure 5
shows databases distributed so that client and server load is balanced
and failure domains are minimized (assuming the storage is not shared).
Note that this symmetry is precalculated by the current version of the
Exchange calculator.
FIGURE 5 Balanced database distribution
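The balanced layout in Figure 5 amounts to a simple round-robin placement: copy k of database d lands on node (d + k) mod n, so active copies spread evenly and no server holds two copies of the same database. A minimal sketch of the idea follows (server and database names are illustrative, and the real Exchange calculator accounts for far more than this):

```python
def layout(nodes: int, databases: int, copies: int) -> dict[str, list[str]]:
    """Round-robin placement: copy k of database d lands on node (d + k) % nodes."""
    return {
        f"DB{d + 1}": [f"Server{(d + k) % nodes + 1}" for k in range(copies)]
        for d in range(databases)
    }

for db, servers in layout(nodes=4, databases=4, copies=4).items():
    print(f"{db}: active on {servers[0]}, passive on {', '.join(servers[1:])}")
```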
DETERMINING QUORUM AND DAC
If you have
DAC mode enabled on your DAG and a WAN failure occurs, both datacenters
will dismount databases in order to prevent split brain. By design,
then, DAC mode may itself be the cause of an outage if it is not
implemented correctly. Properly implemented, however, it acts as an
extra quorum safeguard against split brain.
If WAN links are unreliable, and your DAG appears similar to Figure 6, consider planning your DAGs without DAC mode, as per Figure 7.
FIGURE 6 Single DAG with DAC mode
A single DAG may instead be split into two or more DAGs,
with each datacenter maintaining quorum on its own if a WAN failure
occurs, similar to what appears in Figure 7.
FIGURE 7 Multiple DAGs without DAC mode