What's New in Microsoft Lync Server 2013 : High-Availability and Disaster Recovery Changes

10/28/2013 9:17:31 PM

SQL Mirroring for Databases

The one weak spot in any Office Communications Server 2007 R2 or Lync Server 2010 deployment was the backend SQL database for each Front End pool. This was traditionally a SQL Server cluster hosted by multiple nodes, but the storage for the database had to be a single SAN. Administrators were anxious to remove this single point of failure by using a SQL Server feature called database mirroring, in which the databases are kept in sync between two separate nodes, each with its own storage. It was technically possible to configure SQL mirroring for the database, but there was no automated or easy way to fail over between the mirrored nodes. More importantly, it was never a scenario supported by Microsoft.

Lync Server 2013 finally introduces support for SQL Server mirroring of the backend database, which allows administrators to remove the dependency on a SAN. This helps reduce the complexity and overall cost of a highly available deployment because Windows and SQL Server clustering are no longer required. The resiliency of the solution is also improved because there are now two separate copies of the user data held in the backend database.

Automatic failover between the mirrored SQL server nodes can be enabled when a SQL server mirroring witness server is deployed. The witness server acts as the third vote for which node should actively serve the databases, similar to a quorum disk in a cluster or a file-share witness in an Exchange Database Availability Group (DAG). The Express Edition of SQL Server can be used for the witness server to save on licensing costs, but regardless of whether the full or the Express Edition is used, it must match the major product version used for the backend nodes.
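The voting behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not how SQL Server implements mirroring: the names `MirrorState`, `has_quorum`, `active_node`, and `witness_version_ok` are hypothetical, and the model assumes a node that is up is reachable by every other node (no partial network partitions).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorState:
    """Health of the three voters in a mirrored pair with a witness."""
    principal_up: bool
    mirror_up: bool
    witness_up: bool


def has_quorum(node_up: bool, partner_up: bool, witness_up: bool) -> bool:
    """A node holds quorum when it is up and can see at least one other voter (2 of 3)."""
    return node_up and (partner_up or witness_up)


def active_node(state: MirrorState):
    """Return which node should serve the databases, or None if no node has quorum."""
    if has_quorum(state.principal_up, state.mirror_up, state.witness_up):
        return "principal"
    if has_quorum(state.mirror_up, state.principal_up, state.witness_up):
        return "mirror"  # automatic failover: mirror and witness agree
    return None


def witness_version_ok(backend_major: int, witness_major: int) -> bool:
    """The witness edition may differ, but its major version must match the backend nodes."""
    return backend_major == witness_major
```

In this model, losing the principal while the mirror and witness are still up yields `"mirror"`, whereas a mirror left alone (principal and witness both down) cannot take over by itself.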

Brick Model and User Data Replication

Another welcome change in Lync Server 2013 is the addition of the Brick Model within Front End pools. One of the investments made was to reduce the dependency on the availability of the backend SQL Server database, so that users remain unaware of any database issues that occur. The Brick Model moves more functionality into each Front End Server, which now manages the user’s presence states directly. Changes to the backend database are made only for persistent data and are treated as lazy writes, so a temporary issue at the database doesn’t have a high impact on users.

Users are now automatically partitioned into objects called UserGroups within a Front End pool, similar to how each user had a preferred server order for a Lync Server 2010 pool. Each UserGroup is then assigned up to three Front End Servers within a pool, so there are up to three copies of the user’s data stored directly on the pool members. If a UserGroup’s primary Front End Server fails, the secondary or tertiary can immediately begin servicing that UserGroup because they already have the data stored.
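A minimal sketch of the idea follows, assuming a simple round-robin placement. The actual placement algorithm used by Lync Server 2013 is not described here, so `assign_replicas` and `serving_server` are hypothetical helpers that only illustrate "up to three copies, fall through to the next healthy one".

```python
from __future__ import annotations

from typing import Optional


def assign_replicas(group_id: int, servers: list[str], copies: int = 3) -> list[str]:
    """Pick up to `copies` distinct servers for a UserGroup (assumed round-robin placement).

    The first entry plays the role of primary, then secondary, then tertiary.
    """
    count = min(copies, len(servers))
    return [servers[(group_id + i) % len(servers)] for i in range(count)]


def serving_server(replicas: list[str], healthy: set[str]) -> Optional[str]:
    """Return the first replica that is still healthy, or None if all copies are down."""
    for server in replicas:
        if server in healthy:
            return server
    return None
```

For example, with four servers `fe1` to `fe4`, group 0 would be placed on `fe1`, `fe2`, and `fe3`; if `fe1` fails, `serving_server` falls through to `fe2`, which already holds the group's data.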

These Brick Model changes also allow pools to scale out further. Twelve Front End Servers can be deployed within a single Lync Server 2013 pool, up from 10 in Lync Server 2010. The supported maximum for a single pool remains 80,000 concurrent users, but the pool can now run at full operating capacity even after two Front End Server failures.
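The arithmetic behind that headroom can be worked through quickly. The sketch below assumes users are spread evenly across the servers that are up, which is a simplification for illustration and not official sizing guidance; the constant and function names are made up for this example.

```python
MAX_SERVERS = 12          # Front End Servers per pool (from the text)
MAX_USERS = 80_000        # concurrent users per pool (from the text)
TOLERATED_FAILURES = 2    # failures the pool absorbs at full capacity (from the text)


def per_server_load(users: int, servers_up: int) -> float:
    """Average users per server, assuming an even spread across healthy servers."""
    return users / servers_up


survivors = MAX_SERVERS - TOLERATED_FAILURES
load_all_up = per_server_load(MAX_USERS, MAX_SERVERS)
load_after_failures = per_server_load(MAX_USERS, survivors)

print(f"All {MAX_SERVERS} servers up: about {load_all_up:.0f} users each")
print(f"After {TOLERATED_FAILURES} failures: {load_after_failures:.0f} users each")
```

Under that even-spread assumption, each of the ten surviving servers would carry 8,000 users, so the two extra servers act as spare capacity and do not raise the user ceiling.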

Conferencing Resiliency and Backup Service

Lync Server 2010 introduced resiliency for the voice platform through Survivable Branch Appliances/Servers and the concept of primary and backup registrars. This allowed users to retain basic voice services during an outage, but all contact list and conferencing capabilities were lost until an administrator intervened to forcefully move users and restore information. Lync Server 2013’s focus around conferencing resiliency automates this process for administrators by replicating conference data between each Front End pool member for both the primary and the backup registrar.

This resiliency is achieved through a new Lync Backup Service that replicates user and conferencing data between paired Front End pools on a continuous basis. The advantage to this can be seen when an organization has two locations, each with an active pool paired to the opposite location. During a server outage, full conferencing functionality can be provided through the opposite site. Failover between the two pools is not automated and must be initiated by an administrator through a set of Lync Management Shell commands. The combination of SQL Mirroring and the Lync Backup Service allows organizations to achieve a new degree of local and remote resiliency, as shown in Figure 1.

Figure 1. Local and remote resiliency.

One negative change from Lync Server 2010 is that Front End pools should be paired only with pools of the same edition, meaning it is not recommended to pair an Enterprise Edition pool with a Standard Edition pool. Additionally, pool pairings are now strictly 1:1, so multiple Front End pools can no longer be paired with a single Front End pool at a disaster recovery site. A unique disaster recovery pool must be deployed for each primary pool, which means an even number of Front End pools is required for pairing.

Additionally, the conferencing resiliency benefits only users homed directly to the Front End pool. Users hosted on a Survivable Branch Appliance or Server still use a single Front End pool for voice resiliency, but do not gain resiliency for their conferencing.
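The pairing rules above (matching editions and exactly one partner per pool) lend themselves to a small planning check. The sketch below is a hypothetical helper for reviewing a proposed design on paper; `Pool` and `validate_pairings` are illustrative names and are not part of any Lync tooling.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    name: str
    edition: str  # "Enterprise" or "Standard"


def validate_pairings(pairs: list[tuple[Pool, Pool]]) -> list[str]:
    """Return warnings for pairings that mix editions or reuse a pool in more than one pair."""
    warnings: list[str] = []
    partner_of: dict[str, str] = {}
    for first, second in pairs:
        if first.edition != second.edition:
            warnings.append(f"{first.name} and {second.name} are different editions")
        for pool in (first, second):
            if pool.name in partner_of:
                warnings.append(
                    f"{pool.name} is already paired with {partner_of[pool.name]}"
                )
        # Remember the first partner seen for each pool so later reuse is reported.
        partner_of.setdefault(first.name, second.name)
        partner_of.setdefault(second.name, first.name)
    return warnings
```

A design that points two primary pools at the same disaster recovery pool would produce a warning for the second pairing, reflecting the 1:1 requirement.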
