SQL Mirroring for Databases
The one weak spot in any Office
Communications Server 2007 R2 or Lync Server 2010 deployment was the
backend SQL Server database for each Front End pool. This was traditionally a
SQL Server cluster hosted by multiple nodes, but the storage for the
database had to be a single SAN. Administrators were anxious to remove
this single point of failure by using SQL Server database mirroring,
which keeps the databases in sync between two separate nodes, each with
its own storage. It was technically possible to configure SQL mirroring
for the database, but there was no automated or easy way to fail over
between the mirrored nodes. More importantly, it was never a scenario
supported by Microsoft.
Lync Server 2013 finally introduces
support for SQL Server mirroring of the backend database, which allows
administrators to remove the dependency on a SAN. This helps reduce the
complexity and overall cost of a highly available deployment because
Windows and SQL Server clustering are no longer required. The resiliency
of the solution also improves because there are now two separate copies
of the user data held in the backend database.
Automatic failover between the
mirrored SQL Server nodes can be enabled when a SQL Server mirroring
witness server is deployed. The witness server casts the third vote
on which node should actively serve the databases, similar to a quorum
disk in a cluster or a file-share witness in an Exchange Database
Availability Group (DAG). The Express Edition of SQL Server can be used
for the witness server to save on licensing costs, but whether the full
or the Express Edition is used, the witness must run the same major
SQL Server version as the backend nodes.
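
The role of the witness as a third vote can be illustrated with a small toy model. This is only a sketch of the majority idea described above, not how SQL Server implements mirroring: the class and method names are hypothetical, and it deliberately ignores details such as the principal itself losing quorum.

```python
from dataclasses import dataclass


@dataclass
class MirrorSession:
    """Toy model (hypothetical) of a mirrored database with a witness."""
    principal_up: bool = True
    mirror_up: bool = True
    witness_up: bool = True  # False when no witness is deployed or reachable

    def can_fail_over_automatically(self) -> bool:
        # The mirror may take over only when the principal is down and the
        # mirror plus the witness together form a 2-of-3 majority.
        return (not self.principal_up) and self.mirror_up and self.witness_up

    def active_node(self) -> str:
        # Simplification: a healthy principal always keeps serving.
        if self.principal_up:
            return "principal"
        if self.can_fail_over_automatically():
            return "mirror"
        return "none (manual intervention needed)"


print(MirrorSession().active_node())
print(MirrorSession(principal_up=False).active_node())
print(MirrorSession(principal_up=False, witness_up=False).active_node())
```

In this model, losing the principal without a reachable witness leaves no node able to take over on its own, which mirrors the point that automatic failover depends on the witness being deployed.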
Brick Model and User Data Replication
Another welcome change in Lync Server 2013 is
the addition of the Brick Model within Front End pools. One design goal
was to reduce the dependency on the availability of the backend SQL
Server database so that users are unaware when database issues occur.
The Brick Model moves more functionality into
each Front End Server, which now manages the user’s presence states
directly. Only persistent data is written to the backend database, and
those writes are lazy, so a temporary issue at the
database doesn’t have a high impact on users.
Users are now automatically partitioned into
objects called UserGroups within a Front End pool, similar to how each
user had a preferred server order for a Lync Server 2010 pool. Each
UserGroup is then assigned up to three Front End Servers within a pool,
so there are up to three copies of the user’s data stored directly on
the pool members. If a UserGroup’s primary Front End Server fails, the
secondary or tertiary can immediately begin servicing that UserGroup
because they already have the data stored.
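
The UserGroup idea can be sketched as an ordered replica list per group. The round-robin placement and the function names below are assumptions made for illustration; the text does not describe how Lync Server actually chooses the three servers.

```python
from typing import Dict, List, Optional, Set


def assign_replicas(group_count: int, servers: List[str],
                    copies: int = 3) -> Dict[int, List[str]]:
    """Give each UserGroup an ordered list of up to `copies` servers.

    Round-robin placement is an assumption for this sketch.
    """
    copies = min(copies, len(servers))
    return {
        g: [servers[(g + i) % len(servers)] for i in range(copies)]
        for g in range(group_count)
    }


def serving_server(replicas: List[str], failed: Set[str]) -> Optional[str]:
    """Return the first replica (primary, secondary, tertiary) still up."""
    for server in replicas:
        if server not in failed:
            return server
    return None


servers = ["fe1", "fe2", "fe3", "fe4"]
groups = assign_replicas(group_count=4, servers=servers)
print(groups[0])                                  # replicas for UserGroup 0
print(serving_server(groups[0], failed={"fe1"}))  # secondary takes over
```

Because every replica in the list already holds the group's data, moving to the next entry requires no data transfer at failure time.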
These Brick Model changes also allow pools to
scale out to larger numbers. Twelve Front End Servers can be deployed
within a single Lync Server 2013 pool, up from only 10 in Lync Server
2010. The maximum for a single pool remains 80,000 concurrent users,
but the pool retains full operating capacity even after two
Front End Server failures.
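
The figures above imply a simple per-server load calculation. The sketch below assumes users are spread evenly across the active servers, which is an assumption for illustration only and not sizing guidance.

```python
MAX_SERVERS = 12               # Front End Servers per pool (from the text)
MAX_CONCURRENT_USERS = 80_000  # concurrent users per pool (from the text)
TOLERATED_FAILURES = 2         # failures the pool absorbs at full capacity


def users_per_server(active_servers: int,
                     users: int = MAX_CONCURRENT_USERS) -> float:
    """Average load per server, assuming an even spread of users."""
    return users / active_servers


all_up = users_per_server(MAX_SERVERS)
after_two = users_per_server(MAX_SERVERS - TOLERATED_FAILURES)
print(f"all 12 servers up: about {all_up:.0f} users each")
print(f"after two failures: {after_two:.0f} users each")
```

Under that assumption, each of the ten remaining servers must be able to carry 8,000 users for the pool to stay at full capacity with two servers down.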
Conferencing Resiliency and Backup Service
Lync Server 2010 introduced resiliency for
the voice platform through Survivable Branch Appliances/Servers and the
concept of primary and backup registrars. This allowed users to retain
basic voice services during an outage, but all contact list and
conferencing capabilities were lost until an administrator intervened
to forcefully move users and restore information. Lync Server 2013’s
focus on conferencing resiliency automates this process for
administrators by replicating conference data among the Front End
pool members for both the primary and the backup registrar.
This resiliency is achieved through a new
Lync Backup Service that replicates user and conferencing data between
paired Front End pools on a continuous basis. The advantage to this can
be seen when an organization has two locations, each with an active
pool paired to the opposite location. During a server outage, full
conferencing functionality can be provided through the opposite site.
Failover between the two pools is not automated and must be initiated
by an administrator through a set of Lync Management Shell commands.
The combination of SQL Mirroring and the Lync Backup Service allows
organizations to achieve a new degree of local and remote resiliency,
as shown in Figure 1.
Figure 1. Local and remote resiliency.
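
The relationship between continuous replication and administrator-initiated failover can be sketched with a toy model. All class and function names here are hypothetical; they are not Lync Management Shell commands, and the model only illustrates that data is copied in the background while failover remains an explicit step.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Pool:
    """Toy model (hypothetical) of a Front End pool in a pairing."""
    name: str
    data: Dict[str, str] = field(default_factory=dict)         # users homed here
    backup_copy: Dict[str, str] = field(default_factory=dict)  # partner's data
    partner: Optional["Pool"] = field(default=None, repr=False, compare=False)
    online: bool = True
    serving_partner_users: bool = False


def pair(a: Pool, b: Pool) -> None:
    """Pair two pools with each other (1:1)."""
    a.partner, b.partner = b, a


def backup_sync(pool: Pool) -> None:
    """One replication pass: copy this pool's data to its partner."""
    if pool.online and pool.partner is not None and pool.partner.online:
        pool.partner.backup_copy = dict(pool.data)


def admin_fail_over(failed: Pool) -> Pool:
    """Explicit administrator action; nothing here triggers it automatically."""
    partner = failed.partner
    if partner is None or not partner.online:
        raise RuntimeError("no online paired pool to fail over to")
    partner.serving_partner_users = True
    return partner


site_a = Pool("pool-a", data={"alice": "conference-123"})
site_b = Pool("pool-b", data={"bob": "conference-456"})
pair(site_a, site_b)
backup_sync(site_a)
backup_sync(site_b)

site_a.online = False                  # outage at site A
print(site_b.serving_partner_users)    # still False: no automatic failover
target = admin_fail_over(site_a)       # administrator initiates failover
print(target.name, target.backup_copy)
```

The outage alone changes nothing for the surviving pool; it begins serving the partner's users only after the explicit failover call, using the copy it already received.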
One drawback compared with Lync Server 2010 is
that a Front End pool should be paired only with a pool of the same
edition, meaning that pairing an Enterprise Edition pool with
a Standard Edition pool is not recommended. Additionally, pool pairings are now a 1:1
ratio, so multiple Front End pools can no longer be paired with a single
disaster recovery site’s Front End pool. A unique disaster recovery
pool must be deployed for each primary pool, which means an even number
of Front End pools is required for pairing.
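
The two pairing constraints above (same edition, strictly 1:1) can be expressed as a small planning check. This is a hypothetical helper for reasoning about a design on paper, not a feature of Topology Builder or any Lync tool; the pool names and data shapes are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def validate_pairings(editions: Dict[str, str],
                      pairings: List[Tuple[str, str]]) -> List[str]:
    """Return a list of problems found in a proposed set of pool pairings."""
    problems = []
    # 1:1 rule: a pool may appear in at most one pairing.
    usage = Counter(pool for pairing in pairings for pool in pairing)
    for pool, count in usage.items():
        if count > 1:
            problems.append(f"{pool} appears in {count} pairings; pairing is 1:1")
    # Edition rule: both pools in a pairing should be the same edition.
    for a, b in pairings:
        if editions[a] != editions[b]:
            problems.append(
                f"{a} ({editions[a]}) and {b} ({editions[b]}) are different editions"
            )
    return problems


editions = {"pool-a": "Enterprise", "pool-b": "Enterprise", "pool-c": "Standard"}
print(validate_pairings(editions, [("pool-a", "pool-b")]))
print(validate_pairings(editions, [("pool-a", "pool-b"), ("pool-c", "pool-b")]))
```

The second call reports two problems: pool-b is used in two pairings, and pool-c and pool-b are different editions.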
Finally, conferencing resiliency
benefits only users homed directly on a Front End pool. Users hosted
on a Survivable Branch Appliance or Server still use a single Front End
pool for voice resiliency, but do not gain resiliency for their
conferencing.