1. New High-availability and Recovery Features
Windows Server 2008 R2 includes several features that
further enhance HA and backup services. These include
PowerShell support for clustering and the ability to back up
individual files and folders with Windows Backup.
Failover Cluster PowerShell support
Failover Clusters can now be set up and administered
using PowerShell 2.0. This includes not only the new cmdlets for
Failover Clustering but also the ability to remotely send commands to
cluster services via PowerShell 2.0. With the added support for
PowerShell, the cluster.exe command line utility is being deemphasized
and may not be available in future releases of Windows.
Cluster-Shared Volumes
Failover Clustering supports the use of Cluster-Shared Volumes (CSVs).
These are volumes that can be accessed by multiple nodes of the cluster
at the same time. This benefits Hyper-V deployments by enabling
Live Migration and reducing the number of LUNs required. Live Migration
allows you to move virtual machines between two hosts in a Failover
Cluster with no downtime. CSVs make this process possible.
Because previous versions of Windows allowed only one host to actively
access a LUN, a failover would cause all VMs stored on that LUN to
fail over. Prior to Windows Server 2008 R2, Microsoft recommended that
each VM in a Failover Cluster be assigned its own LUN so that a single
VM could fail over on its own. For many deployments, this resulted in
a large number of LUNs being assigned to each Hyper-V host. Windows
Server 2008 R2 removes this restriction: with CSVs, multiple hosts can
access the volume at the same time, so a single VM on a LUN can
fail over without requiring the other VMs on that same LUN to do the same.
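The difference in failover granularity can be illustrated with a small Python model. This is only a sketch of the behavior described above, not of how Windows implements it; the function name, the LUN layout, and the VM names are all hypothetical.

```python
def vms_that_fail_over(lun_contents, vm, use_csv):
    """Return the VMs that must move when `vm` fails over.

    lun_contents maps a LUN name to the list of VMs stored on it
    (hypothetical layout, for illustration only).
    """
    for vms in lun_contents.values():
        if vm in vms:
            # Without CSV only one host can own the LUN, so every VM
            # stored on it moves together. With CSV all nodes can access
            # the volume, so only the requested VM has to move.
            return [vm] if use_csv else list(vms)
    raise ValueError(f"unknown VM: {vm}")


lun_contents = {"LUN1": ["vm-a", "vm-b", "vm-c"], "LUN2": ["vm-d"]}

print(vms_that_fail_over(lun_contents, "vm-b", use_csv=False))  # ['vm-a', 'vm-b', 'vm-c']
print(vms_that_fail_over(lun_contents, "vm-b", use_csv=True))   # ['vm-b']
```

In the model, placing every VM on its own LUN gives the same single-VM result without CSV, which is why the one-LUN-per-VM recommendation existed.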
Improved Cluster Validation
Windows Server 2008 introduced the Cluster
Validation Wizard. By using this wizard, administrators could easily
verify and set up a cluster, ensuring that it was in a supported
configuration. If the cluster passed the validation wizard, it was
considered to be correctly configured. Windows Server 2008 R2
adds further tests to the Cluster Validation Wizard to check the
cluster configuration more thoroughly.
Support for additional cluster-aware services
The Remote Desktop Connection Broker and DFS
Replication (DFSR) can both be configured on a Failover Cluster to
provide HA and redundancy to these services.
Ability to back up individual files and folders
Windows Server 2008 R1 (RTM) backup could not
select individual files and folders to be backed up, even though
previous versions of Windows, such as Windows Server 2003, offered
this feature. Windows Server 2008 R1 could back up only full
volumes. Windows Server 2008 R2 brings the feature back, allowing
administrators to choose which files and folders to include in a
backup set.
2. Planning for High Availability
Deploying HA features on your network requires adequate planning and
testing prior to production use of the solution. One of the first
planning steps is to determine the expected uptime requirements for
the system. You may find that the actual business need for the system
does not require HA features at all. This depends on how long it takes
to restore the system and how long the organization can work without
the system being online. The question needs to be reviewed from a
business standpoint and should have buy-in from those in charge of the
business process that the system supports.
Additionally, you will need to determine whether the particular
system is supported using Windows Server 2008 R2 HA features. For
example, Microsoft SQL Server is a cluster-aware application and can
therefore be supported using Failover Clustering features. IIS web
servers can be configured using NLB features. A third-party database
server may not be cluster-aware, and therefore you may not be able to
provide an HA solution for that application using Windows Server 2008
R2 Failover Clustering. There are Generic Application and Generic
Service options for setting up applications and services that are not
cluster-aware. These, however, provide only basic Failover Clustering
features.
The following standard Microsoft applications and services are
cluster-aware, meaning that they can be deployed on a Windows Server
2008 R2 Failover Cluster to provide HA:
Microsoft SQL Server
Microsoft Exchange Server
DHCP Server
File Server
DFS Server
Distributed Transaction Coordinator
iSNS Server
Message Queuing
Print Server
Remote Desktop Connection Broker
Hyper-V Host
WINS
After you have determined that HA features are
required and that they can be supported by Windows Server 2008 R2
Failover Clustering or NLB, you can begin planning your HA solution.
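The uptime question raised at the start of this section comes down to simple arithmetic. The sketch below is one hedged way to frame it: it converts an uptime target into a yearly downtime budget and compares the restore time with the outage the business can tolerate. The function names and the sample figures are illustrative assumptions, not values from Microsoft guidance.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(uptime_percent):
    """Yearly downtime budget implied by an uptime target such as 99.9."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


def ha_worth_considering(restore_hours, tolerable_outage_hours):
    """HA is worth considering when a restore takes longer than the
    business can work without the system."""
    return restore_hours > tolerable_outage_hours


# Hypothetical figures for illustration only.
print(round(allowed_downtime_hours(99.9), 2))  # 8.76
print(ha_worth_considering(restore_hours=6, tolerable_outage_hours=24))  # False
print(ha_worth_considering(restore_hours=6, tolerable_outage_hours=1))   # True
```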
Understanding how Failover Clustering works
As you previously learned, Windows Failover Clusters provide HA by
deploying multiple servers in a cluster. The cluster hides the fact
that multiple servers are deployed, meaning that client computers see
all servers in the cluster as a single server. Each server in the
cluster is referred to as a node. Windows clustering uses an
active/passive concept to support HA services. This means that active
nodes are online and performing all processing requested by the
installed application. In the event that the active node fails, the
cluster fails over to the passive node, which then becomes active. The
new active node continues to handle processing of the application.
Cluster nodes use heartbeat and quorum to determine
which node is online and active and to initiate a failover in the event
of a node failure. The heartbeat is used to determine whether nodes of
the cluster are online. Each node communicates over the heartbeat
network continuously to determine whether the other nodes are online.
If an active node fails to respond to a heartbeat request, the cluster
will fail over to a passive node. Quorum is used to ensure that the
cluster can continue to function and nodes can recover in the event of
failure. The quorum also helps ensure that clusters do not experience
“split brain,” which is where an active and a passive node both believe
they should be the active node. For a node in an active/passive cluster
to become active, it must be able to communicate with the quorum. If a
node cannot communicate with the quorum, it cannot become active.
Windows Server 2008 R2 allows you to use a quorum disk or a file share,
known as a file share witness. Failover Clusters can use any of the
following quorum configurations:
Node Majority: This quorum setting is used when there is an odd number
of nodes in the cluster. The cluster can tolerate failure of half of
the nodes (rounded up) minus one. For example, a five-node cluster can
remain online if two nodes fail.
Node and Disk Majority: This quorum setting should be used for clusters
with an even number of nodes. Using this setting, the cluster can
tolerate failure of half of the nodes (rounded up) if the quorum disk
remains online. If the quorum disk goes offline, the cluster can
tolerate failure of half of the nodes (rounded up) minus one. For
example, a four-node cluster can remain online if two nodes fail and
the quorum disk remains online, or if one node fails and the quorum
disk fails.
Node and File Share Majority: This quorum setting is used for clusters
that require special configuration using a file share instead of a
quorum disk. For example, Exchange Server 2007 Cluster Continuous
Replication (CCR) uses a file share witness.
No Majority (Disk Only): This setting is not recommended, because the
quorum disk becomes a single point of failure. It allows the cluster to
tolerate failure of all nodes except one as long as the quorum disk
remains online.
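The failure tolerances in the majority-based configurations above follow from a simple vote count. The sketch below assumes that every node holds one vote, that a configured witness (quorum disk or file share) holds one more, and that the cluster keeps quorum while a strict majority of all votes is present. The function names are hypothetical, and the Disk Only configuration is deliberately left out because it does not use majority voting.

```python
def has_quorum(total_nodes, nodes_up, witness_configured=False, witness_up=False):
    """True while a strict majority of all configured votes is present."""
    total_votes = total_nodes + (1 if witness_configured else 0)
    votes_present = nodes_up + (1 if witness_configured and witness_up else 0)
    return 2 * votes_present > total_votes


def max_node_failures(total_nodes, witness_configured=False, witness_up=False):
    """Largest number of node failures that still leaves the cluster with quorum."""
    failures = 0
    while failures < total_nodes and has_quorum(
        total_nodes, total_nodes - failures - 1, witness_configured, witness_up
    ):
        failures += 1
    return failures


# Node Majority, five nodes: half rounded up (3) minus one.
print(max_node_failures(5))                                             # 2
# Node and Disk Majority, four nodes, quorum disk online.
print(max_node_failures(4, witness_configured=True, witness_up=True))   # 2
# Node and Disk Majority, four nodes, quorum disk offline.
print(max_node_failures(4, witness_configured=True, witness_up=False))  # 1
```

The four-node results match the example given for Node and Disk Majority: two node failures are survivable with the disk online, but only one with the disk offline.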
We will explore setting up quorum later in this chapter when we discuss administering Failover Clusters.
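The heartbeat and quorum checks described earlier in this section can be sketched as follows. This is a simplified model rather than the actual cluster service logic: a node counts as failed when its last heartbeat is older than a timeout, and a passive node is promoted only if it can reach the quorum. The timeout value, node names, and function names are assumptions chosen for illustration.

```python
def find_failed_nodes(last_heartbeat, now, timeout_seconds):
    """Return the set of nodes whose last heartbeat is older than the timeout."""
    return {
        node
        for node, seen in last_heartbeat.items()
        if now - seen > timeout_seconds
    }


def choose_active(current_active, nodes, failed, can_reach_quorum):
    """Keep the current active node if it is healthy; otherwise promote the
    first healthy node that can communicate with the quorum."""
    if current_active not in failed:
        return current_active
    for node in nodes:
        if node not in failed and node in can_reach_quorum:
            return node
    return None  # no eligible node: the clustered service stays offline


nodes = ["node1", "node2", "node3"]
# Timestamps (in seconds) of the last heartbeat received from each node.
last_heartbeat = {"node1": 100.0, "node2": 109.0, "node3": 109.5}

failed = find_failed_nodes(last_heartbeat, now=110.0, timeout_seconds=5.0)
new_active = choose_active("node1", nodes, failed, can_reach_quorum={"node3"})
print(sorted(failed), new_active)  # ['node1'] node3
```

In this example node2 is healthy but cannot reach the quorum, so it is skipped. Refusing to promote such a node is what prevents two nodes from both acting as the active node.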
Planning for a Failover Cluster
When planning to implement a Failover Cluster, you need to answer the following preliminary questions:
How many node (server) failures should be
tolerated? Windows Server 2008 R2 Failover Clusters can be configured
with multiple nodes. For example, you could provide an active node with
two passive nodes. In the event that the active node failed, one of the
passive nodes would become active. In the event that the second node
failed, the other passive node would then become active. This allows a
Failover Cluster to support failure of multiple nodes.
Does the cluster need to support geographic resiliency? With the
release of Windows Server 2008 R1, Failover Clusters gained the
ability to span a wide area network. Using a geo-cluster, you can
have an active node in one datacenter and the failover node in a
datacenter in another geographic location. In the event of complete
datacenter loss, the cluster could fail over to the node in the
second datacenter. Figure 1 depicts a Windows Server 2008 R2 geo-cluster.
Can the application sustain the
brief time required to fail over to another node in the cluster?
Failover Clusters require a brief period of time to fail over in
the event of a node going offline. You will want to ensure that your
system, including front-end applications, can tolerate this
outage. For example, you may deploy SQL Server on a Failover Cluster.
During the failover process, there will be a brief period of time
where the front-end application cannot talk to the SQL back-end. You
need to verify whether the application can easily reconnect after the
outage occurs. This outage is usually just a few seconds.
How will you be notified in the event of a node failure? The beauty of
Failover Clustering is that the application remains online when a
server fails. However, what if you as the administrator are not aware
that a failure has occurred? The application is still online, after
all; thus, helpdesk phone lines probably are not ringing. This does
not negate the fact that you need to know that a node has gone down
and the cluster has failed over. You need to be able to troubleshoot
and resolve the issue that caused the failover in the first place. You
also need to restore failover capabilities; otherwise, a second node
failure could cause a service outage, depending on how many failover
nodes are available.
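The third question above, whether a front-end application can ride out the brief failover outage, often comes down to whether it retries a failed connection. The sketch below shows one possible retry loop. The FlakyBackend class is a hypothetical stand-in for a clustered back-end such as SQL Server, and the attempt count and delay are illustrative assumptions rather than recommended values.

```python
import time


def call_with_retry(operation, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call operation(), retrying on ConnectionError up to `attempts` times."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError as error:
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)  # give the cluster time to fail over
    raise last_error


class FlakyBackend:
    """Hypothetical back-end that is unreachable for the first few calls,
    standing in for a clustered service during a failover."""

    def __init__(self, failures_before_success):
        self.remaining = failures_before_success

    def query(self):
        if self.remaining > 0:
            self.remaining -= 1
            raise ConnectionError("back-end unavailable during failover")
        return "ok"


backend = FlakyBackend(failures_before_success=2)
# The sleep function is replaced here so that the example runs instantly.
result = call_with_retry(backend.query, sleep=lambda seconds: None)
print(result)  # ok
```

An application that gives up after the first failed call would surface the failover to its users even though the cluster recovered within seconds.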