Backup and recovery, high availability, disaster recovery, and compliance and governance: you've probably heard these terms more than once, and each plays a role in your organization's overall data protection strategy.
All four of these topics must be at least considered
by every modern Exchange administrator and professional, even if they
are not actively addressed in each deployment of Exchange 2010. Even
when you do need to address them in your planning, Exchange 2010
provides a variety of options to ensure that your deployment meets your
own particular needs and situation. One size does not fit all. To best
use the tools that Exchange gives you, though, you must clearly
understand the problems they are designed to solve. It doesn't do to
use a screwdriver as a hammer — and you can't solve a disaster recovery
problem by using the wrong continuous replication option.
1. Backup and Recovery
Let's start with a topic that is arguably one of the
core tasks for any IT administrator, and especially for Exchange administrators:
backup and recovery.
Backup is the process of preserving one or more
point-in-time copies of a set of data, regardless of the number of
copies, frequency and schedule, or media type used to store them. For
Exchange, you need to understand three main types of backup, plus the recovery process that relies on them:
Full Backups
Full backups capture an entire set of target
data; in legacy versions of Exchange, this is a storage group with the
transaction log files and all the associated mailbox databases and
files. In Exchange 2010, each mailbox database is a separate backup
target, since there is now an enforced 1:1 relationship between mailbox
databases and transaction logs (it was merely strongly recommended in
previous versions). Full backups take the most time to perform and use
the most space, but they must be regularly performed on Exchange
mailbox databases so that the Exchange Information Store knows that
transaction logs have been preserved and can be safely deleted.
Incremental Backups
Incremental backups capture only a partial set
of the target data — specifically, the data that has changed since
either the last full backup or the last incremental backup. For
Exchange, this means any new transaction logs. Incremental backups are
designed to reduce how often you have to perform full backups and to minimize the space used
by any particular backup set. The trade-off is that a backup set that includes
incremental backups can be more time-consuming and fragile to restore:
successful recovery requires first restoring the latest full backup,
then each successive incremental backup in order. Incremental backups also
instruct Exchange Server to purge the transaction logs after they are
backed up.
Differential Backups
Differential backups also capture only a partial
set of the target data — specifically, the data that has changed since
the last full backup. Any intervening incremental or differential backups
are ignored. For Exchange, this means all transaction logs
generated since the last full backup; unlike incremental backups,
differential backups do not purge the logs. Differential backups are designed to minimize how many
recovery operations you have to perform in order to fully restore a set
of data. As a trade-off, differential backups use more space than incremental
backups, but they can be restored more quickly and with fewer
opportunities for data corruption: successful recovery requires only
the latest full backup followed by the latest differential backup.
Recovery
Also known as restoration, recovery is the
process of taking one or more sets of the data preserved through
backups and making it once again accessible to administrators,
applications, or end users. Most recovery jobs require the restoration
of multiple sets of backup data, especially when incremental and
differential backups are in use. Two metrics are used to determine if
the recovery time and the amount of data recovered are acceptable:
Recovery Time Objective (RTO)
RTO is a metric commonly used to help define
successful backup and restore processes. The RTO defines the time
window in which you have to restore Exchange services and messaging
data after an event. You may have multiple tiers of data and service,
in which case it could be appropriate to have a separate RTO for each
tier. Often, the RTO is a component of (ideally, an input into, but
that's not always the case) your service level agreements. As a result,
the RTO is a critical factor in the design of Exchange mailbox database
storage systems; it's a bad idea to design mailbox databases that are
larger than you can restore within your RTO.
Recovery Point Objective (RPO)
RPO is a metric that goes hand in hand with the
RTO. While the RTO measures a timeframe, the RPO sets a benchmark for
the maximum amount of data you can afford to lose, typically expressed
as a span of time (for example, the last four hours of messages). Again,
multiple tiers of service and data often have separate RPOs. The RPO
helps drive the backup frequency and schedule. It's worth noting that
this metric makes an implicit assumption that all data within a given
category is equally valuable; that's obviously not true, which is why it
is important to properly establish your categories.
Remember, though, if you have too many classes or categories, you'll
just have confusion.
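To make the restore chains and the two recovery metrics more concrete, here's a minimal Python sketch. The backup catalog (`BackupRecord`), the function names, and the simplifications are all hypothetical: restore throughput is treated as constant, non-scaling restore work as a fixed overhead, and a failure is pessimistically assumed to lose everything written since the last completed backup, transaction logs included. Real backup products keep their catalogs in their own formats.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Sequence


@dataclass(frozen=True)
class BackupRecord:
    """One job in a hypothetical backup catalog for a single mailbox database."""
    kind: str  # "full", "incremental", or "differential"
    taken_at: datetime


def restore_chain(catalog: Sequence[BackupRecord]) -> List[BackupRecord]:
    """Backups to restore, oldest first, to reach the newest recovery point."""
    ordered = sorted(catalog, key=lambda b: b.taken_at)
    fulls = [b for b in ordered if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup to start from")
    base = fulls[-1]
    later = [b for b in ordered if b.taken_at > base.taken_at]
    if {"incremental", "differential"} <= {b.kind for b in later}:
        raise ValueError("this sketch does not handle mixed chains")
    diffs = [b for b in later if b.kind == "differential"]
    if diffs:
        # A differential holds everything since the full backup: use only the newest.
        return [base, diffs[-1]]
    # Each incremental holds only its own changes: apply every one, in order.
    return [base] + [b for b in later if b.kind == "incremental"]


def max_database_size_gb(rto: timedelta, restore_mb_per_s: float,
                         fixed_overhead: timedelta = timedelta(0)) -> float:
    """Largest database (1 GB = 1024 MB) restorable within the RTO.

    Assumes a constant restore throughput plus a fixed overhead for work that
    does not scale with database size (locating media, mounting, and so on).
    """
    usable_s = (rto - fixed_overhead).total_seconds()
    return max(usable_s, 0.0) * restore_mb_per_s / 1024


def worst_case_data_loss(backup_times: Sequence[datetime],
                         cycle: timedelta) -> timedelta:
    """Longest gap between backups in a schedule that repeats every `cycle`.

    A failure just before a backup runs loses everything since the previous one.
    """
    if not backup_times:
        raise ValueError("schedule has no backups")
    ordered = sorted(backup_times)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    gaps.append(ordered[0] + cycle - ordered[-1])  # wrap into the next cycle
    return max(gaps)
```

For example, a Sunday full backup followed by Monday and Tuesday incrementals restores as all three jobs in order, while a chain of differentials restores as the full backup plus only the newest differential. A 4-hour RTO with 1 hour of fixed overhead at a sustained 100 MB/s caps a database at roughly 1,055 GB, and backups at 00:00 and 08:00 each day leave a 16-hour worst case, which would not meet a 12-hour RPO.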
One thing to note about Exchange 2010 is that it
only supports online backups and restores created through the Windows
Volume Shadow Copy Service (VSS). While previous versions allowed the
use of an online streaming backup, this option is no longer available.
VSS provides several advantages, including the ability to integrate
with third-party storage systems to speed up the backup and recovery
process. The most important benefit VSS gives, though, is that it
ensures the Exchange Information Store flushes all pending writes
consistently, so the backup data set can be cleanly recovered.
One thing that VSS does
not natively provide is the ability to reduce the amount of data that
must be copied during a backup operation. VSS simply creates either a
persistent or non-persistent shadow copy (depending on how the invoking
application requested it) of the entire disk
volume; it's then up to the application to sort out the appropriate
files and folders that make up the data set. Many Exchange-aware backup
applications simply copy the various transaction log files and mailbox
database files to the backup server.
Some applications, however, are a bit more
intelligent; they keep track of which blocks have changed in the target
files since the last backup interval. These applications can copy just
those changed blocks to the backup data set — typically some percentage
of the blocks in the mailbox database file as well as all the new
transaction log files — thus reducing the amount of data that needs to
travel over the network and be stored. Block-level backups help strike
a good balance between storage, speed, and reliability. As you go
forward with VSS-aware Exchange-compatible backup solutions, be sure to
investigate whether they offer this feature.
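The block-level idea can be sketched in a few lines of Python. This is only one simple way to find changed blocks: record a hash per fixed-size block at each backup and compare against it next time. The block size and function names are assumptions for illustration; commercial products may instead rely on change-tracking drivers or the storage array, which avoids rereading the whole file.

```python
import hashlib
from typing import Dict, List

# Assumed block size for illustration; real backup products choose their own.
BLOCK_SIZE = 64 * 1024


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """SHA-256 digest of each fixed-size block, recorded alongside a backup."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(previous_hashes: List[str], current: bytes,
                   block_size: int = BLOCK_SIZE) -> Dict[int, bytes]:
    """Blocks (index -> bytes) that are new or differ from the previous backup.

    Only these blocks need to travel over the network and be stored.
    """
    changed: Dict[int, bytes] = {}
    for index, digest in enumerate(block_hashes(current, block_size)):
        if index >= len(previous_hashes) or previous_hashes[index] != digest:
            start = index * block_size
            changed[index] = current[start:start + block_size]
    return changed


def apply_changes(previous: bytes, changes: Dict[int, bytes],
                  block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the newer file by overlaying changed blocks on the older copy.

    Files that shrank between backups are not handled by this sketch.
    """
    rebuilt = bytearray(previous)
    for index in sorted(changes):  # ascending order so appended blocks stay contiguous
        start = index * block_size
        block = changes[index]
        rebuilt[start:start + len(block)] = block
    return bytes(rebuilt)
```

The restore side is the mirror image: start from the previous copy and overlay the changed blocks, which is why block-level backups still depend on an intact earlier copy of the file.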
2. Disaster Recovery
Regular backups are important; the ability to
successfully restore them is even better. This capability is a key part
of your extended arsenal for problem situations. Restoring the
occasional backup is fairly straightforward but assumes that you have a
functional Exchange server and dependent network infrastructures. What
do you do if an entire site or datacenter goes down and your recovery
operations extend beyond just an Exchange mailbox database? The answer
to this question is a broad topic that can fill a large number of
books, blog postings, and websites of its own.
Disaster recovery (DR)
is the practice of ensuring that you can restore critical services when
some disaster or event causes large-scale or long-term outage. A
successful DR plan requires you to identify your critical services and
data, create documentation that lists the necessary tasks to re-create
and restore them, and update the relevant policies and processes within
your organization to support your plan.
It's not enough to consider how to rebuild Exchange
servers and restore Exchange mailbox databases. Exchange is a
complicated application with a large number of dependencies, so your
plans need to accommodate the following issues:
Network Dependencies
This topic includes subnets, IP address
assignments, DHCP services, and router configurations. Are you
rebuilding your services to have the same IP addresses or new ones?
Whatever you decide, you'll need to make sure that other services and
clients can reach the Exchange servers.
Active Directory Services
This topic includes associated DNS zones and
records. Exchange cannot function without reliable access to global
catalog servers and domain controllers. Which forests and domains hold
objects Exchange will need to reference? Does your existing replication
configuration meet those needs?
Third-Party Applications
This topic includes monitoring, backup,
archival, or other programs and services that require messaging
services or interact with them. Don't just catalog everything running
in production; make sure these systems are also covered by
the disaster recovery plan.
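Because these dependencies determine the order in which services can come back, it can help to write them down as a graph and derive the recovery sequence from it. The following Python sketch uses the standard library's `graphlib` (Python 3.9 or later); the service names and the dependency map itself are illustrative assumptions, and your real map will differ, for example when DNS is Active Directory-integrated and hosted on the domain controllers themselves.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Illustrative dependency map (an assumption, not a prescribed order):
# each service lists the services that must be available before it.
DR_DEPENDENCIES: Dict[str, Set[str]] = {
    "network": set(),                                # subnets, IP addressing, DHCP, routing
    "dns": {"network"},
    "active_directory": {"network", "dns"},          # domain controllers, global catalogs
    "exchange_servers": {"active_directory", "dns"},
    "mailbox_databases": {"exchange_servers"},
    "third_party_apps": {"exchange_servers"},        # monitoring, backup, archival
}


def recovery_stages(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group services into stages; everything within one stage can be restored in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a circular dependency
    stages: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages
```

With the map above, the stages come out as network, then DNS, then Active Directory, then the Exchange servers, and finally mailbox databases and third-party applications together. A circular dependency in a real plan is worth discovering on paper rather than in the middle of a recovery.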
There's a blurry line between disaster recovery and the associated concept of business continuity (also called business continuance
by some). Business continuity (BC) is the ability of your organization
to continue providing at least the minimum set of operations and
services necessary to stay in business during a large-scale outage,
such as during a regional event or natural disaster. In a business
continuity plan, you will identify and prioritize the most critical
services and capabilities for which you need to provide at least some
level of operational capacity as soon as possible, even without full
access to data or applications.
It's important to note that the business continuity
plan is designed and implemented alongside your disaster recovery
efforts. In many organizations, they will be maintained by two separate
groups of professionals; it goes without saying that these groups
should have good lines of communication in place.
There's a lot of confusion over exactly how disaster
recovery and business continuity relate to each other. We have good
news and bad news: the good news is that it's a simple relationship.
The bad news is, "It depends."
Both types of plans are ultimately aimed at the goal
of repairing the damage caused by extended outages. The biggest
difference is the scope; many business continuity plans focus very
little on technology and look instead at overall business processes. In
contrast, disaster recovery plans of necessity have to be concerned
with the finer details of IT administration. The reality is that both
levels of focus are often needed — and must be handled in parallel,
with coordination, and in support of any additional ongoing crisis
management.
Let's try to clarify the difference by providing an
example. Acme Inc. is a national manufacturer and supplier of various
goods, mainly to wholesale distributors but with a small and thriving
mail-order retail department for the occasional customer who needs
quality Acme products but has no convenient retail outlet in their
locale. Acme's main call center has a small number of permanent staff
but a large number of contract call center operators.
Unfortunately, Acme's main order fulfillment center
— for both bulk wholesale orders as well as the relatively small amount
of mail order traffic — gets hit by a large fragment in a meteor
shower, causing a fire that rapidly transforms the entire site into
smoking rubble even as all personnel are safely evacuated. The call
center and supporting datacenter are completely destroyed and,
conservatively, will take several months to fully rebuild. Obviously,
Acme is going to suffer some sort of setback, but with proper planning
they can minimize the effects. What types of actions would Acme's BC
and DR plans each be taking?
Acme's BC plan is concerned with getting the
minimum level of operational function back online as quickly as
possible. In this case, it's going to take a while before they can
resume call center operations. Their immediate needs are to establish
at least some level of messaging support for the temporary call center
workers the BC plan brings in. The BC plan does not assume that Acme
will have in-house capability, so it makes provisions, if required, to
use hosted Exchange services as a short-term stopgap; communications
with customers and wholesalers can then continue until Acme's IT staff
can bring up enough Exchange servers to switch back to on-premises
services.
Acme's DR
plan is concerned with rebuilding the systems that were lost. In addition to
restoring critical network infrastructure services, Acme's Exchange
administrators are tasked with first rebuilding enough Exchange
servers in their DR location to recover the mailbox databases for the
call center's permanent staff. They then need to build enough additional
Exchange servers to recover the operator mailbox databases and
extract message data pertaining to currently open cases that need
investigation. Once the datacenter is rebuilt, they can build the rest
of the Exchange servers and move operations back from the DR site.