Best Practices for Performing Database Maintenance
The Exchange Server storage system is a database, and it requires routine maintenance to perform efficiently and to prevent failures. Exchange Server 2013 fully automates the routine maintenance tasks of defragmentation and compaction, and it improves the health of the messaging system through the following features:
• Continuous online database defragmentation
• Continuous online database compaction
• Continuous online database contiguity maintenance
These features eliminate the need for planned downtime to perform database maintenance.
As messaging environments have evolved from “nice to have” to “business critical,” database maintenance has evolved from “should be done” to “must be done.” Potential causes of database corruption include the following:
• Improper shutdown of the system, including unexpected power outages
• A poorly maintained disk subsystem
• Hardware failures
• Failure to use or review systems or operational management tools
• Manual modification of Exchange Server databases
Automatic Database Maintenance
Exchange Server 2013 automatically performs database maintenance procedures on a nightly basis during the scheduled maintenance window. Exchange Server 2013 performs two distinct activities: Online Maintenance (OLM) and Online Defragmentation (OLD). OLM starts by default at 1:00 a.m. every day, whereas OLD runs continuously.
Note
This differs from Exchange Server 2007, in which OLD ran as part of the OLM process. That design could degrade performance during maintenance windows and required administrators to stagger maintenance schedules.
The following tasks are automatically performed by these processes (OLM and OLD):
• Cleanup of deleted items and mailboxes: Cleanup is performed at runtime when hard deletes occur, and it also happens during OLM.
• Space compaction: The database is compacted and space is reclaimed at runtime. The work is throttled automatically so that it does not affect end-user performance.
• Maintain contiguity: The database is analyzed for contiguity and space at runtime and is defragmented in the background. This work is also throttled automatically to avoid affecting end users. Contiguity maintenance is integrated into Exchange Server 2013 and improves performance significantly.
• Database checksum: The database checksum task can either run in the background 24×7 (the default) or run during the OLM window. In both cases, the task runs against both active and passive copies of the database.
By default, the OLM maintenance schedule is set to run daily from 1:00 a.m. to 5:00 a.m. Because the maintenance cycle can be extremely resource intensive, this default schedule is intended to perform maintenance during periods when most of an organization’s mail users are not connected. Organizations might need to adjust this schedule when users connect from other parts of the world or when the business operates 24 hours a day. Organizations should also take their Exchange Server backup schedules into consideration.
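Whether users in other regions will be online during the OLM window is simple clock arithmetic. The following minimal Python sketch maps an office’s working hours onto the server’s local clock and reports which hours of the default 1:00 a.m. to 5:00 a.m. window overlap them. It assumes whole-hour UTC offsets and ignores daylight saving time; the function names and the example offices are hypothetical.

```python
# Default OLM window from the text: 1:00 a.m. to 5:00 a.m., server local time.
OLM_START_HOUR = 1
OLM_END_HOUR = 5  # exclusive: the window covers hours 1, 2, 3, and 4


def office_hours_on_server_clock(start_hour, end_hour, office_utc_offset, server_utc_offset):
    """Map an office's local working hours (start inclusive, end exclusive)
    onto the server's local clock, as a set of whole hours 0-23.

    Assumes whole-hour UTC offsets and ignores daylight saving time.
    """
    shift = server_utc_offset - office_utc_offset
    return {(hour + shift) % 24 for hour in range(start_hour, end_hour)}


def hours_overlapping_olm(start_hour, end_hour, office_utc_offset, server_utc_offset):
    """Return the server-local hours in which the OLM window overlaps the office's working day."""
    olm_hours = set(range(OLM_START_HOUR, OLM_END_HOUR))
    office_hours = office_hours_on_server_clock(
        start_hour, end_hour, office_utc_offset, server_utc_offset
    )
    return sorted(olm_hours & office_hours)


# Example: a server at UTC-5 and two hypothetical offices working 9:00 a.m. to 6:00 p.m.
print(hours_overlapping_olm(9, 18, 8, -5))  # office at UTC+8 → [1, 2, 3, 4]
print(hours_overlapping_olm(9, 18, 0, -5))  # office at UTC+0 → [4]
```

A non-empty result suggests that the window will run while that office is working, which is the situation in which moving the schedule is worth considering.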
The OLD task runs continuously but is auto-throttled to prevent impact to the end user.
Taken together, the automatic maintenance regime is much more effective at keeping the database healthy and performing well. In particular, the contiguity maintenance in Exchange Server 2013 substantially reduces database I/O.
Prioritizing and Scheduling Maintenance Best Practices
Exchange Server 2013 is a very efficient messaging system. However, as mailboxes and public folders are used, logical corruption of the data in the databases is always possible. It is important to implement a maintenance plan and schedule to minimize the impact that database corruption will have on the overall messaging system.
This section focuses on tasks that should be performed regularly on a daily, weekly, monthly, and quarterly schedule. Besides ensuring optimum health for an organization, following these best practices has the additional benefit of keeping administrators well informed about the status of their messaging environments.
Tip
Administrators should thoroughly document the Exchange Server 2013 messaging environment configuration and keep that documentation up to date. In addition, they should keep a change log that records changes and maintenance procedures for the environment, and they should maintain it meticulously.
Daily Maintenance
Daily maintenance routines require the most frequent attention of an Exchange Server administrator. However, these tasks should not take a significant amount of time to perform.
Verify the Online Backup
One of the key differences between a disaster and disaster recovery is an organization’s ability to fall back on backups of its environment if the need arises. Considering the potential impact on an environment if the backed-up data cannot be recovered, it is remarkable how often backup processes are ignored. Many organizations adopt a “set it and forget it” attitude, often relying on nontechnical administrative personnel to simply “swap tapes” every day.
Note
A “backup” in Exchange Server 2013 does not necessarily mean only a backup to tape media, as the term would have implied years ago. With Exchange Server 2013, a “backup” may be a replication of the database to another server, so verifying the backup means confirming that the data has successfully replicated and is up to date on the secondary server.
Whatever method is used to back up an Exchange Server environment, daily confirmation that the task succeeded should be mandatory. Although the actual verification process varies with the backup solution in use, the general concept remains the same. Review the backup program’s log file to determine whether the backup completed successfully. If errors are reported or the backup job set does not complete successfully, identify the cause of the error and take the appropriate action to resolve the problem.
Some best practices to keep in mind when backing up an Exchange Server environment are as follows:
• Keep note of how long the backup process takes to complete. This time should stay within any service level agreements (SLAs) that might be in place.
• Determine the start and finish times of the backup process. Attempt to configure the environment so that the backup process completes before the nightly maintenance schedule begins.
• Verify that transaction logs are successfully truncated upon completion of the backup.
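The first two checks in the list above can be combined into one routine test of each backup run. The following Python sketch flags a run that exceeds an SLA duration or finishes after the next OLM window opens. The SLA value, the default 1:00 a.m. start hour, and the function name are assumptions for illustration; the start and finish times would come from the backup program’s log, which is not modeled here.

```python
from datetime import datetime, timedelta


def check_backup_run(start, finish, sla, olm_start_hour=1):
    """Return a list of problems found with one backup run; an empty list means none.

    start, finish: datetimes taken from the backup job's log.
    sla: timedelta giving the maximum duration allowed by the SLA (assumed value).
    olm_start_hour: hour at which the nightly OLM window begins (1:00 a.m. by default).
    """
    problems = []

    duration = finish - start
    if duration > sla:
        problems.append(f"backup took {duration}, longer than the SLA of {sla}")

    # Find the first OLM start that falls strictly after the backup started.
    olm_start = start.replace(hour=olm_start_hour, minute=0, second=0, microsecond=0)
    if olm_start <= start:
        olm_start += timedelta(days=1)
    if finish > olm_start:
        problems.append(f"backup finished at {finish}, after OLM began at {olm_start}")

    return problems


# Example: a backup that starts at 10:00 p.m. and must finish within 4 hours.
print(check_backup_run(datetime(2024, 1, 1, 22, 0), datetime(2024, 1, 2, 0, 30), timedelta(hours=4)))  # → []
```

Returning a list of messages rather than a single flag makes it easy to record every problem in the change log or in an alert.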
Check Free Disk Space
All volumes that Exchange Server 2013 resides on (Exchange Server system files, databases, transaction logs, and so forth) should be checked daily to ensure that ample free space is available. If a volume or partition runs out of disk space, no more information can be written to the disk, which causes the Exchange Server services to stop. This can also result in lost data and corrupted messaging databases.
Although it is possible to perform this check manually, it is easily overlooked when “hot” issues arise. As a best practice, administrators can use System Center 2012 Operations Manager (OpsMgr) or a third-party product to raise an alert when free space drops below a certain threshold.
For organizations without the resources to implement such products, the check can be scripted, with an email or network alert generated when free space falls below the designated threshold.
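A scripted check of this kind can be very small. The following Python sketch uses the standard library’s `shutil.disk_usage` to report every volume whose free space has dropped below a percentage threshold. The 20 percent default is an assumption, and sending the email or network alert is deliberately left out, because that depends on the organization’s mail relay and monitoring setup.

```python
import shutil

# Assumed threshold; set it to match the organization's own standards.
DEFAULT_THRESHOLD_PERCENT = 20.0


def low_space_volumes(paths, threshold_percent=DEFAULT_THRESHOLD_PERCENT):
    """Return (path, free_percent) for each path whose volume is below the threshold."""
    alerts = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        free_percent = usage.free / usage.total * 100
        if free_percent < threshold_percent:
            alerts.append((path, round(free_percent, 1)))
    return alerts
```

A scheduled task could call, for example, `low_space_volumes([r"D:\ExchangeDatabases", r"E:\ExchangeLogs"])` (hypothetical paths standing in for the real database and transaction log locations) and send an alert whenever the returned list is not empty.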
Review Message Queues
Message queues should be checked daily to ensure that mail flow in the organization is not experiencing difficulties. The Queue Viewer in the Exchange Toolbox can be used for this task.
If messages are found stuck in a queue, administrators can use Message Tracking and the Mail Flow Troubleshooter to determine the cause.
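When reviewing queues, a useful rule of thumb is that a queue whose oldest message has been waiting a long time is more likely to be stuck than one that is merely busy. The following Python sketch applies that rule to a snapshot of queue data. The `QueueSnapshot` record, the 30-minute threshold, and the sample queue names are all hypothetical; the sketch assumes the message counts and ages have already been collected from the Queue Viewer or a similar source.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class QueueSnapshot:
    """One queue as observed at review time (hypothetical record format)."""
    name: str
    message_count: int
    oldest_message_age: timedelta  # how long the oldest waiting message has been queued


def stuck_queues(snapshots, max_age=timedelta(minutes=30)):
    """Return names of non-empty queues whose oldest message has waited longer than max_age."""
    return [
        q.name
        for q in snapshots
        if q.message_count > 0 and q.oldest_message_age > max_age
    ]


# Example with made-up queue names and values.
sample = [
    QueueSnapshot("Submission", 0, timedelta(0)),
    QueueSnapshot("contoso.example SMTP", 42, timedelta(hours=2)),
    QueueSnapshot("Shadow queue", 3, timedelta(minutes=5)),
]
print(stuck_queues(sample))  # → ['contoso.example SMTP']
```

Queues flagged this way are the natural starting points for Message Tracking.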
Check Event Viewer Logs
On Exchange Server 2013 servers, the application log in the Event Viewer should be reviewed daily for any warning or error messages. Although some errors point directly to a problem on the server, others might be symptoms of issues elsewhere in the environment. Either way, it is best to evaluate and resolve these errors as soon as possible.
Filtering the log for these event types makes it easy to determine whether any have occurred within the last 24 hours.
Alternatively, if a systems or operational management solution (such as System Center 2012 Operations Manager) is in use, this process can be automated, with email or network notifications sent as soon as the error is generated.
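The 24-hour filter on warnings and errors amounts to a simple selection over log entries. The following Python sketch assumes the Application log entries have already been exported into simple records. The `EventRecord` class and its fields are a hypothetical format, not the Windows event log API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class EventRecord:
    """A simplified Application log entry (hypothetical export format)."""
    time_generated: datetime
    level: str  # "Information", "Warning", or "Error"
    source: str
    event_id: int


def recent_problems(events, now, window=timedelta(hours=24)):
    """Return Warning and Error events generated within `window` before `now`, newest first."""
    cutoff = now - window
    matches = [
        e for e in events
        if e.level in ("Warning", "Error") and cutoff <= e.time_generated <= now
    ]
    return sorted(matches, key=lambda e: e.time_generated, reverse=True)
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the filter repeatable, for example when rerunning the morning review against a saved export.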
Verify Database Replication
Exchange Server 2013 relies on database replication for both redundancy and high availability, so it is critical to verify that replication is working the way the organization has configured it and expects it to.
In environments that have multiple database copies in a database availability group (DAG), administrators should ensure that the copy and replay queues are near zero, or at least not growing.
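The “near zero, or at least not growing” rule can be checked mechanically from a few samples taken minutes apart. In Exchange, these values are reported as the `CopyQueueLength` and `ReplayQueueLength` properties of `Get-MailboxDatabaseCopyStatus`. The following Python sketch assumes those numbers have already been collected. The near-zero threshold of 10, the function names, and the copy names are hypothetical.

```python
def queue_trend(samples, near_zero=10):
    """Classify a chronological list of queue-length samples for one database copy.

    near_zero is an assumed threshold; tune it for the environment.
    """
    if not samples:
        return "no data"
    latest = samples[-1]
    if latest <= near_zero:
        return "healthy"
    if latest > samples[0]:
        return "growing"
    return "elevated but not growing"


def review_copies(copies, near_zero=10):
    """Map each copy name to (copy queue trend, replay queue trend).

    copies maps a copy name to (copy_queue_samples, replay_queue_samples).
    """
    return {
        name: (queue_trend(copy_q, near_zero), queue_trend(replay_q, near_zero))
        for name, (copy_q, replay_q) in copies.items()
    }


# Example with made-up copy names and samples taken a few minutes apart.
print(review_copies({
    "DB01\\MBX2": ([0, 2, 1], [0, 0, 0]),
    "DB02\\MBX3": ([15, 60, 240], [5, 8, 12]),
}))
```

Comparing the latest sample with the first, rather than looking at a single reading, distinguishes a queue that is draining after a burst from one that is falling further behind.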