SharePoint 2010: Data Protection, Recoverability, and Availability - Disaster Recovery Planning

11/26/2012 5:46:32 PM

Developing and implementing a SharePoint disaster recovery plan is not easy, because so many components are integrated with SharePoint. A medium-scale or larger SharePoint Server installation has many infrastructure dependencies as well as core components such as Web front-end servers, search servers, and database servers. Nearly all of your SharePoint information is stored in SQL Server databases, but there are several other areas of concern when developing a disaster recovery plan for SharePoint. For instance, you have information stored in other applications, such as IIS and DNS, and even at the file system level. You also have to be concerned about hardware: hard drives, routers, switches, cables, and so on.

Disaster planning should also encompass the implementation of best practices to avoid or minimize the chance of a catastrophic event occurring in the first place. If you take the time to plan carefully, using a three-step process involving education, documentation, and preparation, you can build a comprehensive—and successful—disaster recovery plan that will benefit your organization in the short term and the long run.

1. Education

The education phase of disaster recovery planning involves familiarizing yourself with all the integrated SharePoint components, so that you know what you will need to do during recovery to minimize the disruption to your business. Not all of the components you must be concerned with are contained within SharePoint, but because SharePoint is tightly integrated with so many other applications, it depends on many of them to function.

1.1. Server Operating System

The most obvious component that SharePoint 2010 depends on is the Windows Server operating system. You should create a new operating system image of all non-database servers, and the image should contain all the service packs and patches that you have applied. You can use this image to quickly restore the operating system before SharePoint is reinstalled. However, be cautious; you should keep a different image for each farm server role, because changing the SID (Security Identifier) of a single image to create multiple SharePoint 2010 WFEs and application servers is not a recommended practice. Be sure to update your images every time you add a service pack or patch and when the SharePoint Root changes.

Note:

The SharePoint Root is located at C:\Program Files\Common Files\Microsoft Shared\web server extensions\14\. The term replaces the older phrase “14 Hive.”

You should create a network drive with all of your system images, installation sources, patches, and third-party software additions. Schedule backups for this drive at least once a week. This practice will allow you to rapidly restore the server while retaining your SharePoint 2010 farm consistency.
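One way to keep the weekly schedule honest is to check how old the newest file on the backup target is. The following is a minimal Python sketch, not a SharePoint or Windows tool. The function names are hypothetical, and the seven-day default simply reflects the "at least once a week" guidance above.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional


def newest_backup_age(backup_dir: Path,
                      now: Optional[datetime] = None) -> Optional[timedelta]:
    """Return the age of the most recently modified file under backup_dir.

    Returns None when the directory contains no files at all.
    """
    now = now or datetime.now()
    mtimes = [p.stat().st_mtime for p in backup_dir.rglob("*") if p.is_file()]
    if not mtimes:
        return None
    return now - datetime.fromtimestamp(max(mtimes))


def backup_is_stale(backup_dir: Path,
                    max_age: timedelta = timedelta(days=7),
                    now: Optional[datetime] = None) -> bool:
    """True when there is no backup, or the newest one is older than max_age."""
    age = newest_backup_age(backup_dir, now)
    return age is None or age > max_age
```

A scheduled task could call `backup_is_stale` against the backup location and alert an administrator when it returns True.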

1.2. SQL Server

If you could back up only one server in your SharePoint 2010 farm, it would have to be your SQL Server. SQL Server contains more than 95 percent of your SharePoint information. SQL Server stores configuration information about your entire farm, the site collection content of your Web applications, your Web application settings, service application information, performance information, and several other important bits of SharePoint information.

If you aren’t also the SQL Server database administrator (DBA), you should introduce yourself to the database administrator(s) who are managing the SQL Server instance or instances that are hosting your SharePoint content. Take the time to become familiar with their schedules, backup strategies, database failover options, and anything else they are willing to share with you about the SharePoint databases.

1.3. Internet Information Services

All SharePoint content is accessed through a Web service hosted by Internet Information Services (IIS). The configuration of the Web applications and application pools that you create from SharePoint is stored in the farm configuration database. However, any changes you make directly in IIS Manager are not stored in the SharePoint farm configuration database; they are stored in an IIS configuration file. For instance, if you add a host header to a Web application using IIS Manager, it is stored in an IIS 7 configuration file, not in the farm configuration database.

The foundation of your Web application information stored in IIS is the configuration file. This is a repository for your IIS configuration information located in the directory C:\Windows\System32\inetsrv\config. This IIS configuration file is an XML file called Applicationhost.config, and you should update it only by using the IIS Manager application or the Appcmd.exe command-line tool. You should back up your IIS configuration file regularly so that you have an up-to-date version if you lose the IIS server hosting your SharePoint Web applications.
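A regular backup of this directory can be as simple as a timestamped copy. The following Python sketch is one possible approach under stated assumptions: the function name and the destination folder are hypothetical, and the script would need to run with administrative rights on the IIS server to read the configuration directory.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional

# IIS 7 configuration directory named in the text above.
IIS_CONFIG_DIR = Path(r"C:\Windows\System32\inetsrv\config")


def backup_iis_config(dest_root: Path,
                      source: Path = IIS_CONFIG_DIR,
                      now: Optional[datetime] = None) -> Path:
    """Copy the IIS configuration directory into a timestamped folder.

    dest_root is a hypothetical backup location, such as a network share.
    Returns the path of the folder that was created.
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = dest_root / f"iis-config-{stamp}"
    # copytree raises FileExistsError if target already exists, so an
    # earlier backup is never silently overwritten.
    shutil.copytree(source, target)
    return target
```

Copying the files only reads them, so it does not conflict with the advice to change Applicationhost.config solely through IIS Manager or Appcmd.exe. Appcmd.exe also provides its own `add backup` command, which may be preferable when you want IIS itself to manage the backup.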

1.4. Third-Party Software

Most organizations have third-party solutions running on their SharePoint 2010 server farms. This might include backup software, Web parts, language packs, antivirus software, and custom code. Become familiar with this software and document how it is installed. Document any installation keys that are required and keep the installation media in a central location that is easily accessible during a recovery process. Be sure to reinstall any third-party Web Parts and custom code before redeploying your Web front-end (WFE) servers. Forgetting to do so on a load-balanced WFE will result in page errors and an inconsistent experience for the end user. As part of your disaster recovery planning, you should be cautious about installing products that extend the time required to recover your farm. Make third-party solutions dynamic enough to restore your farm with minimal delays.

1.5. Network Components

Since SharePoint 2010 hosts its content through a Web service and is network dependent, being familiar with all of the connection components is crucial to recovery or continuity of services. Be sure to include your network team in your disaster recovery planning process at an early stage to discuss and document all connecting pieces. The following list provides some examples of components you should discuss with your network team.

Switches: Redundancy, virtual LANs, Network Interface Card (NIC) teaming, port speed, duplex, dedicated backup LANs
Routers: Redundant paths, latency, hardware load balancing
Firewalls: Rules, redundancy, OS version
SAN (Storage Area Network): Compatibility, capacity, speed, Host Bus Adapter
Cabling and electrical topology: Redundant cabling, processes for working in your raised floor, redundant power, uninterruptible power supplies, generators

Note:

If you are using an Internet Service Provider (ISP), be sure to get a service level agreement (SLA) that defines their strategies and obligations regarding the services they provide.

1.6. Central Administration

With the exception of the SQL Server and the operating system, the server hosting the Central Administration Web application is the most important component in the recovery of a SharePoint installation. If you experience a complete loss of service, you will need to bring up the Central Administration server first and use it to re-establish connections to your SharePoint databases. You can use your Central Administration server Web application console to access the Backup And Restore user interface (UI), or optionally, use the STSADM or Windows PowerShell command-line tools. You can restore this server from a system image or by using the Windows Server Backup utility. After completing your restores, use Central Administration to verify that the services specific to your SharePoint installation are running.

1.7. Web Front-End Servers

In an out-of-the-box SharePoint 2010 implementation, Web front-end servers (WFEs) are stateless servers, meaning that they don’t track client access, and any WFE can serve your SharePoint data. This eases restoration of a WFE by allowing you to install the application binaries and then connect to an existing SQL Server configuration database. The SQL Server configuration database populates any required information on the WFE to serve SharePoint content. The exception to this is when you are customizing Web application content. As an example, many WFEs will have branded images, custom pages, excluded managed paths, Web Parts, and specialized authentication mechanisms. All of these must be reinstalled after a WFE system rebuild, which reinforces the need to carefully document customized environments.

1.8. Search Server

If your indexes are not large, rebuilding the index after a system image restore is an efficient way to return current search and query functionality. Alternatively, you can reinstall SharePoint to an existing farm and enable it as a Search server in Central Administration. Conversely, if your index sizes are measured in gigabytes or terabytes, you will want to back up your indexes so they can be restored, providing a reasonably timed return to service. If you don’t back up large content indexes, your search results can be incomplete for hours or even days, depending on the size of your content sources and the speed of your hardware.
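The trade-off between rebuilding and restoring an index can be made concrete with rough arithmetic. The Python sketch below is illustrative only: it reasons in terms of item count and crawl rate rather than index size on disk, and the function names, the crawl rate, and the tolerable outage are all hypothetical values that you would replace with measurements from your own farm.

```python
def estimated_full_crawl_hours(item_count: int, items_per_second: float) -> float:
    """Rough time to rebuild an index from scratch, in hours."""
    if items_per_second <= 0:
        raise ValueError("items_per_second must be positive")
    return item_count / items_per_second / 3600


def should_back_up_index(item_count: int, items_per_second: float,
                         max_outage_hours: float) -> bool:
    """True when a full recrawl would exceed the tolerable search outage."""
    return estimated_full_crawl_hours(item_count, items_per_second) > max_outage_hours


# Hypothetical figures: 5 million items crawled at 50 items per second.
hours = estimated_full_crawl_hours(5_000_000, 50)
print(f"Estimated full crawl: {hours:.1f} hours")  # prints 27.8 hours
```

With these assumed figures a rebuild would leave search results incomplete for more than a day, which argues for backing up the index.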

Note:

MORE INFO A good source for more information about Search servers and indexing is the Microsoft Office SharePoint Server 2007 Best Practices (Microsoft Press, 2008).

1.9. Service Applications

You can use any of the SharePoint disaster backup tools to back up your service applications, or you can perform a full farm backup that includes all service applications. Don’t forget that the flexibility of SharePoint 2010 allows for an easy reinstallation of your service applications should one of them fail. Also, if your organization relies heavily on a particular service application, you may benefit from having multiple instances of that service application hosted on your farm.

2. Documentation

Documentation ensures that you have identified and defined the remedies necessary to recover all components of your SharePoint farm. There are two categories of documentation: the SharePoint-dependent items and SharePoint component documentation. The SharePoint-dependent items you need to document include all dependent software, hardware, and network components supporting your SharePoint installation.

You also should document all SharePoint-specific components, including Central Administration settings, search and index settings, WFE, and service application settings. By documenting the SharePoint components and their dependencies, you will be able to recover your entire SharePoint farm or a subset of the farm. Organizations that document and prepare for disaster can swiftly react and stay operational after any type of catastrophe.

You should have detailed installation documentation that defines every setting and keystroke required to completely rebuild each server. Document every nuance of your servers, including items like WFE SharePoint Root customizations, and you won’t have to worry about missed configuration options and forgotten software when rebuilding servers. Create a separate document for each server and include all relevant hardware information—the server name, BIOS and backplane versions, network interface cards, RAID controllers, and so on. Documenting your hardware configuration makes it easier to troubleshoot, download correct drivers, and effectively communicate with technical support in the event of failure.

You should also document all service packs, hotfixes, antivirus programs, and other software additions. When you have servers in a load-balanced cluster, it is very important that all machines have an identical configuration. If months have passed since a server build and you haven’t documented additions, you will almost certainly forget a Web Part or similar piece of software when you have to restore the server. This sort of omission can create an inconsistent, negative user experience that can be very difficult to troubleshoot.
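If each server's documented additions are kept as a simple list, comparing the lists is an easy way to catch a forgotten component before users do. The following Python sketch shows the idea; the function name, server names, and component names are hypothetical, and how the inventories are collected is outside its scope.

```python
from typing import Dict, Set


def find_configuration_drift(inventories: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each server to the components found elsewhere in the farm but not on it.

    Servers that already have everything are left out of the result.
    """
    all_components: Set[str] = set().union(*inventories.values())
    drift: Dict[str, Set[str]] = {}
    for server, installed in inventories.items():
        missing = all_components - installed
        if missing:
            drift[server] = missing
    return drift


# Hypothetical load-balanced WFEs and their documented additions.
farm = {
    "WFE01": {"SharePoint 2010 SP1", "Contoso Web Part 1.2"},
    "WFE02": {"SharePoint 2010 SP1"},
}
print(find_configuration_drift(farm))  # {'WFE02': {'Contoso Web Part 1.2'}}
```

An empty result means the documented configurations match, which is the state you want for every machine in a load-balanced cluster.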

Note:

BEST PRACTICES Have your disaster recovery documents backed up to a source that is readily accessible and easily restored in the event of a disaster.

If you have your documentation stored only on your SharePoint site and SharePoint fails, you will not be able to use this documentation. Store hard copies of all of your disaster recovery documents onsite and offsite. In addition, versioning your server documentation can be an invaluable aid for rolling back changes when patches or third-party software affect usability and performance.

After you have thoroughly documented your farm installation, continually update your server documents. This creates a “living” document set that is always current, and it will be worth all the time it took to keep it current when you need the documents for restoring services. Create an appendix in your server documentation with version history and note the reason for changing your specific installation. If possible, verify any changes you make with your peers.

Note:

ON THE COMPANION MEDIA Use the Disaster Recovery Template on the companion media as a guide to completing your organization’s disaster recovery plan.

2.1. SharePoint-Dependent Documentation

This category should contain all of the information that you gathered during your meetings, lunches, and water-cooler conversations with network and SQL Server administrators. It covers the components that are outside of SharePoint but are required for SharePoint to function. Have the administrators of your network and SQL Server create the documentation for items in this category to make sure it contains everything necessary to recover from a disaster.

2.1.1. Operating System

Because there are several versions of operating systems in widespread use, you must document your specific installation, and keep the installation media easily available as well. Update your documentation whenever you apply service packs, patches, hotfixes, and any other changes or additions to the operating system to ensure that it is consistent on all servers in the farm.

2.1.2. SQL Server

Document the version of SQL Server you are using, along with the service packs, patches, hotfixes, and so on that have been applied to your SharePoint SQL Server instance or instances. Also, if you are performing SQL backups of your SharePoint databases, document the backup strategies and methods you are using, the backup schedule, and the location of backup copies, as well as any other information that will help you quickly recover your SharePoint databases.

2.1.3. Internet Information Services

Document any modifications made to your Web applications through IIS Manager. Also document your backup schedule and the location of the IIS backups.

After talking to the administrators of these systems and becoming familiar with how they are integrated with SharePoint, you should identify any scheduled outages, such as maintenance windows, that you need to take into account during the planning stage for disaster recovery. Your disaster recovery plan will only be as good as the weakest link, so don’t forget to involve the stakeholders early and convince your peers that a good disaster recovery plan is a solid investment.

2.2. SharePoint Component Documentation

This category of documentation contains the information specific to SharePoint, and it focuses on the different components within SharePoint. Your source for SharePoint component information should be your SharePoint farm administrators, who are the best people to write and maintain this critical documentation.

2.2.1. Central Administration

It is important to completely document the installation of all servers, but especially your Central Administration Web application server. This document should be secured and only be accessible to farm administrators. It should contain the following information.

Farm account name and password
Farm passphrase specified during initial creation of farm
Port number of Central Administration
SQL Server server name
SQL instance name on SQL Server
SQL Server account name and password
Configuration database name
Location of binaries (if not the default)
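A structured record makes it easy to spot gaps in this document before you need it. The Python sketch below mirrors the list above as a data class and reports which entries are still empty. The class and field names are hypothetical, and the sketch deliberately says nothing about how the values are stored; the record must be secured as described above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class CentralAdminRecord:
    """One record per farm; field names mirror the list above."""
    farm_account: Optional[str] = None
    farm_account_password: Optional[str] = None
    farm_passphrase: Optional[str] = None
    central_admin_port: Optional[int] = None
    sql_server_name: Optional[str] = None
    sql_instance_name: Optional[str] = None
    sql_account: Optional[str] = None
    sql_account_password: Optional[str] = None
    config_database_name: Optional[str] = None
    # Only needs changing when the binaries are not in the default location.
    binaries_location: str = "default"


def missing_fields(record: CentralAdminRecord) -> List[str]:
    """Return the names of fields that have not been filled in yet."""
    return [f.name for f in fields(record)
            if getattr(record, f.name) in (None, "")]
```

Running `missing_fields` as part of a documentation review gives a quick checklist of what remains to be recorded.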

2.2.2. Web Front-End

The following is a list of items that must be documented to successfully back up and restore a customized WFE.

IIS Configuration
Customized authentication software
TCP ports on Web applications and extended Web applications
IIS excluded managed paths and associated content
Centrally located repository for IIS configuration backups
SSL certificate backups
IIS logs at %SystemDrive%\inetpub\logs\LogFiles\W3SVC<IIS site ID> (the IIS 7 default; IIS 6 used %SystemRoot%\system32\LogFiles\W3SVC<IIS site ID>)
Web Parts installed into the Global Assembly Cache (GAC)
Customized code located in the SharePoint Root

2.2.3. Search Service

Document your file index locations, the backup schedule for these indexes, and the location of these backups. Also be sure to include a list of the database names, the backup schedule for the search databases, the backup method, and the location of backup copies.

2.2.4. Service Applications

Document service application configuration information, associated Web application information, and database names, as well as the backup schedule, method, and location of the backups.

3. Preparation

Preparation involves testing the identified remedies that you established in the documentation process so that when a disaster occurs, you will know exactly what steps to take to recover from it—and you know how long it will take to accomplish the recovery.

Having a plan that won’t work is of little use, so it makes sense to execute a simulation of your disaster recovery plan often, making sure to coordinate with your peers and stakeholders. Executing a disaster recovery plan on a production farm is generally a bad idea, but you can test the plan on secondary server farms and on system image restores in a development environment. If your organization has the resources to build a lab with a mock-up of your production environment, you can use it to test your disaster recovery plan. To minimize costs and overhead, a mock-up environment can be simulated using a virtual environment.

When you are testing your disaster recovery plan, try to test the plan with real-world scenarios involving Search server failures, SQL content database corruption, IIS corruption, network card failures, hard drive failures, and any other common issues you might face. This will provide you with valuable knowledge about how to bring back a failed SharePoint farm.
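Recording the outcome of each test makes it clear which scenarios are untested, overdue, or too slow to recover. The following Python sketch is one way to track this. The class and function names are hypothetical, and the recovery target and the 31-day maximum gap between tests are assumptions that you should replace with your own service level commitments.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Iterable


@dataclass(frozen=True)
class Drill:
    scenario: str          # for example, "IIS corruption"
    run_on: date           # day the test was performed
    recovery_minutes: int  # measured time to restore service


def scenarios_needing_attention(drills: Iterable[Drill],
                                scenarios: Iterable[str],
                                target_minutes: int,
                                today: date,
                                max_gap: timedelta = timedelta(days=31)) -> Dict[str, str]:
    """Report scenarios that were never tested, are overdue, or miss the target."""
    latest: Dict[str, Drill] = {}
    for drill in drills:
        current = latest.get(drill.scenario)
        if current is None or drill.run_on > current.run_on:
            latest[drill.scenario] = drill

    report: Dict[str, str] = {}
    for scenario in scenarios:
        drill = latest.get(scenario)
        if drill is None:
            report[scenario] = "never tested"
        elif today - drill.run_on > max_gap:
            report[scenario] = "test overdue"
        elif drill.recovery_minutes > target_minutes:
            report[scenario] = "recovery slower than target"
    return report
```

Only the most recent test of each scenario is considered, so a slow recovery stays on the report until a later test meets the target.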

Many disaster recovery plans adequately cover all hardware, software, and system components, but leave out what may be the most important part of the equation—you and your associates. As an example, if the network administrator is on vacation when a disaster occurs, you may be able to quickly restore your SharePoint 2010 server farm, but it will be of little value if the network is still down. Make sure you and the other system administrators have a list of all administrators. This list should include shift schedules, home and cell phone numbers, vacation schedules and contact information, and any other relevant information you may need to round up the personnel you will need to implement your plan for restoration of service. Having this information available also will help make sure that you are meeting the defined service level agreements (SLAs) for your clients.

It is no accident that after major disasters large banks and brokerage firms do not lose data: their disaster recovery plans are well documented and carefully executed when needed. It is nearly impossible to execute a disaster recovery plan successfully if you do not know all of the dependencies in your environment, haven’t accurately documented the steps required to perform disaster recovery at different levels, and haven’t tested the success of your disaster recovery plan. Education, documentation, and preparation: remember that these are the three steps to creating a disaster recovery plan that will allow your organization to recover quickly and efficiently when calamity strikes.

But don’t just file away that great plan you’ve created after you’ve finished testing it. You have to keep it current and viable. Perform tests monthly to remain familiar with exactly what has to be completed to recover any level of your SharePoint farm.
