SQL Server 2005 : High-Availability Solutions

10/13/2011 3:49:05 PM

Database server failures can be costly in production environments. A high-availability solution is a solution that minimizes failures and downtime. In a good solution, the effects of hardware or software failures are minimal, if they exist at all. The idea is to maintain the availability of the server and its objects, allowing applications to interact without fail. Downtime needs to be minimized, eliminated, or scheduled for times when applications do not need access to the data.

You can do many things to minimize or eliminate downtime and to prepare a disaster recovery plan in the event that something does occur. You start with backups and scheduled maintenance.

In some cases, you can have a single, standalone server. Even with backup, if something goes wrong, you have to restore from a backup to get back up and running again. A solution in which every minute of downtime costs hundreds of dollars is not a good solution.

In other cases, you can implement one of the secondary server solutions, often called warm backups. Through log shipping or replication, you can achieve a better percentage of uptime, at a reasonable cost. In these scenarios, a secondary server is constantly being refreshed with the latest information. If something goes wrong with the primary server, it is simply a matter of switching processing over to the secondary server, which requires much less loss of processing and data access time than does restoring from backup.

You can also use more costly solutions, such as using redundant hardware and simultaneous update solutions (for example, database mirroring, failover clustering). With several solutions, you update data simultaneously within multiple database engines. If one system fails, the other can take over without missing a beat.

Generally, at the lowest end of the spectrum, you have low cost but high risk and essentially no high-availability solution in place. As you move up to the high-availability solutions, you have a low cost with some risk (loss due to failure) and a degree of latency (the amount of time before data agrees in two locations). At the higher end, you achieve lower risk and minimal latency, but at higher cost.

Implementing Log Shipping

Log shipping is a reasonably low-cost solution for increasing availability. Log shipping operates at the database level, and it provides for redundant storage of data in a multiple-server environment. With log shipping, one or more standby databases operate as warm backups. These secondary databases are referred to as warm copies of the primary database. The primary database is the only copy that can be updated, and the secondary database can be set up as a read-only database to offload query-related processing from the primary server. The secondary database can also be set up solely as a standby, offline copy of the primary database.

You initiate a secondary database by restoring a full backup of the primary database without recovery. You can do this ahead of time or as part of the log shipping setup. After you restore the full backup with no recovery, the secondary database reflects the standby status in Management Studio. In Figure 1, the ONE database on the default instance of YUKONTWO has been fully backed up, and that backup has been restored onto the StandbyONE database on the YUKONTWO\SECOND instance of the server.

Figure 1. Standby status is indicated by (Restoring...) in Management Studio.

A log shipping configuration includes two or more instances of SQL Server. One instance holds the primary database, and one or more instances hold copies of the database as secondary databases. You also can optionally configure a monitor server. You control the process by creating and executing jobs at the appropriate servers.

The primary server performs log backups at regular intervals. A single job is created on the server to perform these backups on schedule and stores the backups on a network share that is accessible to both the primary and secondary machines. The default is to perform the operation every 15 minutes, but you can alter this timing if you want more or less frequency. The timing here and the timing of the associated jobs on the secondary server determine the actual latency.

Each secondary server utilizes two jobs to perform regular updates of its secondary database from log backups created on the primary server. The first job performs a copy of the log backup to a destination folder (usually located on the secondary server). The second job controls the restore operations. You can have the restore use no recovery to keep the database offline or use the standby option to cause the server to become read-only, for reporting purposes.

The monitoring instance executes one job on a regular basis to identify and alert an operator if the log shipping is not operating as it is designed to operate. Log shipping has no automated failover to the secondary instance. This brings with it the risk of downtime in the event of failure. There is also an associated risk of data loss for the interval between log backups if there is a failure in the logging hardware. Before failover can occur, the secondary databases must be brought fully up-to-date manually. Secondary databases also have limited availability during restores, so they might be unavailable as reporting sources during restores.

You can perform the complete setup of log shipping, including the configuration of all tasks, by using Management Studio. To do so, you right-click the primary database and select Tasks and then Ship Transaction Logs. Figure 2 shows an example of this configuration. In this example, all three instances are running on the same physical machine, which is an unlikely scenario in a production environment and in fact would defeat the purpose of attempting to provide a high-availability solution.

Figure 2. Log shipping configuration using three SQL Server instances.

You can use log shipping alone or as a supplement to database mirroring. When these solutions are used together, the current principal database of the database mirroring configuration is also the current primary database of the log shipping configuration.

Using Database Mirroring

Microsoft support policies do not apply to the database mirroring feature in SQL Server 2005. Database mirroring is currently disabled by default, but may be enabled for evaluation purposes only by using trace flag 1400 as a startup parameter. (For more information about trace flags, see Trace Flags (Transact-SQL).) Database mirroring should not be used in production environments, and Microsoft support services will not support databases or applications that use database mirroring. Database mirroring documentation is included in SQL Server 2005 for evaluation purposes only, and the Documentation Policy for SQL Server 2005 Support and Upgrade does not apply to the database mirroring documentation.

Enabling Trace Flags in a Test Environment

You can enable trace flag 1400 to get a look into database mirroring; to do so, you have two choices:

You can stop the service and run sqlsrvr.exe from the command prompt by using the following:

sqlservr
        -d"C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\master.mdf"
        -T1400

You can add the -T1400 flag as a startup option in SQL Server Configuration Manager. You must separate each startup option from the previous option with a semicolon, as shown in Figure 3.

Figure 3. Adding trace flags to the SQL Server startup.

SQL Server database mirroring is implemented on a per-database basis to increase availability. You must have the database set to the full recovery model, and you cannot mirror the master, msdb, tempdb, or model databases.

Mirroring maintains two copies of a database. Each copy is held on a separate instance of SQL Server. Optionally, a third instance can act as a mirroring witness to observe activity and to alert and/or fail over if problems occur. In database mirroring, the witness server is an instance of SQL Server that monitors the status of the principal and mirror servers. The witness, by default, can initiate automatic failover if the principal server fails. Typically, each instance resides on a separate machine, but it is possible to configure all three instances on the same computer.

The principal database is the updatable version of the database, and the mirror copy duplicates the activity of the principal. The two instances communicate over database mirroring endpoints that control the mirroring sessions. If a witness is used, a third endpoint is created on that instance for communication purposes.

You begin mirroring similar to the way you begin log shipping: You perform a full backup of the principal database and restore the backup onto the mirror instance with the no recovery option. You then proceed to perform log backups of the principal database, having them restored onto the mirror copy. The applications can be performed synchronously or asynchronously. If you are configuring as synchronous with automatic failover, a witness instance is mandatory. If the configuration is synchronous without failover or asynchronous, the witness is optional.

A synchronous configuration commits changes to both servers simultaneously. The asynchronous configuration first commits the changes on the principal database and then transfers the changes to the mirror copy. If automatic failover is configured, the witness controls the failover to the mirror if the principal server becomes unavailable.

Because these configurations are not currently supported in a production environment, the only true failover configuration available in a SQL Server environment is failover clustering.

Using Failover Clustering

Failover clustering provides high-availability support for SQL Server. In the event of a failure, SQL Server can maintain business operations through the use of multiple instances of the database engine. You can configure the environment such that one node of a cluster can fail over to another node in the cluster during any failure.

A failover cluster is set up as a series of one or more nodes. Each node is a server in its own right. The nodes operate with two or more shared disks. The shared disks maintain the resources that the node set accesses. This disk configuration is known as the resource group or grouping. Collectively, the entire configuration makes up a single virtual server. Applications communicate with a singular machine over a single IP address.

A SQL Server virtual server appears as if it were a single computer. This allows applications to perform their database access without the need for reconfiguration if something happens to one of the nodes in the cluster. A failover cluster is a hardware configuration that requires special computer hardware. Of course, this special hardware comes at a cost, making a failover cluster implementation much more costly than other forms of high availability. If you lack the budget or hardware to implement this strategy, replication may be a better solution.

Others