SQL Server 2008 R2 : Transaction Logging and the Recovery Process (part 1) - The Checkpoint Process

6/8/2013 7:46:53 PM

Every SQL Server database has its own transaction log that keeps a record of all data modifications in a database (for example, insert, update, delete) in the order in which they occur. This information is stored in one or more log files associated with the database. The information stored in these log files cannot be modified or viewed effectively by any user process.

SQL Server uses a write-ahead log. The buffer manager guarantees that changes are written to the transaction log before the changes are written to the database. The buffer manager also ensures that the log pages are written out in sequence so that transactions can be recovered properly in the event of a system crash.

The following is an overview of the sequence of events that SQL Server performs when a transaction modifies data:

1.	Writes a BEGIN TRAN record to the transaction log in buffer memory.
2.	Writes data modification information to transaction log pages in buffer memory.
3.	Writes data modifications to the database in buffer memory.
4.	Writes a COMMIT TRAN record to the transaction log in buffer memory.
5.	Writes transaction log records to the transaction log file(s) on disk.
6.	Sends a COMMIT acknowledgment to the client process.
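
The sequence above can be sketched as a small simulation. This is a toy model only, not SQL Server's implementation; the class name MiniWAL and its attribute names are hypothetical and exist purely for illustration. It shows that by the time the COMMIT is acknowledged, the log records are on disk while the modified data is still only in cache.

```python
class MiniWAL:
    """Toy write-ahead log: records are buffered, then hardened at commit."""

    def __init__(self):
        self.log_buffer = []    # log records that exist only in buffer memory
        self.log_on_disk = []   # log records hardened to the log file
        self.data_cache = {}    # modified rows held in buffer memory
        self.data_on_disk = {}  # rows as stored in the data file

    def run_transaction(self, txn_id, changes):
        self.log_buffer.append((txn_id, "BEGIN TRAN"))              # step 1
        for key, value in changes.items():
            self.log_buffer.append((txn_id, "MODIFY", key, value))  # step 2
            self.data_cache[key] = value                            # step 3
        self.log_buffer.append((txn_id, "COMMIT TRAN"))             # step 4
        self.log_on_disk.extend(self.log_buffer)                    # step 5
        self.log_buffer.clear()
        return "COMMIT acknowledged"                                # step 6


wal = MiniWAL()
ack = wal.run_transaction("T1", {"row1": "new value"})
print(ack, len(wal.log_on_disk), wal.data_on_disk)
```

Note that data_on_disk is still empty after the commit: in this model, as in the description above, nothing forces the data change to disk at commit time.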

The end of a typical transaction is indicated by a COMMIT record in the transaction log. The presence of the COMMIT record indicates that the transaction must be reflected in the database or be redone, if necessary. A transaction aborted during processing by an explicit rollback or a system error has its changes automatically undone.

Notice that the modified data pages are not written to disk when a COMMIT occurs; only the log records are. This is done to minimize disk I/O. All log writes are done synchronously to ensure that the log records are physically written to disk and in the proper sequence. Because all modifications to the data can be recovered from the transaction log, it is not critical that data changes be written to disk right away. Even in the event of a system crash or power failure, the data can be recovered from the log if it hasn’t been written to the database.

SQL Server ensures that the log records are written before the affected data pages by recording the log sequence number (LSN) for the log record making the change on the modified data page(s). Modified, or “dirty,” data pages can be written to disk only when the LSN recorded on the data page is less than the LSN of the last log page written to the transaction log.
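
The LSN check can be illustrated with a short sketch. The function name, the dictionary of page LSNs, and the sample LSN values are all hypothetical. The sketch treats a page as safe to write once the log has been hardened at least as far as the LSN stamped on the page; how the exact boundary case is handled is an assumption of this sketch, not something the text specifies.

```python
def pages_safe_to_write(dirty_pages, flushed_log_lsn):
    """Return the dirty pages whose stamped LSN is already covered by the log on disk.

    dirty_pages maps a page name to the LSN of the last log record that changed it.
    """
    return [
        page_id
        for page_id, page_lsn in sorted(dirty_pages.items())
        if page_lsn <= flushed_log_lsn  # the log record must reach disk first
    ]


dirty_pages = {"page_A": 101, "page_B": 105, "page_C": 110}
safe = pages_safe_to_write(dirty_pages, flushed_log_lsn=105)
print(safe)
```

Here page_C has to wait: the log record that modified it (LSN 110) has not yet been written to the log file.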

When and how are the data changes written to disk? Obviously, they must be written out at some time; otherwise, it could take an exceedingly long time for SQL Server to start up if it had to redo all the transactions contained in the transaction log. Also, how does SQL Server know during recovery which transactions to reapply, or roll forward, and which transactions to undo or roll back? The following section looks at the mechanisms involved in the recovery process.

1. The Checkpoint Process

During recovery, SQL Server examines the transaction log for each database and verifies whether the changes reflected in the log are also reflected in the database. In addition, it examines the log to determine whether any data changes that were written to the data were caused by a transaction that didn’t complete before the system failure.
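
As a rough illustration of that decision, the sketch below splits the transactions found in a log into those with a COMMIT record (candidates to be redone if their changes are missing from the database) and those without one (to be undone). The tuple layout of the log records and the function name are hypothetical simplifications, not the real log format.

```python
def classify_transactions(log_records):
    """Split transaction IDs into (redo_candidates, undo_list) based on COMMIT records."""
    began = []
    committed = set()
    for txn_id, action in log_records:
        if action == "BEGIN TRAN":
            began.append(txn_id)
        elif action == "COMMIT TRAN":
            committed.add(txn_id)
    redo_candidates = [t for t in began if t in committed]
    undo_list = [t for t in began if t not in committed]
    return redo_candidates, undo_list


log = [
    ("T1", "BEGIN TRAN"),
    ("T1", "MODIFY"),
    ("T2", "BEGIN TRAN"),
    ("T1", "COMMIT TRAN"),
    ("T2", "MODIFY"),  # T2 never committed before the failure
]
redo, undo = classify_transactions(log)
print(redo, undo)
```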

As discussed earlier, a COMMIT writes the log records for a transaction to the transaction log (see Figure 1). Dirty data pages are written out either by the Lazy Writer or checkpoint process. The Lazy Writer process runs periodically to check whether the number of free buffers has fallen below a certain threshold, reclaims any unused pages, and writes out any dirty pages that haven’t been referenced recently.

Figure 1. A commit writes all “dirty” log pages from cache to disk.

The checkpoint process also scans the buffer cache periodically and writes all dirty log pages and dirty data pages to disk (see Figure 2). The purpose of the checkpoint is to sync up the data stored on disk with the changes recorded in the transaction log. Typically, the checkpoint process finds little work to do because most dirty pages have been written out previously by the worker threads or Lazy Writer process.

Figure 2. A checkpoint writes log pages from cache to disk and then writes all “dirty” data pages.

SQL Server performs the following steps during a checkpoint:

1.	Writes a record to the log file to record the start of the checkpoint.
2.	Stores information recorded for the checkpoint in a chain of checkpoint log records.
3.	Records the minimum recovery LSN (MinLSN), which is the LSN of the first log record that must be present for a successful database-wide rollback. The MinLSN is the lowest of the following: the LSN of the start of the checkpoint, the LSN of the start of the oldest active transaction, or the LSN of the start of the oldest transaction marked for replication that hasn’t yet been replicated to all subscribers.
4.	Writes a list of all outstanding, active transactions to the checkpoint records.
5.	Writes all modified log pages to the transaction log on disk.
6.	Writes all dirty data pages to disk. (Data pages that have not been modified are not written back to disk to save I/O.)
7.	Writes a record to the log file, indicating the end of the checkpoint.
8.	Writes the LSN of the start of the checkpoint log records to the database boot page. (This is done so that SQL Server can find the last checkpoint in the log during recovery.)
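
The MinLSN recorded in step 3 can be sketched as taking the lowest of the candidate LSNs. The function name, parameter names, and LSN values below are hypothetical; real LSNs are not simple integers.

```python
def min_recovery_lsn(checkpoint_start_lsn, active_txn_start_lsns=(), unreplicated_txn_start_lsns=()):
    """Return the lowest LSN that recovery would still need to find in the log."""
    candidates = [
        checkpoint_start_lsn,
        *active_txn_start_lsns,        # transactions still open at the checkpoint
        *unreplicated_txn_start_lsns,  # replicated transactions not yet delivered
    ]
    return min(candidates)


# Two transactions are still open when the checkpoint starts at LSN 150.
print(min_recovery_lsn(150, active_txn_start_lsns=[142, 147]))
```

With no open or unreplicated transactions, the MinLSN in this model is simply the LSN of the start of the checkpoint.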

Figure 3 shows a simplified version of the contents of a transaction log after a checkpoint. (For simplicity, the checkpoint records are reflected as a single log entry.)

Figure 3. A simplified view of the end of the transaction log with various completed and active transactions, as well as the last checkpoint.

The primary purpose of a checkpoint is to reduce the amount of work the server needs to do at recovery time to redo or undo database changes. A checkpoint can occur under the following circumstances:

When a CHECKPOINT statement is executed explicitly for the current database.
When ALTER DATABASE is used to change a database option. ALTER DATABASE automatically checkpoints the database when database options are changed.
When an instance of SQL Server is shut down gracefully either due to the execution of the SHUTDOWN statement or because the SQL Server service was stopped.
When SQL Server generates an automatic checkpoint in a database to reduce the amount of time the instance would take to recover that database.

Note

The SHUTDOWN WITH NOWAIT statement does not perform what is considered a graceful shutdown of SQL Server. This statement forces a shutdown of SQL Server without waiting for current transactions to complete and without executing a checkpoint of each database. This type of shutdown may cause the subsequent restart of SQL Server to take a longer time to recover the databases on the server.

Automatic Checkpoints

The frequency of automatic checkpoints is determined by the setting of the recovery interval for SQL Server. However, the determination of when to perform a checkpoint is based on the number of records in the log, not a specific period of time. The time interval between the occurrences of automatic checkpoints can be highly variable. If few modifications are made to the database, the time interval between automatic checkpoints could be quite long. Conversely, automatic checkpoints can occur quite frequently if the update activity on a database is high.

The recovery interval does not state how often automatic checkpoints should occur. The recovery interval is actually related to an estimate of the amount of time it would take SQL Server to recover the database by applying the number of transactions recorded since the last checkpoint. By default, the recovery interval is set to 0, which means SQL Server determines the appropriate recovery interval for each database. It is recommended that you keep this setting at the default value unless you notice that checkpoints are occurring too frequently and are impairing performance. You should try increasing the value in small increments until you find one that works well. You need to be aware that if you set the recovery interval higher, fewer checkpoints will occur, and the database will likely take longer to recover following a system crash.

If the database is using either the full or bulk-logged recovery model, an automatic checkpoint occurs whenever the number of log records reaches the number that SQL Server estimates it can process within the time specified by the recovery interval option.

If the database is using the simple recovery model, an automatic checkpoint occurs when either of the following is true: the number of log records reaches the number that SQL Server estimates it can process during the time specified by the recovery interval option, or the log becomes 70% full while the database is in log truncate mode. A database is considered to be in log truncate mode when the database is using the simple recovery model and one of the following events has occurred since the last full backup of the database:

A minimally logged operation is performed in the database, such as a minimally logged bulk copy operation or a minimally logged WRITETEXT statement.
An ALTER DATABASE statement is executed that adds or deletes a file in the database.
A BACKUP LOG statement referencing the database is executed with either the NO_LOG or TRUNCATE_ONLY option.

When a database is configured to use the simple recovery model, the automatic checkpoint also truncates the unused portion of the transaction log prior to the oldest active transaction.
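
The trigger conditions described in this section can be summarized in a small decision function. This is only a sketch of the rules as stated here; the function and parameter names are hypothetical, and the estimate of how many log records can be processed within the recovery interval is treated as a given input rather than computed.

```python
def automatic_checkpoint_due(
    log_records_since_checkpoint,
    records_recoverable_in_interval,
    recovery_model,
    log_percent_full=0.0,
    in_log_truncate_mode=False,
):
    """Return True when the rules described above would trigger an automatic checkpoint."""
    # Applies to every recovery model: enough log has accumulated that recovery
    # would take about as long as the recovery interval allows.
    if log_records_since_checkpoint >= records_recoverable_in_interval:
        return True
    # Simple recovery model only: the log is 70% full in log truncate mode.
    if recovery_model == "simple" and in_log_truncate_mode and log_percent_full >= 70.0:
        return True
    return False


print(automatic_checkpoint_due(500, 10_000, "simple", log_percent_full=72.0, in_log_truncate_mode=True))
print(automatic_checkpoint_due(500, 10_000, "full", log_percent_full=72.0))
```

The second call returns False because, under the full recovery model, a log that is 72% full does not by itself trigger a checkpoint in this model.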

Manual Checkpoints

In addition to automatic checkpoints, a checkpoint can be explicitly initiated by members of the sysadmin fixed server role or the db_owner or db_backupoperator fixed database roles. The syntax for the CHECKPOINT command is as follows:

CHECKPOINT [ checkpoint_duration ]

To minimize the performance impact on other applications, SQL Server 2008 by default adjusts the frequency of the writes that a checkpoint operation performs. SQL Server uses this strategy for automatic checkpoints and for any CHECKPOINT statement that does not specify the checkpoint_duration value.

You can use the checkpoint_duration option to request the amount of time, in seconds, for the checkpoint to complete. When checkpoint_duration is specified, SQL Server attempts to perform the checkpoint within the requested duration. The performance impact of using checkpoint_duration depends on the number of dirty pages, the activity on the system, and the actual duration specified. For example, if the checkpoint would normally complete in 120 seconds, specifying a checkpoint_duration of 60 seconds causes SQL Server to devote more resources to the checkpoint than would be assigned by default to be able to complete the checkpoint in half the time. In contrast, specifying a checkpoint_duration of 240 seconds causes SQL Server to assign fewer resources than would be assigned by default. In other words, a short checkpoint_duration increases the resources devoted to the checkpoint, and a longer checkpoint_duration reduces the resources devoted to the checkpoint.

Regardless of the checkpoint duration specified, SQL Server always attempts to complete a checkpoint when possible. In some cases, a checkpoint may complete sooner than the specified duration, and at times it may run longer than the specified duration.
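
The effect of checkpoint_duration on resource usage can be seen with some simple arithmetic. The sketch below is a deliberately crude model that assumes a fixed number of dirty pages written at a constant rate; the function name and the figure of 12,000 dirty pages are hypothetical, and SQL Server's actual scheduling is more involved.

```python
def required_write_rate(dirty_pages, duration_seconds):
    """Pages per second that must be written to finish within the requested duration."""
    return dirty_pages / duration_seconds


dirty_pages = 12_000  # hypothetical number of dirty pages at checkpoint time

default_rate = required_write_rate(dirty_pages, 120)  # checkpoint that normally takes 120 s
fast_rate = required_write_rate(dirty_pages, 60)      # checkpoint_duration of 60
slow_rate = required_write_rate(dirty_pages, 240)     # checkpoint_duration of 240
print(default_rate, fast_rate, slow_rate)
```

Halving the duration doubles the write rate the checkpoint needs, and doubling the duration halves it, which matches the 60-second and 240-second examples above.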
