Every SQL Server database has its own transaction log, which records all data modifications made to that database (for example, inserts, updates, and deletes)
in the order in which they occur. This information is stored in one or
more log files associated with the database. The information stored in
these log files cannot be modified or viewed effectively by any user
process.
SQL Server uses a write-ahead log. The buffer manager
guarantees that changes are written to the transaction log before the
changes are written to the database. The buffer manager also ensures
that the log pages are written out in sequence so that transactions can
be recovered properly in the event of a system crash.
The following is an overview of the sequence of events that occurs when a transaction modifies data:
1. Writes a BEGIN TRAN record to the transaction log in buffer memory.
2. Writes data modification information to transaction log pages in buffer memory.
3. Writes data modifications to the database in buffer memory.
4. Writes a COMMIT TRAN record to the transaction log in buffer memory.
5. Writes transaction log records to the transaction log file(s) on disk.
6. Sends a COMMIT acknowledgment to the client process.
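The six steps above can be sketched as a toy model in Python. This is only an illustration of the ordering, not SQL Server's internal implementation; the class name ToyEngine, its attributes, and the record tuples are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEngine:
    """Toy model of the commit sequence; not SQL Server's real internals."""
    log_buffer: list = field(default_factory=list)    # log records in memory
    log_on_disk: list = field(default_factory=list)   # log records on disk
    data_buffer: dict = field(default_factory=dict)   # modified pages in memory
    data_on_disk: dict = field(default_factory=dict)  # pages in the data file

    def run_transaction(self, tran_id, changes):
        # Step 1: the BEGIN TRAN record goes to the log buffer.
        self.log_buffer.append((tran_id, "BEGIN TRAN"))
        for page, value in changes.items():
            # Step 2: log the modification in memory first ...
            self.log_buffer.append((tran_id, "MODIFY", page, value))
            # Step 3: ... then change the page in buffer memory.
            self.data_buffer[page] = value
        # Step 4: the COMMIT TRAN record goes to the log buffer.
        self.log_buffer.append((tran_id, "COMMIT TRAN"))
        # Step 5: the log buffer is written to the log file on disk.
        self.log_on_disk.extend(self.log_buffer)
        self.log_buffer.clear()
        # Step 6: only now is the commit acknowledged to the client.
        return "COMMIT acknowledged"


engine = ToyEngine()
ack = engine.run_transaction("T1", {"page_7": "new row"})
```

After the call, engine.log_on_disk holds the three records for T1, while engine.data_on_disk is still empty: the modified page exists only in buffer memory.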
The end of a typical transaction is indicated by a COMMIT record in the transaction log. The presence of the COMMIT
record indicates that the transaction must be reflected in the database
or be redone, if necessary. A transaction aborted during processing by
an explicit rollback or a system error has its changes automatically
undone.
Notice that the data records are not written to disk when a COMMIT
occurs. This is done to minimize disk I/O. All log writes are done
synchronously to ensure that the log records are physically written to
disk and in the proper sequence. Because all modifications to the data
can be recovered from the transaction log, it is not critical that data
changes be written to disk right away. Even in the event of a system
crash or power failure, the data can be recovered from the log if it
hasn’t been written to the database.
SQL Server ensures that the log records are written
before the affected data pages by recording the log sequence number
(LSN) for the log record making the change on the modified data page(s).
Modified, or “dirty,” data pages can be written to disk only after the
transaction log has been written to disk at least as far as the LSN
recorded on the data page.
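The LSN check can be illustrated with a small Python sketch. It assumes that last_flushed_lsn means the highest LSN already written to the log file on disk, so a page is safe to write when its own LSN does not exceed that value. The function name and the sample LSN values are hypothetical.

```python
def can_write_page(page_lsn, last_flushed_lsn):
    """Return True if a dirty page may be written to the data file.

    Assumption: last_flushed_lsn is the highest LSN already on disk in
    the log file, so the log record for the page's latest change is on
    disk whenever page_lsn does not exceed it.
    """
    return page_lsn <= last_flushed_lsn


# Hypothetical dirty pages and the LSN stamped on each one.
dirty_pages = {"A": 90, "B": 120}
writable = [name for name, lsn in dirty_pages.items()
            if can_write_page(lsn, last_flushed_lsn=100)]
# writable == ["A"]; page B must wait until the log is flushed past LSN 120.
```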
When and how are the data changes written to disk?
Obviously, they must be written out at some time; otherwise, it could
take an exceedingly long time for SQL Server to start up if it had to
redo all the transactions contained in the transaction log. Also, how
does SQL Server know during recovery which transactions to reapply, or
roll forward, and which transactions to undo or roll back? The following
section looks at the mechanisms involved in the recovery process.
1. The Checkpoint Process
During recovery, SQL Server examines the transaction
log for each database and verifies whether the changes reflected in the
log are also reflected in the database. In addition, it examines the log
to determine whether any data changes that were written to the data
were caused by a transaction that didn’t complete before the system
failure.
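As a rough illustration of that decision, the following Python sketch splits the transactions found in a log into those to roll forward and those to roll back. It is a simplification under stated assumptions: the record format and action names are made up for the example, and the sketch ignores checkpoints, which in practice limit how much of the log has to be examined.

```python
def classify_transactions(log_records):
    """Split transactions into (roll_forward, roll_back) lists.

    Each record is a (tran_id, action) tuple. A transaction with a
    COMMIT record is rolled forward; one without is rolled back.
    """
    started = []       # transaction ids in order of first appearance
    committed = set()
    for tran_id, action in log_records:
        if tran_id not in started:
            started.append(tran_id)
        if action == "COMMIT":
            committed.add(tran_id)
    roll_forward = [t for t in started if t in committed]
    roll_back = [t for t in started if t not in committed]
    return roll_forward, roll_back


# T1 committed before the failure; T2 was still in progress.
log = [("T1", "BEGIN"), ("T1", "UPDATE"), ("T2", "BEGIN"),
       ("T1", "COMMIT"), ("T2", "INSERT")]
roll_forward, roll_back = classify_transactions(log)
# roll_forward == ["T1"], roll_back == ["T2"]
```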
As discussed earlier, a COMMIT writes the log records for a transaction to the transaction log (see Figure 1).
Dirty data pages are written out either by the Lazy Writer or
checkpoint process. The Lazy Writer process runs periodically to check
whether the number of free buffers has fallen below a certain threshold,
reclaims any unused pages, and writes out any dirty pages that haven’t
been referenced recently.
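One pass of the Lazy Writer, as described here, might look like the following Python sketch. It is a toy model only: the buffer dictionary layout, the free_threshold and recent_cutoff parameters, and the use of integer ticks for reference times are all assumptions made for illustration.

```python
def lazy_writer_pass(buffers, free_threshold, recent_cutoff):
    """Run one pass of a toy lazy writer; return (written, freed) page ids.

    buffers maps a page id to a dict with the keys 'in_use' (bool),
    'dirty' (bool), and 'last_ref' (tick of the last reference).
    """
    free = sum(1 for buf in buffers.values() if not buf["in_use"])
    if free >= free_threshold:
        return [], []                  # enough free buffers; nothing to do
    written, freed = [], []
    for page_id, buf in buffers.items():
        if not buf["in_use"] or buf["last_ref"] >= recent_cutoff:
            continue                   # already free, or referenced recently
        if buf["dirty"]:
            buf["dirty"] = False       # stands in for writing the page to disk
            written.append(page_id)
        buf["in_use"] = False          # reclaim the buffer
        freed.append(page_id)
    return written, freed


buffers = {
    1: {"in_use": True, "dirty": True, "last_ref": 5},
    2: {"in_use": True, "dirty": False, "last_ref": 3},
    3: {"in_use": True, "dirty": True, "last_ref": 50},
    4: {"in_use": False, "dirty": False, "last_ref": 0},
}
written, freed = lazy_writer_pass(buffers, free_threshold=2, recent_cutoff=10)
# written == [1], freed == [1, 2]; page 3 was referenced recently and stays.
```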

The checkpoint process also scans the buffer cache
periodically and writes all dirty log pages and dirty data pages to disk
(see Figure 2).
The purpose of the checkpoint is to synchronize the data stored on disk
with the changes recorded in the transaction log. Typically, the
checkpoint process finds little work to do because most dirty pages have
been written out previously by the worker threads or Lazy Writer
process.

SQL Server performs the following steps during a checkpoint:
1. Writes a record to the log file to record the start of the checkpoint.
2. Stores information recorded for the checkpoint in a chain of checkpoint log records.
3. Records the minimum recovery LSN (MinLSN), which is the first log image that must be present for a successful database-wide rollback. The MinLSN is the lowest of the following: the LSN of the start of the checkpoint, the LSN of the start of the oldest active transaction, and the LSN of the oldest transaction marked for replication that hasn’t yet been replicated to all subscribers.
4. Writes a list of all outstanding, active transactions to the checkpoint records.
5. Writes all modified log pages to the transaction log on disk.
6. Writes all dirty data pages to disk. (Data pages that have not been modified are not written back to disk to save I/O.)
7. Writes a record to the log file, indicating the end of the checkpoint.
8. Writes the LSN of the start of the checkpoint log records to the database boot page. (This is done so that SQL Server can find the last checkpoint in the log during recovery.)
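The MinLSN calculation in step 3 can be expressed as a short Python sketch. It assumes that the MinLSN is the lowest of the three candidate LSNs named in that step; the function name, parameter names, and sample LSN values are hypothetical.

```python
def min_recovery_lsn(checkpoint_start_lsn, active_tran_start_lsns=(),
                     unreplicated_tran_lsns=()):
    """Return the lowest LSN that recovery would still need in the log.

    Candidates: the start of the checkpoint, the start of each active
    transaction, and each transaction still waiting to be replicated.
    """
    candidates = [checkpoint_start_lsn,
                  *active_tran_start_lsns,
                  *unreplicated_tran_lsns]
    return min(candidates)


# An active transaction that began before the checkpoint holds MinLSN back.
print(min_recovery_lsn(147, active_tran_start_lsns=[142, 145]))  # 142
# With nothing outstanding, MinLSN is the start of the checkpoint itself.
print(min_recovery_lsn(147))                                     # 147
```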
Figure 3
shows a simplified version of the contents of a transaction log after a
checkpoint. (For simplicity, the checkpoint records are reflected as a
single log entry.)
The primary purpose of a checkpoint is to reduce the
amount of work the server needs to do at recovery time to redo or undo
database changes. A checkpoint can occur under the following
circumstances:
When a CHECKPOINT statement is executed explicitly for the current database.
When ALTER DATABASE is used to change a database option. ALTER DATABASE automatically checkpoints the database when database options are changed.
When an instance of SQL Server is shut down gracefully either due to the execution of the SHUTDOWN statement or because the SQL Server service was stopped.
Note
The SHUTDOWN WITH NOWAIT statement does not
perform what is considered a graceful shutdown of SQL Server. This
statement forces a shutdown of SQL Server without waiting for current
transactions to complete and without
executing a checkpoint of each database. This type of shutdown may cause
the subsequent restart of SQL Server to take a longer time to recover
the databases on the server.
When SQL Server generates an automatic checkpoint. It does so
periodically in each database to reduce the amount of time the instance
would take to recover the database.
Automatic Checkpoints
The frequency of automatic checkpoints is determined
by the setting of the recovery interval for SQL Server. However, the
determination of when to perform a checkpoint is based on the number of
records in the log, not a specific period of time. The time interval
between the occurrences of automatic checkpoints can be highly variable.
If few modifications are made to the database, the time interval
between automatic checkpoints could be quite long. Conversely, automatic
checkpoints can occur quite frequently if the update activity on a
database is high.
The recovery interval does not state how
often automatic checkpoints should occur. The recovery interval is
actually related to an estimate of the amount of time it would take SQL
Server to recover the database by applying the number of transactions
recorded since the last checkpoint. By default, the recovery interval is
set to 0, which means SQL Server determines the appropriate
recovery interval for each database. It is recommended that you keep
this setting at the default value unless you notice that checkpoints are
occurring too frequently and are impairing performance. You should try
increasing the value in small increments until you find one that works
well. You need to be aware that if you set the recovery interval higher,
fewer checkpoints will occur, and the database will likely take longer
to recover following a system crash.
If the database is using either the full or
bulk-logged recovery model, an automatic checkpoint occurs whenever the
number of log records reaches the number that SQL Server estimates it
can process within the time specified by the recovery interval option.
If the database is using the simple recovery model,
an automatic checkpoint occurs when either of the following is true: the
number of log records reaches the number that SQL Server estimates it can
process during the time specified by the recovery interval option, or the
log becomes 70% full while the database is in log truncate mode. A
database is considered to be in log truncate mode when it is using the
simple recovery model and one of the following events has occurred since
the last full backup of the database:
A minimally logged operation is performed in the database, such as a minimally logged bulk copy operation or a minimally logged WRITETEXT statement.
An ALTER DATABASE statement is executed that adds or deletes a file in the database.
A BACKUP LOG statement referencing the database is executed with either the NO_LOG or TRUNCATE_ONLY option.
When a
database is configured to use the simple recovery model, the automatic
checkpoint also truncates the unused portion of the transaction log
prior to the oldest active transaction.
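The trigger rules for automatic checkpoints can be summarized in a Python sketch. It is a simplified model, not SQL Server's actual algorithm: the function and parameter names are hypothetical, the estimate of how many log records can be recovered per second is supplied by the caller, and the recovery interval is taken to be a positive number of minutes (the default value of 0, where SQL Server chooses the interval itself, is not modeled).

```python
def automatic_checkpoint_due(recovery_model, log_records_since_checkpoint,
                             records_recoverable_per_second,
                             recovery_interval_minutes,
                             log_percent_full=0.0, log_truncate_mode=False):
    """Return True if this toy model would start an automatic checkpoint."""
    if recovery_interval_minutes <= 0:
        raise ValueError("this sketch needs an explicit, positive interval")
    # Number of log records the server estimates it can process in the interval.
    budget = records_recoverable_per_second * recovery_interval_minutes * 60
    if log_records_since_checkpoint >= budget:
        return True                    # applies to every recovery model
    if recovery_model == "simple":
        # Extra trigger: the log is 70% full and in log truncate mode.
        return log_truncate_mode and log_percent_full >= 70.0
    return False


# Full recovery model: only the log record count matters.
print(automatic_checkpoint_due("full", 60000, 1000, 1))            # True
# Simple recovery model: a 75% full log in truncate mode is enough.
print(automatic_checkpoint_due("simple", 100, 1000, 1,
                               log_percent_full=75.0,
                               log_truncate_mode=True))            # True
```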
Manual Checkpoints
In addition to automatic checkpoints, a checkpoint can be explicitly initiated by members of the sysadmin fixed server role or the db_owner or db_backupoperator fixed database roles. The syntax for the CHECKPOINT command is as follows:
CHECKPOINT [ checkpoint_duration ]
To minimize the performance impact on other
applications, SQL Server 2008 by default adjusts the frequency of the
writes that a checkpoint operation performs. SQL Server uses this
strategy for automatic checkpoints and for any CHECKPOINT statement that does not specify the checkpoint_duration value.
You can use the checkpoint_duration option to request the amount of time, in seconds, for the checkpoint to complete. When checkpoint_duration is specified, SQL Server attempts to perform the checkpoint within the requested duration. The performance impact of using checkpoint_duration
depends on the number of dirty pages, the activity on the system, and
the actual duration specified. For example, if the checkpoint would
normally complete in 120 seconds, specifying a checkpoint_duration
of 60 seconds causes SQL Server to devote more resources to the
checkpoint than would be assigned by default to be able to complete the
checkpoint in half the time. In contrast, specifying a checkpoint_duration of 240 seconds causes SQL Server to assign fewer resources than would be assigned by default. In other words, a short checkpoint_duration increases the resources devoted to the checkpoint, and a longer checkpoint_duration reduces the resources devoted to the checkpoint.
Regardless of the checkpoint duration
specified, SQL Server always attempts to complete a checkpoint when
possible. In some cases, a checkpoint may complete sooner than the
specified duration, and at times it may run longer than the specified
duration.
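The effect of checkpoint_duration on resource use comes down to simple arithmetic, sketched below in Python. The sketch assumes the amount of work (the number of dirty pages) is the same in both cases, so the required write rate scales inversely with the requested time. The function name is hypothetical, and the way SQL Server actually throttles checkpoint writes is not modeled.

```python
def relative_checkpoint_effort(default_seconds, requested_seconds):
    """Return the write rate needed relative to the default rate.

    A result of 2.0 means the checkpoint must write twice as fast as it
    would by default; 0.5 means it can write at half the default rate.
    """
    if default_seconds <= 0 or requested_seconds <= 0:
        raise ValueError("durations must be positive")
    return default_seconds / requested_seconds


# The example from the text: a checkpoint that normally takes 120 seconds.
print(relative_checkpoint_effort(120, 60))   # 2.0
print(relative_checkpoint_effort(120, 240))  # 0.5
```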