SQL Server 2008 R2 : Database Snapshots - What Are Database Snapshots?

6/22/2013 7:34:46 PM

What’s New with Database Snapshots

With SQL Server 2005, everything about database snapshots was new because this was a completely new feature for SQL Server. With SQL Server 2008, there is little new to this feature other than under-the-cover improvements to the copy-on-write mechanisms and three more years of production implementations under its belt. All the SQL code you have set up for creating and managing snapshots will continue to work with SQL Server 2008. No upgrade pain here.

Database snapshots have solved many companies’ reporting, data safeguarding, and performance issues and directly contributed to higher availability across the board. Be aware, though, that there are plenty of restrictions on using database snapshots. In fact, these restrictions may prohibit you from using snapshots at all. We talk about these restrictions, and about when you can safely use database snapshots, in a bit.

Fortunately, the 2005 version of the AdventureWorks database can be installed using the same installer that installs the AdventureWorks2008 or AdventureWorks2008R2 database. If you didn’t install AdventureWorks when you installed either of these sample databases, simply relaunch the installer and choose to install the AdventureWorks OLTP database.

What Are Database Snapshots?

Microsoft has kept up its commitment of providing a database engine foundation that can be highly available 7 days a week, 365 days a year. Database snapshots contribute to this goal in several ways:

They decrease recovery time of a database because you can restore a troubled database with a database snapshot—referred to as reverting.
They create a security blanket (safeguard) prior to running mass updates on a critical database. If something goes wrong with the update, the database can be reverted in a very short amount of time.
They quickly provide a read-only, point-in-time reporting database for ad hoc or canned reporting needs (thus increasing reporting environment availability).
They quickly create a read-only, point-in-time reporting database from a database mirror for ad hoc or canned reporting needs (again increasing reporting environment availability, and also offloading reporting impact from your production server/principal database server).
As a bonus, database snapshots can be used to create testing or QA synchronization points to enhance and improve all aspects of critical testing (thus reducing the amount of bad code that goes into production and directly affects the stability and availability of that production implementation).

A database snapshot is simply a point-in-time full database view. It’s not a copy—at least not a full copy when it is originally created. We talk about this more in a moment. Figure 1 shows conceptually how a database snapshot can be created from a source database on a single SQL Server instance.

Figure 1. Basic database snapshot concept: a source database and its database snapshot, all on a single SQL Server instance.

This point-in-time view of a database’s data never changes, even though the data (data pages) in the primary database (the source of the database snapshot) may change. It is truly a snapshot at a point in time. A snapshot always points to the data pages that were present at the time the snapshot was created. If a data page is updated in the source database, a copy of the original source data page is first written to a new page chain termed the sparse file. This utilizes copy-on-write technology. Figure 2 shows the sparse file that is created, alongside the source database itself.

Figure 2. Source database data pages and the sparse file data pages that comprise the database snapshot.

A database snapshot really uses the primary database’s data pages up until the point that one of these data pages is updated (changed in any way). As already mentioned, if a data page is updated in the source database, the original copy of the data page (which is referenced by the database snapshot) is written to the sparse file page chain as part of an update operation, using the copy-on-write technology. It is this new data page in the sparse file that still provides the correct point-in-time data to the database snapshot that it serves. Figure 3 illustrates that as more data changes (updates) occur in the source database, the sparse file gets larger and larger with the old original data pages.

Figure 3. Data pages being copied to the sparse file for a database snapshot as pages are being updated in the source database.
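
The copy-on-write behavior described above can be sketched with a small in-memory model. This is only an illustration of the idea, not SQL Server's implementation: the class names SourceDatabase and Snapshot, the dictionary standing in for the sparse file, and the string "pages" are all hypothetical, and the model assumes every page already exists in the source.

```python
class Snapshot:
    """Toy database snapshot (hypothetical model, not SQL Server internals)."""

    def __init__(self, source):
        self.source = source
        # Stands in for the sparse file: page_id -> original page contents.
        self.sparse = {}

    def preserve(self, page_id, original):
        # Only the first change to a page after the snapshot was taken
        # copies the original; later changes leave the saved copy alone.
        self.sparse.setdefault(page_id, original)

    def read_page(self, page_id):
        # Changed pages come from the sparse file; unchanged pages are
        # still read directly from the source database.
        if page_id in self.sparse:
            return self.sparse[page_id]
        return self.source.pages[page_id]


class SourceDatabase:
    """Toy source database: a dict of page_id -> page contents."""

    def __init__(self, pages):
        self.pages = dict(pages)
        self.snapshots = []

    def create_snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write_page(self, page_id, new_contents):
        # Copy-on-write: save the original page for each snapshot
        # before overwriting it in the source.
        for snap in self.snapshots:
            snap.preserve(page_id, self.pages[page_id])
        self.pages[page_id] = new_contents


db = SourceDatabase({1: "A", 2: "B", 3: "C"})
snap = db.create_snapshot()
db.write_page(2, "B2")
db.write_page(2, "B3")  # second update to the same page: no new copy
```

After these two updates the source sees the latest contents of page 2, while the snapshot still returns the page as it was when the snapshot was created, and the stand-in sparse file holds exactly one page.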

Eventually a sparse file could contain the entire original database if all data pages in the primary database were changed. As you can also see in Figure 3, which data pages the database snapshot uses from the original (source) database and which data pages are used from the sparse file are all managed by references in the system catalog for the database snapshot. This setup is incredibly efficient and represents a major breakthrough in providing data to others.

Because SQL Server is using the copy-on-write technology, a certain amount of overhead is incurred during write operations. This is one of the critical factors you must sort through if you plan on using database snapshots. Nothing is free. The overhead includes the copying of the original data page, the writing of this copied data page to the sparse file, and then the subsequent metadata updating to the system catalog that manages the database snapshot data page list.

Because of this sharing of data pages, it should also be clear why database snapshots must be within the same instance of SQL Server as the source database: both the source database and snapshot start out as the same data pages and then diverge as source data pages are updated. In addition, when a database snapshot is created, SQL Server rolls back any uncommitted transactions for that database snapshot; only the committed transactions are part of a newly created database snapshot. And, as you might expect of something that shares data pages, database snapshots become unavailable if the source database becomes unavailable (for example, if it is damaged or goes offline).

Note

You might plan to do a new snapshot after about 30% of the source database has changed to keep overhead and file sizes in the sparse file at a minimum. The most frequent problem that occurs with database snapshots is related to sparse file sizes and available space. Remember, the sparse file has the potential of being as big as the source database itself (if all data pages in the source database eventually get updated). Plan ahead for this situation!
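
The arithmetic behind this rule of thumb can be sketched as follows. The function names are hypothetical, the 30% threshold is simply the figure suggested in the note, and the worst-case size assumes 8 KB data pages and ignores any file system allocation granularity, so treat it as a rough planning estimate only.

```python
PAGE_SIZE_BYTES = 8 * 1024  # SQL Server data pages are 8 KB


def changed_fraction(pages_in_sparse_file, pages_in_source):
    """Fraction of source pages whose originals were copied to the sparse file."""
    if pages_in_source <= 0:
        raise ValueError("pages_in_source must be positive")
    return pages_in_sparse_file / pages_in_source


def should_refresh(pages_in_sparse_file, pages_in_source, threshold=0.30):
    """True once the changed fraction reaches the chosen threshold."""
    return changed_fraction(pages_in_sparse_file, pages_in_source) >= threshold


def worst_case_sparse_bytes(pages_in_source):
    """Upper bound: every source page changed, so every page was copied."""
    return pages_in_source * PAGE_SIZE_BYTES


# Hypothetical numbers: a 1,000-page source with 350 changed pages.
refresh_now = should_refresh(350, 1000)
space_to_reserve = worst_case_sparse_bytes(1000)
```

How you obtain the page counts is left open here; the point is only that the decision to take a new snapshot, and the disk space to reserve, both follow from the ratio of changed pages to total pages.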

There are, of course, alternatives to database snapshots, such as data replication, log shipping, and even materialized views, but none are as easy to manage and use as database snapshots.

The most common terms associated with database snapshots are

Source database— This is the database on which the database snapshot is based. A database is a collection of data pages. It is the fundamental data storage mechanism that SQL Server uses.
Snapshot databases— There can be one or more database snapshots defined against any one source database. All snapshots must reside in the same SQL Server instance.
Database snapshot sparse file— This new data page allocation contains the original source database data pages when updates occur to the source database data pages. One sparse file is associated with each database data file. If you have a source database allocated with one or more separate data files, you have a corresponding sparse file for each of them.
Reverting to a database snapshot— If you restore a source database based on a particular database snapshot that was done at a point in time, you are reverting. You are actually doing a database RESTORE operation with a FROM DATABASE_SNAPSHOT statement.
Copy-on-write technology— As part of an update transaction in the source database, a copy of the source database data page is written to a sparse file so that the database snapshot can be served correctly (that is, still see the data page as of the snapshot point in time).
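
To tie these terms to actual statements, here is a minimal sketch that assembles the T-SQL for creating a snapshot (one sparse file per source data file) and for reverting to it. The helper names, the snapshot name, the logical file name, and the file path are all hypothetical; only the CREATE DATABASE ... AS SNAPSHOT OF and RESTORE DATABASE ... FROM DATABASE_SNAPSHOT forms are standard T-SQL. The code only builds strings and does not connect to a server.

```python
def quote_name(name):
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(text):
    """Single-quote a T-SQL string literal, doubling embedded quotes."""
    return "'" + text.replace("'", "''") + "'"


def create_snapshot_sql(snapshot_name, source_name, files):
    """Build a CREATE DATABASE ... AS SNAPSHOT OF statement.

    files is a list of (logical_file_name, sparse_file_path) pairs,
    one pair for each data file of the source database.
    """
    if not files:
        raise ValueError("at least one data file is required")
    file_specs = ",\n".join(
        "    (NAME = {}, FILENAME = {})".format(
            quote_name(logical), quote_literal(path)
        )
        for logical, path in files
    )
    return "CREATE DATABASE {}\nON\n{}\nAS SNAPSHOT OF {};".format(
        quote_name(snapshot_name), file_specs, quote_name(source_name)
    )


def revert_sql(source_name, snapshot_name):
    """Build the RESTORE statement that reverts the source to a snapshot."""
    return "RESTORE DATABASE {} FROM DATABASE_SNAPSHOT = {};".format(
        quote_name(source_name), quote_literal(snapshot_name)
    )


# Hypothetical names and path, for illustration only.
create_stmt = create_snapshot_sql(
    "AdventureWorks_snap",
    "AdventureWorks",
    [("AdventureWorks_Data", r"C:\Snapshots\AdventureWorks_Data.ss")],
)
revert_stmt = revert_sql("AdventureWorks", "AdventureWorks_snap")
print(create_stmt)
print(revert_stmt)
```

The logical file names passed in must match those of the source database's data files, which is why the sketch takes them as input rather than guessing them.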

As Figure 4 illustrates, any data query using the database snapshot looks at both the source database data pages and the sparse file data pages at the same time. And these data pages always reflect the unchanged data pages at the point in time the snapshot was created.

Figure 4. A query using the database snapshot touches both source database data pages and sparse file data pages to satisfy a query.
