What’s New with Database Snapshots
With SQL Server 2005, everything about database
snapshots was new because this was a completely new feature for SQL
Server. With SQL Server 2008, there is little new to this feature other
than under-the-cover improvements to the copy-on-write mechanisms and
three more years of production implementations behind it. All
the SQL code you have set up for creating and
managing snapshots will continue to work with SQL Server 2008. No upgrade
pain here.
Database snapshots have solved many companies’
reporting, data safeguarding, and performance issues and directly
contributed to higher availability across the board. Be aware, though,
that database snapshots come with plenty of restrictions. In fact,
these restrictions may prohibit you from using snapshots at all. We
talk about these restrictions, and about when you can safely use database
snapshots, in a bit.
Fortunately, the 2005 version of the AdventureWorks
database can be installed using the same installer that installs the
AdventureWorks2008 or AdventureWorks2008R2 database. If you didn’t
install AdventureWorks when you installed either of these sample
databases, simply relaunch the installer and choose to install the
AdventureWorks OLTP database.
What Are Database Snapshots?
Microsoft has kept up its commitment of providing a
database engine foundation that can be highly available 7 days a week,
365 days a year. Database snapshots contribute to this goal in several
ways:
- They decrease recovery time of a database because you can restore a troubled database from a database snapshot, an operation referred to as reverting.
- They create a security blanket (safeguard) prior to running mass updates on a critical database. If something goes wrong with the update, the database can be reverted in a very short amount of time.
- They quickly provide a read-only, point-in-time reporting database for ad hoc or canned reporting needs (thus increasing reporting environment availability).
- They quickly create a read-only, point-in-time reporting database from a database mirror for ad hoc or canned reporting needs (again increasing reporting environment availability, and also offloading reporting impact from your production server/principal database server).
- As a bonus, database snapshots can be used to create testing or QA synchronization points that improve all aspects of critical testing (thus reducing the amount of bad code that reaches production and directly affects the stability and availability of that production implementation).
A database snapshot is simply a point-in-time full
database view. It’s not a copy—at least not a full copy when it is
originally created. We talk about this more in a moment. Figure 1 shows conceptually how a database snapshot can be created from a source database on a single SQL Server instance.
This point-in-time view of a database’s data never
changes, even though the data (data pages) in the primary database (the
source of the database snapshot) may change. It is truly a snapshot at a
point in time. A snapshot always simply points to the data pages
that were present at the time the snapshot was created. If a data page
is updated in the source database, a copy of the original source data
page is written to a new page chain termed the sparse file. This utilizes copy-on-write technology. Figure 2 shows the sparse file that is created, alongside the source database itself.
A database snapshot really uses the primary
database’s data pages up until the point that one of these data pages is
updated (changed in any way). As already mentioned, if a data page is
updated in the source database, the original copy of the data page
(which is referenced by the database snapshot) is written to the sparse
file page chain as part of an update operation, using the copy-on-write
technology. It is this new data page in the sparse file that still
provides the correct point-in-time data to the database snapshot that it
serves. Figure 3
illustrates that as more data changes (updates) occur in the source
database, the sparse file gets larger and larger with the old original
data pages.
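The copy-on-write behavior described above can be sketched with a small toy model. This is only an illustration of the idea, not SQL Server's storage engine: the class names, the dictionary-based "pages," and the `write_page` and `read_page` methods are all hypothetical.

```python
# Toy model of copy-on-write; NOT SQL Server's actual implementation.
# All names here (SourceDatabase, Snapshot, sparse_file) are hypothetical.

class SourceDatabase:
    def __init__(self, pages):
        self.pages = dict(pages)      # page id -> current contents
        self.snapshots = []           # snapshots defined against this source

    def create_snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write_page(self, page_id, new_contents):
        # Before the page changes, preserve the original in every snapshot
        # that has not already saved its own copy of this page.
        for snap in self.snapshots:
            if page_id not in snap.sparse_file:
                snap.sparse_file[page_id] = self.pages[page_id]
        self.pages[page_id] = new_contents


class Snapshot:
    def __init__(self, source):
        self.source = source
        self.sparse_file = {}         # starts empty: nothing copied yet

    def read_page(self, page_id):
        # Prefer the preserved original; otherwise the page is unchanged
        # since the snapshot and can be read straight from the source.
        if page_id in self.sparse_file:
            return self.sparse_file[page_id]
        return self.source.pages[page_id]


db = SourceDatabase({1: "A", 2: "B", 3: "C"})
snap = db.create_snapshot()
db.write_page(2, "B2")
db.write_page(2, "B3")   # second update: the original is already preserved

print(snap.read_page(2))        # B
print(db.pages[2])              # B3
print(len(snap.sparse_file))    # 1
```

Note that the second update to page 2 does not add anything to the sparse file: only the first change to a page after the snapshot is taken needs to preserve the original, which is why the sparse file can never hold more pages than the source had at snapshot time.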
Eventually a sparse file could contain the entire
original database if all data pages in the primary database were
changed. As you can also see in Figure 3,
the system catalog for the database snapshot holds the references that
determine which data pages the snapshot reads from the original (source)
database and which it reads from the sparse file.
This setup is incredibly efficient and represents a major breakthrough
in providing data to others. It is not free, however. Because SQL Server is using the
copy-on-write technology, a certain amount of overhead is incurred during
write operations. This is one of the critical factors you must sort
through if you plan on using database snapshots. The
overhead includes the copying of the original data page, the writing of
this copied data page to the sparse file, and then the subsequent
metadata updating to the system catalog that manages the database
snapshot data page list. Because of this sharing of data pages, it
should also be clear why a database snapshot must reside in the same
SQL Server instance as its source database: both the source database and snapshot start
out as the same data pages and then diverge as source data pages are
updated. In addition, when a database snapshot is created, SQL Server
rolls back any uncommitted transactions for that database snapshot; only
the committed transactions are part of a newly created database
snapshot. And, as you might expect of something that shares data pages,
database snapshots become unavailable if the source database becomes
unavailable (for example, if it is damaged or goes offline).
Note
You might plan to create a new snapshot after about 30% of the source database
has changed to keep overhead and sparse file sizes to a
minimum. The most frequent problem that occurs with database snapshots
is related to sparse file sizes and available space. Remember, the
sparse file has the potential of being as big as the source database
itself (if all data pages in the source database eventually get
updated). Plan ahead for this situation!
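As a rough planning aid, the arithmetic behind this note can be sketched as follows. The sketch assumes that each changed page is copied to the sparse file exactly once and uses SQL Server's 8 KB data page size; it ignores how the file system actually allocates sparse file space. The database size and changed-page counts are hypothetical, as are the function names.

```python
PAGE_SIZE_KB = 8  # SQL Server data pages are 8 KB

def sparse_file_estimate_mb(changed_pages):
    """Approximate sparse file size if each changed page is copied once."""
    return changed_pages * PAGE_SIZE_KB / 1024

def should_refresh_snapshot(changed_pages, total_pages, threshold=0.30):
    """True once the changed fraction reaches the threshold (30% here)."""
    return changed_pages / total_pages >= threshold

total_pages = 1_280_000     # hypothetical 10,000 MB source database
changed_pages = 400_000     # hypothetical count of distinct pages changed

print(sparse_file_estimate_mb(changed_pages))               # 3125.0
print(sparse_file_estimate_mb(total_pages))                 # 10000.0 (worst case)
print(should_refresh_snapshot(changed_pages, total_pages))  # True
```

The worst-case line is the one to budget disk space for: if every page in the source eventually changes, the sparse file approaches the size of the source database itself.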
There are, of course, alternatives to database
snapshots, such as data replication, log shipping, and even materialized
views, but none are as easy to manage and use as database snapshots.
The most common terms associated with database snapshots are the following:
- Source database: This is the database on which the database snapshot is based. A database is a collection of data pages. It is the fundamental data storage mechanism that SQL Server uses.
- Snapshot databases: There can be one or more database snapshots defined against any one source database. All snapshots must reside in the same SQL Server instance as the source database.
- Database snapshot sparse file: This new data page allocation holds the original source database data pages that are preserved when updates occur to the source database data pages. One sparse file is associated with each database data file. If you have a source database allocated with one or more separate data files, you have a corresponding sparse file for each of them.
- Reverting to a database snapshot: If you restore a source database based on a particular database snapshot that was taken at a point in time, you are reverting. You are actually doing a database RESTORE operation with a FROM DATABASE_SNAPSHOT clause.
- Copy-on-write technology: As part of an update transaction in the source database, a copy of the original source database data page is written to a sparse file so that the database snapshot can be served correctly (that is, still see the data page as of the snapshot point in time).
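To tie the reverting and copy-on-write definitions together, here is a minimal conceptual sketch. It is a toy model only: plain dictionaries stand in for data pages, the `write_page` and `revert` functions are hypothetical, and the real operation is the RESTORE with a FROM DATABASE_SNAPSHOT clause mentioned above. The model also ignores pages allocated after the snapshot was taken.

```python
# Toy model only; dictionaries stand in for data pages and the sparse file.

def write_page(source, sparse_file, page_id, new_contents):
    """Update a source page, preserving the original on its first change."""
    if page_id not in sparse_file:
        sparse_file[page_id] = source[page_id]
    source[page_id] = new_contents

def revert(source, sparse_file):
    """Return the source to its state as of the snapshot."""
    # Only pages that changed since the snapshot need to be put back.
    source.update(sparse_file)

source = {1: "A", 2: "B", 3: "C"}
at_snapshot_time = dict(source)   # kept only to check the result
sparse_file = {}

write_page(source, sparse_file, 1, "A-bad")
write_page(source, sparse_file, 3, "C-bad")
revert(source, sparse_file)

print(source == at_snapshot_time)   # True
print(len(sparse_file))             # 2
```

In this model, the work done by `revert` is proportional to the number of pages that changed since the snapshot, not to the size of the whole database, which is one way to think about why reverting can be fast.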
As Figure 4
illustrates, any data query using the database snapshot looks at both
the source database data pages and the sparse file data pages at the
same time. Together, these data pages always reflect the data exactly as it was
at the point in time the snapshot was created.