Before we go too much further, it's important to define corruption
as it relates to a SQL Server database. There are essentially two
different types of corruption: logical and physical. Logical corruption
refers to situations in which people (or applications) remove data they
shouldn't—for example, deleting one half of an Order:OrderDetail
relationship. In contrast, physical corruption is almost always caused
by faulty I/O subsystem components; examples include crashed hard
drives and faulty RAID controllers.
Making
the distinction between logical and physical corruption is important. A
statement from a DBA to the effect of "The database is corrupt!"
usually means something much more sinister than the same statement made
from an application support person.
This
section will focus on tools and options used to validate an I/O
subsystem and detect the presence of physical corruption.
1. SQLIOSIM
While
a reliable and well-configured I/O system should rarely cause any
database corruption, it can, and does, happen. Relying on DBCC to remove corruption is a poor
alternative to a good design and maintenance strategy aimed at
preventing corruption and implementing early detection mechanisms.
To briefly recap,
SQLIOSIM simulates SQL Server I/O workload, without SQL Server needing
to be installed. The primary benefit of this tool is being able to
detect issues in any part of the I/O chain (hardware, software, OS,
drivers, firmware, and so forth) that may lead to database corruption
at a later point.
The
most likely cause of database corruption is an issue somewhere in the
I/O subsystem. If and when corruption appears, it's often difficult to
come up with conclusive proof as to the cause, and as a result, DBAs,
hardware vendors, and administrators often end up in a heated blame
game. SQLIOSIM offers a powerful method for validating each component
of the I/O chain. When used as part of the validation and commissioning
process for new hardware, it offers a tremendously valuable clean bill
of health, offering peace of mind that corruption caused by the I/O
subsystem, while possible, is unlikely.
In
addition to using SQLIOSIM to validate the health of an I/O subsystem,
enabling the page checksums feature offers a nice method of ongoing
page validation outside explicit DBCC checks.
2. Page checksums
Page checksums, enabled by default as a database property as shown in figure 1, ensure ongoing validation of pages written to and read from the file system. When a page is written to
disk, SQL Server calculates a checksum value based on the page contents
and writes the value to the page. When a page containing a checksum
value is read from disk, the checksum is recalculated and compared.
The
process of writing a page checksum is performed by SQL Server as the
last action before the page leaves its control (flushed from the buffer
pool). In a similar manner, the checksum is recalculated and compared
as the first action when read from disk. If when recalculated the
checksum value is different from the checksum stored on the page,
that's a fairly solid indication some part of the I/O system has
corrupted the page.
The
performance overhead of calculating, writing, and comparing checksums
has been estimated at approximately 2 percent, a small price to pay for
ongoing validation of I/O integrity. In addition to validating pages as
they're read from disk, the DBCC CHECK* commands will validate each page's checksum value as part of their operation. Further, the backup and restore process can also validate checksums using the optional WITH CHECKSUM
clause. Assuming regular backups are in place that use this option,
page checksums enable constant and ongoing validation of I/O integrity.
If
a page checksum is detected as being incorrect, messages are written to
the SQL Server log and Windows event logs. Assuming these are being
actively monitored, this presents a mechanism for early detection and
investigation.
Prior to SQL Server 2008, aside from restoring a backup or risking data loss with the DBCC repair options, there was no way of safely
overcoming a corrupted page without the potential for data loss. In the
Enterprise edition of SQL Server 2008, database mirroring partners can
exchange pages to automatically overcome certain types of page
corruption.
|
While
SQLIOSIM and page checksums are excellent tools for initial and ongoing
I/O validation, there's no substitute for regular DBCC checks.