What’s New in Data Integrity
Much of the functionality
related to data integrity has remained the same in SQL Server 2008.
Several features that were added in SQL Server 2005, such as cascading
integrity constraints, are still supported in SQL Server 2008. The lack
of change in this area is generally a blessing. The tools available to
enforce data integrity were comprehensive in 2005 and remain so in 2008.
Keep in mind that bound defaults, which were deprecated in SQL Server 2005,
are still available in SQL Server 2008. For now, you can still use the
CREATE DEFAULT statement to create a default that is bound to one or more columns.
Microsoft recommends using the DEFAULT keyword with ALTER TABLE or CREATE TABLE instead.
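The difference between the two approaches can be sketched as follows. This is a minimal illustration, not code from AdventureWorks: the table dbo.DemoOrders, the default df_OrderStatus, and the constraint name are hypothetical, and GO is the batch separator understood by SQL Server Management Studio and sqlcmd.

```sql
-- Hypothetical scratch table used only for this illustration.
CREATE TABLE dbo.DemoOrders
(
    OrderID     int         NOT NULL,
    OrderStatus varchar(10) NOT NULL
);
GO

-- Deprecated approach: a standalone default object bound to the column.
-- CREATE DEFAULT must be the only statement in its batch.
CREATE DEFAULT df_OrderStatus AS 'NEW';
GO
EXEC sp_bindefault 'df_OrderStatus', 'dbo.DemoOrders.OrderStatus';
GO

-- The bound default supplies OrderStatus when the INSERT omits it.
INSERT INTO dbo.DemoOrders (OrderID) VALUES (1);
GO

-- Recommended approach: unbind and drop the default object,
-- then declare a DEFAULT constraint on the column itself.
EXEC sp_unbindefault 'dbo.DemoOrders.OrderStatus';
DROP DEFAULT df_OrderStatus;
GO
ALTER TABLE dbo.DemoOrders
    ADD CONSTRAINT DF_DemoOrders_OrderStatus DEFAULT ('NEW') FOR OrderStatus;
GO
```

The DEFAULT constraint belongs to the table definition, so it is scripted and dropped along with the table and needs no separate binding step.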
Types of Data Integrity
How
integrity is enforced depends on the type of integrity being enforced.
As described in the following sections, the types of data integrity are
domain, entity, and referential integrity.
Domain Integrity
Domain integrity
controls the validation of values for a column. You can use domain
integrity to enforce the type, format, and possible values of data
stored in a column. SQL Server provides several mechanisms to enforce
domain integrity:
You can control the type of data stored in a column by assigning a data type to the column.
You can use CHECK constraints and rules to control the format of the data.
You can control the range of values stored in a column by using FOREIGN KEY constraints, CHECK constraints, default definitions, nullability, and rules.
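As a rough sketch of how these mechanisms combine on a single table, consider the following. The table dbo.DemoProduct and all constraint names are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table showing data type, nullability, CHECK, and DEFAULT together.
CREATE TABLE dbo.DemoProduct
(
    -- The data type limits the column to integer values; NOT NULL forbids missing values.
    ProductID    int           NOT NULL,
    -- A CHECK constraint controls the format: two letters, a hyphen, three digits.
    ProductCode  char(6)       NOT NULL
        CONSTRAINT CK_DemoProduct_Code
        CHECK (ProductCode LIKE '[A-Z][A-Z]-[0-9][0-9][0-9]'),
    -- A CHECK constraint controls the range of allowed values.
    ListPrice    decimal(10,2) NOT NULL
        CONSTRAINT CK_DemoProduct_Price CHECK (ListPrice >= 0),
    -- A default definition supplies a value when none is given.
    Discontinued bit           NOT NULL
        CONSTRAINT DF_DemoProduct_Discontinued DEFAULT (0)
);
GO

-- Accepted: every column satisfies its domain rules; Discontinued takes the default.
INSERT INTO dbo.DemoProduct (ProductID, ProductCode, ListPrice)
VALUES (1, 'AB-123', 19.99);

-- Rejected by CK_DemoProduct_Price: a negative price is outside the allowed range.
INSERT INTO dbo.DemoProduct (ProductID, ProductCode, ListPrice)
VALUES (2, 'CD-456', -5.00);
```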
Entity Integrity
Entity integrity requires that all rows in a table be unique. You can enforce entity integrity in SQL Server by using PRIMARY KEY constraints, UNIQUE constraints, and IDENTITY properties.
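A minimal sketch of these three features working together follows. The table dbo.DemoCustomer and its constraint names are hypothetical. Note that the IDENTITY property only generates values; in this sketch it is the PRIMARY KEY constraint that actually guarantees uniqueness.

```sql
-- Hypothetical table illustrating entity integrity.
CREATE TABLE dbo.DemoCustomer
(
    -- IDENTITY generates a new value for each row;
    -- the PRIMARY KEY constraint guarantees that the value is unique.
    CustomerID int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_DemoCustomer PRIMARY KEY,
    -- A UNIQUE constraint enforces uniqueness on an alternate key.
    Email      varchar(100)      NOT NULL
        CONSTRAINT UQ_DemoCustomer_Email UNIQUE,
    FullName   varchar(100)      NOT NULL
);
GO

-- Accepted: CustomerID is generated automatically.
INSERT INTO dbo.DemoCustomer (Email, FullName)
VALUES ('pat@example.com', 'Pat Example');

-- Rejected by UQ_DemoCustomer_Email: the e-mail address already exists.
INSERT INTO dbo.DemoCustomer (Email, FullName)
VALUES ('pat@example.com', 'Another Pat');
```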
Referential Integrity
Referential integrity
preserves the defined relationships between tables. You can define such
a relationship in SQL Server by relating foreign key columns on one
table to the primary key or unique key of another table. When it is
defined, referential integrity ensures that values inserted in the
foreign key columns have corresponding values in the primary table. It
also controls changes to the primary key table and ensures that related
foreign key rows are not left orphaned.
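The behavior described above can be sketched with two hypothetical tables, dbo.DemoDept and dbo.DemoEmp. The names are assumptions made for this illustration, and the foreign key uses the default NO ACTION behavior.

```sql
-- Hypothetical parent (primary key) table.
CREATE TABLE dbo.DemoDept
(
    DeptID   int         NOT NULL CONSTRAINT PK_DemoDept PRIMARY KEY,
    DeptName varchar(50) NOT NULL
);

-- Hypothetical child table whose DeptID must match a row in dbo.DemoDept.
CREATE TABLE dbo.DemoEmp
(
    EmpID  int NOT NULL CONSTRAINT PK_DemoEmp PRIMARY KEY,
    DeptID int NOT NULL
        CONSTRAINT FK_DemoEmp_DemoDept
        FOREIGN KEY REFERENCES dbo.DemoDept (DeptID)
);
GO

INSERT INTO dbo.DemoDept (DeptID, DeptName) VALUES (1, 'Engineering');

-- Accepted: department 1 exists in the primary key table.
INSERT INTO dbo.DemoEmp (EmpID, DeptID) VALUES (100, 1);

-- Rejected by FK_DemoEmp_DemoDept: there is no department 99.
INSERT INTO dbo.DemoEmp (EmpID, DeptID) VALUES (101, 99);

-- Rejected by FK_DemoEmp_DemoDept: deleting department 1 would orphan employee 100.
DELETE FROM dbo.DemoDept WHERE DeptID = 1;
```

Adding ON DELETE CASCADE to the foreign key definition would instead delete the related employee rows along with the department.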
Enforcing Data Integrity
You can enforce data integrity
by using declarative or procedural methods. Implementing declarative
data integrity requires little or no coding. Implementing procedural
data integrity is more flexible but requires more custom coding.
Implementing Declarative Data Integrity
Declarative integrity
is enforced within the database, using constraints, rules, and defaults.
This is the preferred method of enforcing integrity because it has low
overhead and requires little or no custom programming. It can be
centrally managed in the database, and it provides a consistent approach
for ensuring the integrity of data.
Implementing Procedural Data Integrity
Procedural
integrity can be implemented with stored procedures, triggers, and
application code. It requires custom programming that defines and
enforces the integrity of the data. The biggest benefits of implementing
procedural data integrity are flexibility and control. You can
implement the custom code in many different ways to enforce the
integrity of your data. The custom code can also be a detriment, however;
inconsistency and potential inefficiency in the way the integrity checks
are implemented can be a real problem.
In
general, declarative data integrity should be used as the primary means
for control. Procedural data integrity can be used to augment
declarative data integrity, if needed.
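As one hedged example of procedural integrity augmenting declarative integrity, the following trigger enforces a rule that compares a column's old and new values, which a CHECK constraint cannot see. The table dbo.DemoAccount, the trigger name, and the 1000 limit are all hypothetical.

```sql
-- Hypothetical table; the primary key is declarative integrity.
CREATE TABLE dbo.DemoAccount
(
    AccountID int           NOT NULL CONSTRAINT PK_DemoAccount PRIMARY KEY,
    Balance   decimal(12,2) NOT NULL
);
GO

-- CREATE TRIGGER must be the first statement in its batch.
CREATE TRIGGER dbo.trg_DemoAccount_LimitDrop
ON dbo.DemoAccount
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- deleted holds the old row values and inserted holds the new ones.
    IF EXISTS (SELECT 1
               FROM inserted AS i
               JOIN deleted AS d ON d.AccountID = i.AccountID
               WHERE i.Balance < d.Balance - 1000)
    BEGIN
        RAISERROR('A single update cannot reduce a balance by more than 1000.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;
GO

INSERT INTO dbo.DemoAccount (AccountID, Balance) VALUES (1, 5000.00);

-- Accepted: the balance drops by 500.
UPDATE dbo.DemoAccount SET Balance = Balance - 500 WHERE AccountID = 1;
GO

-- Rolled back by the trigger: the balance would drop by 2000.
UPDATE dbo.DemoAccount SET Balance = Balance - 2000 WHERE AccountID = 1;
GO
```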
Rules
You can use rules as another method to enforce domain integrity. Rules are similar to CHECK
constraints but have some limitations. The biggest advantage when using
a rule is that one rule can be bound to multiple columns or
user-defined data types. This capability can be useful for columns that
contain the same type of data and are found in multiple tables in a
database. The syntax for creating a rule is as follows:
CREATE RULE [ schema_name . ] rule_name
AS condition_expression
[ ; ]
condition_expression can be any expression that is valid in a WHERE clause. It includes one variable that is preceded with the @ symbol. This variable contains the value of the bound column that is supplied with the INSERT or UPDATE
statement. The name of the variable is not important, but the
conditions and formatting within the expression are. Only one variable
can be referenced per rule. The following example illustrates the
creation of a rule that could be used to enforce the format of data
inserted in phone number columns:
CREATE RULE phone_rule AS
@phone LIKE '([0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
The variable in the condition expression is @phone, and it contains the inserted or updated value for any column that the rule is bound to. The following example binds the phone_rule rule to the PhoneNumber column in the person.PersonPhone table:
EXEC sp_bindrule 'phone_rule', 'Person.PersonPhone.PhoneNumber'
When a rule is bound to a
column, any future insertions or updates to data in the bound column are
constrained by the rule. Existing data is not affected at the time the
rule is bound to the column. For example, many different phone number
formats in the Person.PersonPhone table do not conform to phone_rule, but phone_rule can be bound to this table successfully. To illustrate this point, the following UPDATE statement can be run against the Person.PersonPhone table after the phone_rule rule is bound to the PhoneNumber column:
UPDATE Person.PersonPhone
SET PhoneNumber = PhoneNumber
The preceding update sets the PhoneNumber value to itself, but this causes phone_rule to execute. The following error message is displayed after the update is run because the existing data in the Person.PersonPhone table violates the phone_rule rule:
Msg 513, Level 16, State 0, Line 2
A column insert or update conflicts with a rule imposed
by a previous CREATE RULE statement.
The statement was terminated.
The conflict occurred in database 'Adventureworks2008',
table 'PersonPhone', column 'PhoneNumber'.
The statement has been terminated.
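Because binding a rule does not validate existing rows, it can help to look for nonconforming data before binding. The following query is one possible way to do that. It assumes the AdventureWorks2008 sample database is installed and that Person.PersonPhone is keyed in part by a BusinessEntityID column.

```sql
-- List existing phone numbers that would violate phone_rule.
-- The pattern is copied from the rule definition and negated with NOT LIKE.
SELECT BusinessEntityID, PhoneNumber
FROM Person.PersonPhone
WHERE PhoneNumber NOT LIKE '([0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]';
```

Any row returned by this query causes an UPDATE that touches its PhoneNumber column to fail while the rule is bound.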
Although
rules are powerful objects, Microsoft has slated them for removal in a
future version of SQL Server. Microsoft recommends using CHECK constraints on each column instead of rules. CHECK constraints provide more flexibility and a consistent approach, and multiple CHECK constraints can be applied to a single column.
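A rough CHECK constraint equivalent of phone_rule might look like the following. To keep the sketch self-contained, it uses a hypothetical table, dbo.DemoContactPhone, rather than Person.PersonPhone. All constraint names are assumptions, and the second constraint is an invented business rule that exists only to show two CHECK constraints on one column.

```sql
-- Hypothetical table with one existing row that does not match the format.
CREATE TABLE dbo.DemoContactPhone
(
    ContactID   int          NOT NULL,
    PhoneNumber nvarchar(25) NOT NULL
);
INSERT INTO dbo.DemoContactPhone (ContactID, PhoneNumber) VALUES (1, '555-0100');
GO

-- WITH NOCHECK skips validation of existing rows, which mirrors how a bound rule behaves.
-- Without it, this statement would fail because of the existing '555-0100' row.
ALTER TABLE dbo.DemoContactPhone WITH NOCHECK
    ADD CONSTRAINT CK_DemoContactPhone_Format
    CHECK (PhoneNumber LIKE '([0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]');

-- A second CHECK constraint on the same column (illustrative rule only).
-- The existing row passes this check, so the default validation succeeds.
ALTER TABLE dbo.DemoContactPhone
    ADD CONSTRAINT CK_DemoContactPhone_AreaCode
    CHECK (PhoneNumber NOT LIKE '(000)%');
GO

-- Accepted: satisfies both constraints.
INSERT INTO dbo.DemoContactPhone (ContactID, PhoneNumber) VALUES (2, '(425) 555-0100');

-- Rejected by CK_DemoContactPhone_Format.
INSERT INTO dbo.DemoContactPhone (ContactID, PhoneNumber) VALUES (3, '425.555.0100');
```

A constraint added WITH NOCHECK is not trusted by SQL Server until the existing data has been validated, so cleaning up the old rows and revalidating the constraint is usually worth doing afterward.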