SQL Server 2008 R2 : Implementing Data Integrity (part 1) - Types of Data Integrity, Enforcing Data Integrity, Rules

12/1/2012 4:47:03 PM

What’s New in Data Integrity

Much of the functionality related to data integrity has remained the same in SQL Server 2008. Several features that were added in SQL Server 2005, such as cascading integrity constraints, are still supported in SQL Server 2008. The lack of change in this area is generally a blessing. The tools available to enforce data integrity were comprehensive in 2005 and remain so in 2008.

Keep in mind that bound defaults, which were deprecated in SQL Server 2005, are still available in SQL Server 2008. For now, you can still use the CREATE DEFAULT statement to create a default object and then bind it to one or more columns with sp_bindefault. Microsoft recommends using the DEFAULT keyword with ALTER TABLE or CREATE TABLE instead.
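The difference between the two approaches can be sketched as follows. The dbo.OrderDemo table, its columns, and all constraint and default names are hypothetical and exist only for illustration. GO is the batch separator used by tools such as SQL Server Management Studio and sqlcmd.

```sql
-- Hypothetical demo table; not part of AdventureWorks.
CREATE TABLE dbo.OrderDemo
(
    OrderID  int         NOT NULL,
    OrderQty int         NOT NULL,
    Region   varchar(10) NULL,
    Status   varchar(10) NOT NULL
        CONSTRAINT DF_OrderDemo_Status DEFAULT ('NEW')  -- recommended: DEFAULT in CREATE TABLE
);
GO

-- Recommended: add a default to an existing column with ALTER TABLE.
ALTER TABLE dbo.OrderDemo
    ADD CONSTRAINT DF_OrderDemo_OrderQty DEFAULT (1) FOR OrderQty;
GO

-- Deprecated: create a standalone default object and bind it to a column.
-- CREATE DEFAULT must be the only statement in its batch.
CREATE DEFAULT dbo.df_RegionDemo AS 'WEST';
GO
EXEC sp_bindefault 'dbo.df_RegionDemo', 'dbo.OrderDemo.Region';
GO
```

The DEFAULT constraints belong to the table and are dropped with it, whereas the bound default is a separate object that must be unbound before it can be dropped.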

Types of Data Integrity

How integrity is enforced depends on the type of integrity being enforced. As described in the following sections, the types of data integrity are domain, entity, and referential integrity.

Domain Integrity

Domain integrity controls the validation of values for a column. You can use domain integrity to enforce the type, format, and possible values of data stored in a column. SQL Server provides several mechanisms to enforce domain integrity:

You can control the type of data stored in a column by assigning a data type to the column.
You can use CHECK constraints and rules to control the format of the data.
You can control the range of values stored in a column by using FOREIGN KEY constraints, CHECK constraints, default definitions, nullability, and rules.
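As a rough illustration of these mechanisms working together, the following sketch defines a hypothetical dbo.ProductDemo table. The table, column, and constraint names are assumptions made for this example only.

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE dbo.ProductDemo
(
    ProductID    int     NOT NULL,   -- data type and nullability restrict the value
    ProductCode  char(6) NOT NULL
        CONSTRAINT CK_ProductDemo_Code
        CHECK (ProductCode LIKE '[A-Z][A-Z]-[0-9][0-9][0-9]'),       -- format
    ListPrice    money   NOT NULL
        CONSTRAINT DF_ProductDemo_ListPrice DEFAULT (0)              -- default definition
        CONSTRAINT CK_ProductDemo_ListPrice CHECK (ListPrice >= 0),  -- range
    Discontinued bit     NOT NULL DEFAULT (0)
);
GO

-- Succeeds: ListPrice and Discontinued take their defaults.
INSERT INTO dbo.ProductDemo (ProductID, ProductCode) VALUES (1, 'AB-123');

-- Fails: the code does not match the required format.
INSERT INTO dbo.ProductDemo (ProductID, ProductCode, ListPrice) VALUES (2, 'AB-12X', 5);

-- Fails: a negative price violates the range check.
INSERT INTO dbo.ProductDemo (ProductID, ProductCode, ListPrice) VALUES (3, 'CD-456', -1);
```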

Entity Integrity

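Entity integrity requires that all rows in a table be unique. You can enforce entity integrity in SQL Server by using PRIMARY KEY constraints, UNIQUE constraints, and IDENTITY properties. Note that an IDENTITY property only generates values; it guarantees uniqueness only when the column is also covered by a PRIMARY KEY or UNIQUE constraint (or a unique index).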
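A minimal sketch of entity integrity, assuming a hypothetical dbo.CustomerDemo table (all names here are illustrative):

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE dbo.CustomerDemo
(
    CustomerID int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_CustomerDemo PRIMARY KEY,      -- each row is uniquely identified
    Email      varchar(100) NOT NULL
        CONSTRAINT UQ_CustomerDemo_Email UNIQUE,     -- alternate key must also be unique
    FullName   varchar(100) NOT NULL
);
GO

-- Succeeds: CustomerID is generated by the IDENTITY property.
INSERT INTO dbo.CustomerDemo (Email, FullName)
VALUES ('pat@example.com', 'Pat Example');

-- Fails: the duplicate Email value violates the UNIQUE constraint.
INSERT INTO dbo.CustomerDemo (Email, FullName)
VALUES ('pat@example.com', 'Another Pat');
```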

Referential Integrity

Referential integrity preserves the defined relationships between tables. You can define such a relationship in SQL Server by relating foreign key columns on one table to the primary key or unique key of another table. When it is defined, referential integrity ensures that values inserted in the foreign key columns have corresponding values in the primary table. It also controls changes to the primary key table and ensures that related foreign key rows are not left orphaned.
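The following sketch shows one way such a relationship might be declared. The dbo.ParentDemo and dbo.ChildDemo tables are hypothetical, and the choice of NO ACTION on delete and CASCADE on update is just one possible combination.

```sql
-- Hypothetical demo tables; names are illustrative only.
CREATE TABLE dbo.ParentDemo
(
    ParentID int NOT NULL CONSTRAINT PK_ParentDemo PRIMARY KEY
);

CREATE TABLE dbo.ChildDemo
(
    ChildID  int NOT NULL CONSTRAINT PK_ChildDemo PRIMARY KEY,
    ParentID int NOT NULL
        CONSTRAINT FK_ChildDemo_ParentDemo
        FOREIGN KEY REFERENCES dbo.ParentDemo (ParentID)
        ON DELETE NO ACTION   -- block deletes that would orphan child rows
        ON UPDATE CASCADE     -- propagate key changes to child rows
);
GO

INSERT INTO dbo.ParentDemo (ParentID) VALUES (1);
INSERT INTO dbo.ChildDemo (ChildID, ParentID) VALUES (10, 1);   -- succeeds

-- Fails: there is no parent row with ParentID = 99.
INSERT INTO dbo.ChildDemo (ChildID, ParentID) VALUES (11, 99);

-- Fails: child row 10 still references this parent.
DELETE FROM dbo.ParentDemo WHERE ParentID = 1;

-- Succeeds: the new key value cascades to dbo.ChildDemo.
UPDATE dbo.ParentDemo SET ParentID = 2 WHERE ParentID = 1;
```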

Enforcing Data Integrity

You can enforce data integrity by using declarative or procedural methods. Implementing declarative data integrity requires little or no coding. Implementing procedural data integrity is more flexible but requires more custom coding.

Implementing Declarative Data Integrity

Declarative integrity is enforced within the database, using constraints, rules, and defaults. This is the preferred method of enforcing integrity because it has low overhead and requires little or no custom programming. It can be centrally managed in the database, and it provides a consistent approach for ensuring the integrity of data.

Implementing Procedural Data Integrity

Procedural integrity can be implemented with stored procedures, triggers, and application code. It requires custom programming that defines and enforces the integrity of the data. The biggest benefits of implementing procedural data integrity are flexibility and control. You can implement the custom code in many different ways to enforce the integrity of your data. The custom code can also be a detriment: inconsistency and potential inefficiencies in the way data integrity is enforced can be a real problem.

In general, declarative data integrity should be used as the primary means for control. Procedural data integrity can be used to augment declarative data integrity, if needed.
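As an illustration of procedural integrity augmenting declarative integrity, the sketch below uses a FOREIGN KEY constraint for the relationship and a trigger for a cross-table condition that a simple CHECK constraint cannot express. The tables, the trigger name, and the business condition are all hypothetical.

```sql
-- Hypothetical demo tables; names are illustrative only.
CREATE TABLE dbo.OrderHeaderDemo
(
    OrderID   int      NOT NULL CONSTRAINT PK_OrderHeaderDemo PRIMARY KEY,
    OrderDate datetime NOT NULL
);

CREATE TABLE dbo.ShipmentDemo
(
    ShipmentID int      NOT NULL CONSTRAINT PK_ShipmentDemo PRIMARY KEY,
    OrderID    int      NOT NULL
        CONSTRAINT FK_ShipmentDemo_OrderHeaderDemo
        REFERENCES dbo.OrderHeaderDemo (OrderID),   -- declarative part
    ShipDate   datetime NOT NULL
);
GO

-- Procedural part: reject shipments dated before their order.
CREATE TRIGGER dbo.trg_ShipmentDemo_CheckDate
ON dbo.ShipmentDemo
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (SELECT 1
               FROM inserted AS i
               JOIN dbo.OrderHeaderDemo AS o
                 ON o.OrderID = i.OrderID
               WHERE i.ShipDate < o.OrderDate)
    BEGIN
        RAISERROR('ShipDate cannot be earlier than OrderDate.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;
GO
```

The trigger checks every row in the inserted pseudo-table, so it also handles multirow INSERT and UPDATE statements.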

Rules

You can use rules as another method to enforce domain integrity. Rules are similar to CHECK constraints but have some limitations. The biggest advantage when using a rule is that one rule can be bound to multiple columns or user-defined data types. This capability can be useful for columns that contain the same type of data and are found in multiple tables in a database. The syntax for creating a rule is as follows:

CREATE RULE [ schema_name . ] rule_name
AS condition_expression
[ ; ]

condition_expression can include any expression that is valid in a WHERE clause, but it cannot reference columns or other database objects. It includes one variable that is preceded with the @ symbol. This variable contains the value of the bound column that is supplied with the INSERT or UPDATE statement. The name of the variable is not important, but the conditions and formatting within the expression are. Only one variable can be referenced per rule. The following example illustrates the creation of a rule that could be used to enforce the format of data inserted in phone number columns:

CREATE RULE phone_rule AS
@phone LIKE '([0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'

The variable in the condition expression is @phone, and it contains the inserted or updated value for any column that the rule is bound to. The following example binds the phone_rule rule to the PhoneNumber column in the Person.PersonPhone table:

EXEC sp_bindrule 'phone_rule', 'Person.PersonPhone.PhoneNumber'

When a rule is bound to a column, any future insertions or updates to data in the bound column are constrained by the rule. Existing data is not affected at the time the rule is bound to the column. For example, many different phone number formats in the Person.PersonPhone table do not conform to phone_rule, but phone_rule can be bound to this table successfully. To illustrate this point, the following UPDATE statement can be run against the Person.PersonPhone table after the phone_rule rule is bound to the PhoneNumber column:

UPDATE Person.PersonPhone
 SET PhoneNumber = PhoneNumber

The preceding update sets each PhoneNumber value to itself, but this still causes phone_rule to be evaluated. The following error message is displayed after the update is run because existing data in the Person.PersonPhone table violates the phone_rule rule:

Msg 513, Level 16, State 0, Line 2
A column insert or update conflicts with a rule imposed
by a previous CREATE RULE statement.
The statement was terminated.
The conflict occurred in database 'Adventureworks2008',
table 'PersonPhone', column 'PhoneNumber'.
The statement has been terminated.

Although rules are powerful objects, Microsoft has slated them for removal in a future version of SQL Server. Microsoft recommends using CHECK constraints on each column instead of rules. CHECK constraints provide more flexibility and a consistent approach, and multiple CHECK constraints can be applied to a single column.
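A CHECK constraint equivalent to phone_rule might look like the following sketch. The dbo.ContactDemo table and the constraint names are hypothetical, and the second constraint is an invented condition included only to show that several CHECK constraints can apply to one column.

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE dbo.ContactDemo
(
    ContactID   int         NOT NULL CONSTRAINT PK_ContactDemo PRIMARY KEY,
    PhoneNumber varchar(25) NOT NULL
);
GO

-- Same pattern as phone_rule, expressed as a CHECK constraint.
-- WITH NOCHECK skips validation of existing rows, which mirrors how a
-- rule behaves when it is first bound to a column.
ALTER TABLE dbo.ContactDemo WITH NOCHECK
    ADD CONSTRAINT CK_ContactDemo_PhoneFormat
    CHECK (PhoneNumber LIKE '([0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]');

-- A second CHECK constraint on the same column (illustrative condition).
ALTER TABLE dbo.ContactDemo WITH NOCHECK
    ADD CONSTRAINT CK_ContactDemo_PhoneAreaCode
    CHECK (PhoneNumber NOT LIKE '(000)%');
GO

-- Succeeds: matches the format and the area code condition.
INSERT INTO dbo.ContactDemo (ContactID, PhoneNumber) VALUES (1, '(555) 123-4567');

-- Fails: does not match the required format.
INSERT INTO dbo.ContactDemo (ContactID, PhoneNumber) VALUES (2, '555-123-4567');
```

Omitting WITH NOCHECK makes SQL Server validate existing rows when the constraint is added, which is usually the safer choice for new constraints.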
