SQL Server 2008 R2 : Implementing Data Integrity (part 3) - Using Constraints - The PRIMARY KEY Constraint, The UNIQUE Constraint, The FOREIGN KEY Referential Integrity Constraint

12/1/2012 4:50:35 PM

Constraints—including PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, and DEFAULT—are the primary method used to enforce data integrity.

The PRIMARY KEY Constraint

The PRIMARY KEY constraint is one of the key methods for ensuring entity integrity. When this constraint is defined on a table, it ensures that every row can be uniquely identified with the primary key value(s). The primary key can have one or more columns as part of its definition. None of the columns in the primary key definition can allow nulls. When multiple columns are used in the definition of the primary key, the combination of the values in all the primary key columns must be unique. Duplication can exist in a single column that is part of a multicolumn primary key.
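The multicolumn rule can be seen with a small sketch. The table dbo.DemoOrderLine and its columns are hypothetical and exist only for this illustration; they are not part of AdventureWorks2008.

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE dbo.DemoOrderLine
(
    OrderID    int NOT NULL,
    LineNumber int NOT NULL,
    Quantity   int NOT NULL,
    CONSTRAINT PK_DemoOrderLine PRIMARY KEY (OrderID, LineNumber)
);

INSERT INTO dbo.DemoOrderLine (OrderID, LineNumber, Quantity) VALUES (1, 1, 5);

-- OrderID 1 repeats, but the pair (1, 2) is new, so this succeeds.
INSERT INTO dbo.DemoOrderLine (OrderID, LineNumber, Quantity) VALUES (1, 2, 3);

-- Fails with a PRIMARY KEY violation (error 2627): the pair (1, 2) already exists.
INSERT INTO dbo.DemoOrderLine (OrderID, LineNumber, Quantity) VALUES (1, 2, 9);

-- Clean up the demo table.
DROP TABLE dbo.DemoOrderLine;
```

Only the combination of OrderID and LineNumber is checked; either column on its own may contain duplicates.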

There can be only one primary key defined for each table. When a primary key is defined on a table, a unique index is automatically created as well. This index contains all the columns in the primary key and ensures that the rows in this index are unique. Generally, every table in a database should have a primary key. The primary key and its associated unique index provide fast access to a database table.

Figure 1 shows the AdventureWorks2008 database Employee table, which is an example of a table that has a primary key defined. The primary key in this table is BusinessEntityID, and it is denoted in the dialog shown in Figure 1 with a key symbol in the leftmost column.

Figure 1. A primary key example.

The existing primary key on the Employee table in the AdventureWorks2008 database was generated as a T-SQL script, as shown in the following example:

ALTER TABLE [HumanResources].[Employee]
  ADD  CONSTRAINT [PK_Employee_BusinessEntityID] PRIMARY KEY CLUSTERED
(BusinessEntityID ASC)

In general, you try to choose a primary key that is relatively short. BusinessEntityID, for example, is a good choice because it is an integer column and takes only 4 bytes of storage. This is particularly important when the primary key is CLUSTERED, as in the case of PK_Employee_BusinessEntityID. The key values from the clustered index are used by all nonclustered indexes as lookup keys. If the clustered key is large, this consumes more space and affects performance.

Surrogate keys are often good choices for primary keys. The BusinessEntityID column in the Person.BusinessEntity table is an example of a surrogate key. Surrogate keys consist of a single column that automatically increments and is inherently unique, as in the case of an identity column. Surrogate keys are good candidates for primary keys because they are implicitly unique and relatively short in length. You should avoid using large, multicolumn indexes as primary keys. They can impede performance because fewer index rows can be stored on each index page.
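A minimal sketch of a surrogate key built on an identity column follows. The table dbo.DemoCustomer and the sample names are hypothetical and are used only to show the pattern.

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE dbo.DemoCustomer
(
    CustomerID   int IDENTITY(1,1) NOT NULL,   -- surrogate key: 4 bytes, generated by SQL Server
    CustomerName nvarchar(100)     NOT NULL,
    CONSTRAINT PK_DemoCustomer PRIMARY KEY CLUSTERED (CustomerID)
);

-- The key column is omitted from the INSERT; SQL Server assigns the next value.
INSERT INTO dbo.DemoCustomer (CustomerName) VALUES (N'Contoso');
INSERT INTO dbo.DemoCustomer (CustomerName) VALUES (N'Fabrikam');

-- On a freshly created table the generated keys are 1 and 2.
SELECT CustomerID, CustomerName
FROM dbo.DemoCustomer
ORDER BY CustomerID;

-- Clean up the demo table.
DROP TABLE dbo.DemoCustomer;
```

The key carries no business meaning, so a change to the customer name never touches the primary key or any table that references it.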

Note

Over the years, there has been much debate over the use of surrogate keys for primary keys. One school of thought is to avoid surrogate keys because insertions always occur at the end of the primary key index and are not distributed. This can lead to “hot spots” in the index because the insert activity is always on the last page of the index. In addition, surrogate keys have no real meaning and are less intuitive than primary keys that have meaning, such as lastname and firstname.

The other school of thought, in favor of using surrogate keys for primary keys, emphasizes the importance of defining primary keys that are not based on meaningful columns. If meaningful columns are used and the definitions of those columns change, this can have a significant impact on the table that contains the primary key and any tables related to it. Those in favor of using surrogate keys as primary keys also focus on the relatively small key size, which is good for performance and reduces page splits because the values are always inserted into the index sequentially.

The UNIQUE Constraint

The UNIQUE constraint is functionally similar to PRIMARY KEY. It also uses a unique index to enforce uniqueness, but unlike PRIMARY KEY, it allows nulls in the columns that participate in the UNIQUE constraint. Defining a UNIQUE constraint on nullable columns is generally impractical. NULL is treated as a value for uniqueness purposes, so you are limited in the number of rows that can be inserted with NULL values. For example, only one row with a NULL value in the constraint column can be inserted if the UNIQUE constraint is based on a single column. UNIQUE constraints with multiple nullable columns can have more than one row with null values in the constraint keys, but the number of rows is limited to the combination of unique values across all the columns.
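The single-column case can be sketched as follows. The table dbo.DemoBadge and its columns are hypothetical and serve only to illustrate how SQL Server treats NULL in a UNIQUE constraint.

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE dbo.DemoBadge
(
    BadgeID   int         NOT NULL CONSTRAINT PK_DemoBadge PRIMARY KEY,
    BadgeCode varchar(10) NULL,
    CONSTRAINT AK_DemoBadge_BadgeCode UNIQUE (BadgeCode)
);

-- The first NULL is accepted.
INSERT INTO dbo.DemoBadge (BadgeID, BadgeCode) VALUES (1, NULL);

-- The second NULL fails with a UNIQUE KEY violation (error 2627),
-- because NULL already exists in the constrained column.
INSERT INTO dbo.DemoBadge (BadgeID, BadgeCode) VALUES (2, NULL);

-- Clean up the demo table.
DROP TABLE dbo.DemoBadge;
```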

An alternate unique key on the SalesTaxRate table is a good example of a unique constraint in the AdventureWorks2008 database. The AK_SalesTaxRate_StateProvinceID_TaxType index contains the StateProvinceID and TaxType columns. Each of these columns is defined as NOT NULL. In simple terms, this means that each TaxType value must be unique within a state or province. If, however, StateProvinceID were nullable, you could have one row for a given TaxType with a NULL StateProvinceID; every other row for that tax type would need a StateProvinceID value to keep the combination of StateProvinceID and TaxType unique.

You generally use a UNIQUE constraint when a column other than the primary key must be guaranteed to be unique. For example, consider the Employee table example used in the previous section. The primary key on the BusinessEntityID column ensures that each employee row is uniquely identified, but it does not prevent duplication in any of the other columns. For example, every row in the Employee table could have the same LoginID setting if no other UNIQUE constraints were defined on this table. Generally, each employee should have his or her own unique LoginID. You can enforce this policy by adding a UNIQUE constraint on the LoginID column. The following example demonstrates the creation of a UNIQUE constraint on the LoginID column:

ALTER TABLE [HumanResources].[Employee]
 ADD CONSTRAINT AK_Employee_LoginID
  UNIQUE NONCLUSTERED (LoginID ASC)

As with PRIMARY KEY constraints, a unique index is created whenever a UNIQUE constraint is created. If you drop the UNIQUE constraint, you drop the unique index as well. The index that backs a constraint cannot be dropped on its own, however; it goes away only when the constraint is dropped. You can implement uniqueness as either a constraint or an index. To illustrate this, the following example enforces the same uniqueness on the LoginID column as before, this time using an index:

CREATE UNIQUE NONCLUSTERED INDEX [AK_Employee_LoginID]
 ON [HumanResources].[Employee]
(LoginID ASC)

Note

Although UNIQUE constraints and unique indexes achieve the same goal, they must be managed based on how they were created. In other words, if you create a UNIQUE constraint on a table, you cannot directly drop the associated unique index. If you try to drop the unique index directly, you get a message stating that an explicit DROP INDEX is not allowed and that it is being used for unique key constraint enforcement. To drop the UNIQUE constraint, you must use the DROP CONSTRAINT syntax associated with the ALTER TABLE statement. Similarly, if you create a unique index, you cannot drop that index by using a DROP CONSTRAINT statement; you must use DROP INDEX instead.
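The following sketch shows one way to check how an existing unique index was created and then remove it with the matching statement. It assumes an object named AK_Employee_LoginID exists, as in the examples above; only one of the two drop statements applies to any given database, so they are alternatives rather than a script to run top to bottom.

```sql
-- is_unique_constraint = 1 means the index backs a UNIQUE constraint;
-- 0 means it was created directly with CREATE UNIQUE INDEX.
SELECT name, is_unique, is_unique_constraint
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'HumanResources.Employee')
  AND name = N'AK_Employee_LoginID';

-- Alternative 1: the object was created as a constraint.
ALTER TABLE [HumanResources].[Employee]
  DROP CONSTRAINT AK_Employee_LoginID;

-- Alternative 2: the object was created as an index.
DROP INDEX [AK_Employee_LoginID] ON [HumanResources].[Employee];
```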

You can have more than one unique constraint per table. When creating unique constraints, you have all the standard index-creation options available. These options include whether the underlying index is clustered, the fill factor, and a myriad of other index options.
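As a sketch of how index options attach to a constraint definition, the earlier LoginID example could be written with a WITH clause. The fill factor of 80 and the PAD_INDEX setting are arbitrary values chosen for illustration, and the statement assumes no object named AK_Employee_LoginID already exists on the table.

```sql
-- Same UNIQUE constraint as before, with explicit index options.
-- FILLFACTOR and PAD_INDEX values here are illustrative, not recommendations.
ALTER TABLE [HumanResources].[Employee]
  ADD CONSTRAINT AK_Employee_LoginID
  UNIQUE NONCLUSTERED (LoginID ASC)
  WITH (FILLFACTOR = 80, PAD_INDEX = ON);
```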

The FOREIGN KEY Referential Integrity Constraint

The basic premise of a relational database is that tables are related. These relationships are maintained and enforced via referential integrity. FOREIGN KEY constraints are the declarative means for enforcing referential integrity in SQL Server. You implement FOREIGN KEY constraints by relating one or more columns in a table to the columns in a primary key or unique index. The columns in the referencing table can be referred to as foreign key columns. The table with the primary key or unique index can be referred to as the primary table. Figure 2 shows a relationship between the AddressType table and the BusinessEntityAddress table. The foreign key in this example is AddressTypeID on the BusinessEntityAddress table. AddressTypeID on this table is related to the primary key on the AddressType table. The foreign key relationship in this diagram is denoted by the line between these two tables.

Figure 2. A foreign key constraint on the BusinessEntityAddress table.

Once defined, a foreign key, by default, enforces the relationship between the tables in the following ways:

Values in the foreign key columns must have a corresponding value in the primary table. If the new values in the foreign key columns do not exist in the primary table, the insert or update operation fails.
Values in the primary key or unique index that are referenced by the foreign key table cannot be deleted. If an attempt is made to delete a referenced value in the primary table, the delete fails.
Values in the primary key or unique index that are referenced by the foreign key table cannot be modified. If an attempt is made to change a referenced value in the primary table, the update fails.

In the case of the AddressType/BusinessEntityAddress relationship shown in Figure 2, any AddressTypeID used in the BusinessEntityAddress table must have a corresponding value in the AddressType table. Listing 1 shows an INSERT statement into the BusinessEntityAddress table that does not have a valid AddressTypeID entry in the AddressType table. The statement fails, and the resulting message is shown after the INSERT statement. A similar error message is displayed if an attempt is made to delete or update values in the primary key or unique index in a way that violates the foreign key constraint.

Listing 1. A Foreign Key Conflict with INSERT

INSERT Person.BusinessEntityAddress
 (BusinessEntityID,AddressID, AddressTypeID, rowguid, ModifiedDate)
 VALUES (1,249, 9, NEWID(), GETDATE())
/* RESULTS OF INSERT FOLLOW
Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the FOREIGN KEY
constraint "FK_BusinessEntityAddress_AddressType_AddressTypeID".
The conflict occurred in database "AdventureWorks2008",
table "Person.AddressType", column 'AddressTypeID'.
The statement has been terminated.*/
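The delete and update cases can be sketched with a pair of throwaway tables, which avoids modifying AdventureWorks2008 data. The tables dbo.DemoAddressType and dbo.DemoAddress are hypothetical stand-ins for the AddressType and BusinessEntityAddress tables.

```sql
-- Hypothetical demo tables; names are illustrative only.
CREATE TABLE dbo.DemoAddressType
(
    AddressTypeID int          NOT NULL CONSTRAINT PK_DemoAddressType PRIMARY KEY,
    Name          nvarchar(50) NOT NULL
);

CREATE TABLE dbo.DemoAddress
(
    AddressID     int NOT NULL CONSTRAINT PK_DemoAddress PRIMARY KEY,
    AddressTypeID int NOT NULL
        CONSTRAINT FK_DemoAddress_DemoAddressType
        FOREIGN KEY REFERENCES dbo.DemoAddressType (AddressTypeID)
);

INSERT INTO dbo.DemoAddressType (AddressTypeID, Name) VALUES (1, N'Billing');
INSERT INTO dbo.DemoAddress (AddressID, AddressTypeID) VALUES (100, 1);

-- Both statements fail with error 547 because AddressTypeID 1 is
-- still referenced by a row in dbo.DemoAddress.
DELETE FROM dbo.DemoAddressType WHERE AddressTypeID = 1;
UPDATE dbo.DemoAddressType SET AddressTypeID = 2 WHERE AddressTypeID = 1;

-- Clean up: the referencing table must be dropped first.
DROP TABLE dbo.DemoAddress;
DROP TABLE dbo.DemoAddressType;
```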

The following example shows the T-SQL needed to create the foreign key relationship between the AddressType and BusinessEntityAddress tables:

ALTER TABLE [Person].[BusinessEntityAddress]
ADD  CONSTRAINT [FK_BusinessEntityAddress_AddressType_AddressTypeID]
  FOREIGN KEY([AddressTypeID])
REFERENCES [Person].[AddressType]
([AddressTypeID])

When you create a FOREIGN KEY constraint, the related primary key or unique index must exist first. In the case of the AddressType/BusinessEntityAddress relationship, the AddressType table and primary key on AddressTypeID must exist before you can create the FK_BusinessEntityAddress_AddressType_AddressTypeID foreign key. In addition, the data types of the related columns must be the same. The related columns in the two tables can actually have different names, but in practice the columns are usually named the same. Naming the columns the same makes your database much more intuitive.

Note

In addition to relating two different tables with a foreign key, you can also relate a table to itself. These self-referencing relationships are often found in organization tables or employee tables. For example, you could have an Employee table with a primary key of EmployeeID. This table could also have a ManagerID column. In this case, ManagerID on the Employee table has a relationship to the primary key index on EmployeeID. The manager is an employee, so it makes sense that they should have a valid EmployeeID. A foreign key on the Employee table will enforce this relationship and ensure that any ManagerID points to a row in the table with a valid EmployeeID. The foreign key alone does not require that row to be a different one; a row can reference itself.
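A self-referencing foreign key of the kind described in this note might look like the following sketch. The table dbo.DemoEmployee is hypothetical; ManagerID is declared nullable here on the assumption that the top of the hierarchy has no manager.

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE dbo.DemoEmployee
(
    EmployeeID int NOT NULL CONSTRAINT PK_DemoEmployee PRIMARY KEY,
    ManagerID  int NULL   -- NULL for the employee at the top of the hierarchy
        CONSTRAINT FK_DemoEmployee_Manager
        FOREIGN KEY REFERENCES dbo.DemoEmployee (EmployeeID)
);

INSERT INTO dbo.DemoEmployee (EmployeeID, ManagerID) VALUES (1, NULL); -- no manager
INSERT INTO dbo.DemoEmployee (EmployeeID, ManagerID) VALUES (2, 1);    -- reports to employee 1

-- Fails with error 547: there is no employee 99 to act as the manager.
INSERT INTO dbo.DemoEmployee (EmployeeID, ManagerID) VALUES (3, 99);

-- Clean up the demo table.
DROP TABLE dbo.DemoEmployee;
```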

Cascading Referential Integrity

Cascading referential integrity has been around for some time and was introduced with SQL Server 2000. This type of integrity allows for updates and deletions on the primary table to be cascaded to the referencing foreign key tables. By default, a FOREIGN KEY constraint prevents updates and deletions to any primary key or unique index values referenced by a foreign key. With cascading referential integrity, you can bypass this restriction and are able to define the type of action you want to occur when the updates and deletions happen.

You define the cascading actions on the FOREIGN KEY constraint, using the ON DELETE and ON UPDATE clauses. The ON DELETE clause defines the cascading action for deletions to the primary table, and the ON UPDATE clause defines the actions for updates. These clauses are used with the CREATE TABLE or ALTER TABLE statements and are part of the REFERENCES clause of these statements.

You can specify the same cascading actions for updates and deletions:

NO ACTION— This action, the default, causes deletions and updates to the primary table to fail if the rows are referenced by a foreign key.
CASCADE— This option causes updates and deletions to cascade to any foreign key records that refer to the affected rows in the primary table. If the CASCADE option is used with the ON DELETE clause, any records in the foreign key table that refer to the deleted rows in the primary table are also deleted. When CASCADE is used with the ON UPDATE clause, any updates to the primary table records are also made in the related rows of the foreign key table.
SET NULL— This option was new in SQL Server 2005. It is similar to the CASCADE option except that the affected rows in the foreign key table are set to NULL when deletions or updates are performed on the related primary table. The value of NULL is assigned to every column that is defined as part of the foreign key and requires that each column in the foreign key allow null values.
SET DEFAULT— This option also was new in SQL Server 2005. It is similar to the CASCADE option except that the affected rows in the foreign key table are set to the default values defined on the columns when deletions or updates are performed on the related primary table. If you want to set this option, each column in the foreign key must have a default definition assigned to it, or it must be defined as nullable. If no default definition is assigned to the column, NULL is used as the default value. It is imperative that the primary table have related records for the default or null entries that can result from the cascading action. For example, if you have a two-column foreign key, and each column has a default of 1, a corresponding record with the key values of 1 and 1 needs to exist in the primary table, or the cascade action fails. The integrity of the relationship must be maintained.
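The options above can be compared side by side with a small sketch that combines ON UPDATE CASCADE with ON DELETE SET NULL. The tables dbo.DemoCategory and dbo.DemoItem are hypothetical, and the foreign key column is declared nullable because SET NULL requires it.

```sql
-- Hypothetical demo tables; names are illustrative only.
CREATE TABLE dbo.DemoCategory
(
    CategoryID int NOT NULL CONSTRAINT PK_DemoCategory PRIMARY KEY
);

CREATE TABLE dbo.DemoItem
(
    ItemID     int NOT NULL CONSTRAINT PK_DemoItem PRIMARY KEY,
    CategoryID int NULL   -- must allow NULL for ON DELETE SET NULL
        CONSTRAINT FK_DemoItem_DemoCategory
        FOREIGN KEY REFERENCES dbo.DemoCategory (CategoryID)
        ON DELETE SET NULL
        ON UPDATE CASCADE
);

INSERT INTO dbo.DemoCategory (CategoryID) VALUES (1);
INSERT INTO dbo.DemoCategory (CategoryID) VALUES (2);
INSERT INTO dbo.DemoItem (ItemID, CategoryID) VALUES (10, 1);
INSERT INTO dbo.DemoItem (ItemID, CategoryID) VALUES (20, 2);

-- ON UPDATE CASCADE: item 10 follows the key change and now has CategoryID 5.
UPDATE dbo.DemoCategory SET CategoryID = 5 WHERE CategoryID = 1;

-- ON DELETE SET NULL: item 20 is kept, but its CategoryID becomes NULL.
DELETE FROM dbo.DemoCategory WHERE CategoryID = 2;

SELECT ItemID, CategoryID FROM dbo.DemoItem ORDER BY ItemID;

-- Clean up: the referencing table must be dropped first.
DROP TABLE dbo.DemoItem;
DROP TABLE dbo.DemoCategory;
```

With the default NO ACTION, both the UPDATE and the DELETE would instead fail with error 547.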

To illustrate the power of cascading actions, consider the AddressType/BusinessEntityAddress relationship used in previous examples. Let’s say you want to remove the associated BusinessEntityAddress records when an AddressType record is deleted. The addition of the ON DELETE CASCADE clause at the bottom of the following foreign key definition achieves this result:

ALTER TABLE [Person].[BusinessEntityAddress]
ADD  CONSTRAINT [FK_BusinessEntityAddress_AddressType_AddressTypeID]
  FOREIGN KEY([AddressTypeID])
REFERENCES [Person].[AddressType]
([AddressTypeID])
ON DELETE CASCADE
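A foreign key's cascading action cannot be changed in place. If a constraint with this name already exists on the table, as it would after running the earlier example, it has to be dropped before the new definition can be added. A minimal sketch, assuming the constraint exists under the name used above:

```sql
-- Remove the existing definition so the name can be reused
-- by the ADD CONSTRAINT statement that includes ON DELETE CASCADE.
ALTER TABLE [Person].[BusinessEntityAddress]
  DROP CONSTRAINT [FK_BusinessEntityAddress_AddressType_AddressTypeID];
```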

Keep in mind that other factors affect the successful execution of a cascading deletion. If other foreign keys exist on the table, and they do not have ON DELETE CASCADE specified, the cascading actions do not succeed if a foreign key violation occurs on these tables. In addition, you need to consider the existence of triggers that may prevent deletions from occurring. Also, you need to consider that a series of cascading actions can be initiated by a single DELETE statement. This happens when you have many related tables, each of which has cascading actions defined. This approach works fine as long as there are no circular references that cause one of the tables in the cascading tree to be affected by a table lower in the tree.

If you want to specify the cascading action for updates, you can add an additional ON UPDATE clause, along with the ON DELETE clause. For example, you can change the foreign key in the previous example so that the AddressTypeID values in BusinessEntityAddress records are set to NULL when an update is made to the related key on the primary table. As noted in the description of SET NULL, this requires the AddressTypeID foreign key column to allow nulls. This can be accomplished with the following foreign key definition:

ALTER TABLE [Person].[BusinessEntityAddress]
ADD  CONSTRAINT [FK_BusinessEntityAddress_AddressType_AddressTypeID]
  FOREIGN KEY([AddressTypeID])
REFERENCES [Person].[AddressType]
([AddressTypeID])
ON DELETE CASCADE
 ON UPDATE SET NULL

You can see that cascading referential integrity is a powerful tool. However, it must be used with caution. Consider the fact that foreign keys without cascading actions may prevent erroneous actions. For example, if a DELETE statement is mistakenly executed against the entire AddressType table, the deletion would fail before the records could be deleted because foreign key tables are referencing the AddressType table. This failure would be a good thing. If, however, the ON DELETE CASCADE clause were used in the foreign key definitions, the erroneous deletion would succeed, and all the foreign key records would be deleted as well.
