Constraints—including PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, and DEFAULT—are
the primary method used to enforce data integrity.
The PRIMARY KEY Constraint
The PRIMARY KEY
constraint is one of the key methods for ensuring entity integrity. When
this constraint is defined on a table, it ensures that every row can be
uniquely identified with the primary key value(s). The primary key can
have one or more columns as part of its definition. None of the columns
in the primary key definition can allow nulls. When multiple columns are
used in the definition of the primary key, the combination of the
values in all the primary key columns must be unique. Duplication can
exist in a single column that is part of a multicolumn primary key.
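As a minimal sketch of the multicolumn case, the following script uses a hypothetical table, dbo.OrderLineDemo (not part of AdventureWorks2008), whose primary key spans two columns. The table and column names are assumptions made purely for illustration.

```sql
-- Hypothetical demo table; not part of AdventureWorks2008.
CREATE TABLE dbo.OrderLineDemo
(
    OrderID    int NOT NULL,
    LineNumber int NOT NULL,
    Quantity   int NOT NULL,
    CONSTRAINT PK_OrderLineDemo PRIMARY KEY CLUSTERED (OrderID, LineNumber)
);

-- OrderID 1 repeats, but each (OrderID, LineNumber) pair is distinct,
-- so both inserts are accepted.
INSERT INTO dbo.OrderLineDemo (OrderID, LineNumber, Quantity) VALUES (1, 1, 5);
INSERT INTO dbo.OrderLineDemo (OrderID, LineNumber, Quantity) VALUES (1, 2, 3);

-- This repeats the pair (1, 2) and is rejected as a primary key violation.
INSERT INTO dbo.OrderLineDemo (OrderID, LineNumber, Quantity) VALUES (1, 2, 9);
```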
There can be only one
primary key defined for each table. When a primary key is defined on a
table, a unique index is automatically created as well. This index
contains all the columns in the primary key and ensures that the rows in
this index are unique. Generally, every table in a database should have
a primary key. The primary key and its associated unique index provide
fast access to a database table.
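One way to check the guideline that every table should have a primary key is to query the catalog views. The following query is a sketch that lists user tables in the current database that have no primary key; it relies on the built-in OBJECTPROPERTY function and its TableHasPrimaryKey property.

```sql
-- List user tables in the current database that have no primary key.
SELECT s.name AS SchemaName,
       t.name AS TableName
FROM sys.tables AS t
JOIN sys.schemas AS s
    ON s.schema_id = t.schema_id
WHERE OBJECTPROPERTY(t.object_id, 'TableHasPrimaryKey') = 0
ORDER BY s.name, t.name;
```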
Figure 1 shows the AdventureWorks2008 database Employee table, which is an example of a table that has a primary key defined. The primary key in this table is BusinessEntityID, and it is denoted in the dialog shown in Figure 1 with a key symbol in the leftmost column.
The existing primary key on the Employee table in the AdventureWorks2008 database was generated as a T-SQL script, as shown in the following example:
ALTER TABLE [HumanResources].[Employee]
ADD CONSTRAINT [PK_Employee_BusinessEntityID] PRIMARY KEY CLUSTERED
(BusinessEntityID ASC)
In general, you try to choose a primary key that is relatively short. BusinessEntityID,
for example, is a good choice because it is an integer column and takes
only 4 bytes of storage. This is particularly important when the
primary key is CLUSTERED, as in the case of PK_Employee_BusinessEntityID.
The key values from the clustered index are used by all nonclustered
indexes as lookup keys. If the clustered key is large, this consumes
more space and affects performance.
Surrogate keys are often good choices for primary keys. The BusinessEntityID column in the Person.BusinessEntity
table is an example of a surrogate key. Surrogate keys consist of a
single column that automatically increments and is inherently unique, as
in the case of an identity column. Surrogate keys are good candidates
for primary keys because they are implicitly unique and relatively short
in length. You should avoid using large, multicolumn indexes as primary
keys. They can impede performance because fewer index rows can be
stored on each index page.
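The following sketch shows a surrogate key defined as an identity column. The table dbo.CustomerDemo and its columns are hypothetical and exist only to illustrate the idea; the name columns are deliberately left without any uniqueness requirement.

```sql
-- Hypothetical demo table; not part of AdventureWorks2008.
CREATE TABLE dbo.CustomerDemo
(
    CustomerID int IDENTITY(1,1) NOT NULL,   -- surrogate key, 4 bytes
    LastName   nvarchar(50) NOT NULL,
    FirstName  nvarchar(50) NOT NULL,
    CONSTRAINT PK_CustomerDemo PRIMARY KEY CLUSTERED (CustomerID)
);

-- CustomerID is not supplied; SQL Server generates it for each row.
INSERT INTO dbo.CustomerDemo (LastName, FirstName) VALUES (N'Smith', N'Ann');
INSERT INTO dbo.CustomerDemo (LastName, FirstName) VALUES (N'Smith', N'Ann');

-- Both rows are stored, each with its own generated CustomerID.
SELECT CustomerID, LastName, FirstName
FROM dbo.CustomerDemo;
```

The two rows carry identical names, yet each remains individually addressable through its generated key.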
Note
Over the years, there has
been much debate over the use of surrogate keys for primary keys. One
school of thought is to avoid surrogate keys because insertions always
occur at the end of the primary key index and are not distributed. This
can lead to “hot spots” in the index because the insert activity is
always on the last page of the index. In addition, surrogate keys have
no real meaning and are less intuitive than primary keys that have
meaning, such as lastname and firstname.
The
other school of thought, in favor of using surrogate keys for primary
keys, emphasizes the importance of defining primary keys that are not
based on meaningful columns. If meaningful columns are used and the
definitions of those columns change, this can have a significant impact
on the table that contains the primary key and any tables related to it.
Those in favor of using surrogate keys as primary keys also focus on
the relatively small key size, which is good for performance and reduces
page splits because the values are always inserted into the index
sequentially.
The UNIQUE Constraint
The UNIQUE constraint is functionally similar to PRIMARY KEY. It also uses a unique index to enforce uniqueness, but unlike PRIMARY KEY, it allows nulls in the columns that participate in the UNIQUE constraint. Defining a UNIQUE constraint on nullable columns is generally impractical, however. SQL Server treats NULL as a single distinct value for the purposes of the constraint, so the number of rows that can be inserted with NULL values is limited. For example, only one row with a NULL value in the constraint column can be inserted if the UNIQUE constraint is based on a single column. UNIQUE
constraints with multiple nullable columns can have more than one row
with null values in the constraint keys, but the number of rows is
limited to the combination of unique values across all the columns.
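The single-column case can be sketched as follows. The table dbo.UniqueNullDemo and its columns are hypothetical names chosen for this illustration.

```sql
-- Hypothetical demo table; not part of AdventureWorks2008.
CREATE TABLE dbo.UniqueNullDemo
(
    DemoID      int NOT NULL CONSTRAINT PK_UniqueNullDemo PRIMARY KEY,
    BadgeNumber int NULL,
    CONSTRAINT AK_UniqueNullDemo_BadgeNumber UNIQUE (BadgeNumber)
);

-- The first NULL is accepted.
INSERT INTO dbo.UniqueNullDemo (DemoID, BadgeNumber) VALUES (1, NULL);

-- A second NULL in the same column is rejected as a duplicate key,
-- because the UNIQUE constraint treats NULL like any other single value.
INSERT INTO dbo.UniqueNullDemo (DemoID, BadgeNumber) VALUES (2, NULL);
```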
An alternate unique key on the
SalesTaxRate table is a good example of a unique constraint in the
AdventureWorks2008 database. The AK_SalesTaxRate_StateProvinceID_TaxType
index contains the StateProvinceID and TaxType columns. Each of these
columns is defined as NOT NULL. In simple terms, this means that each TaxType
must be unique within each state or province. If, however,
StateProvinceID were nullable, you could have only one row per TaxType
with a null StateProvinceID; every other row for that tax type would need
a StateProvinceID value to keep the combination of StateProvinceID and
TaxType unique.
You generally use a UNIQUE constraint when a column other than the primary key must be guaranteed to be unique. For example, consider the Employee table example used in the previous section. The primary key on the BusinessEntityID column
ensures that each employee row has a unique key value, but
it does not prevent duplication in any of the other columns. For
example, every row in the Employee table could have the same LoginID setting if no UNIQUE constraint or unique index were defined on this table. Generally, each employee should have his or her own unique LoginID. You can enforce this policy by adding a UNIQUE constraint on the LoginID column. The following example demonstrates the creation of a UNIQUE constraint on the LoginID column:
ALTER TABLE [HumanResources].[Employee]
ADD CONSTRAINT AK_Employee_LoginID
UNIQUE NONCLUSTERED (LoginID ASC)
As with PRIMARY KEY constraints, a unique index is created whenever a UNIQUE constraint is created. If you drop the UNIQUE constraint, you drop the unique index as well. The reverse is not allowed: the index that backs a UNIQUE constraint cannot be dropped directly while the constraint exists. You can enforce uniqueness with either a UNIQUE constraint or a unique index. To illustrate this, the following example enforces the same uniqueness on LoginID as before, this time using an index:
CREATE UNIQUE NONCLUSTERED INDEX [AK_Employee_LoginID]
ON [HumanResources].[Employee]
(LoginID ASC)
Note
Although UNIQUE
constraints and unique indexes achieve the same goal, they must be
managed based on how they were created. In other words, if you create a UNIQUE
constraint on a table, you cannot directly drop the associated unique
index. If you try to drop the unique index directly, you get a message
stating that an explicit DROP INDEX is not allowed and that it is being used for unique key constraint enforcement. To drop the UNIQUE constraint, you must use the DROP CONSTRAINT syntax associated with the ALTER TABLE statement. Similarly, if you create a unique index, you cannot drop that index by using a DROP CONSTRAINT statement; you must use DROP INDEX instead.
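The two removal paths described in this note can be sketched as follows, using the AK_Employee_LoginID name from the earlier examples. Only the statement that matches how the object was created is expected to succeed, so you would run one or the other, not both.

```sql
-- If AK_Employee_LoginID was created with ALTER TABLE ... ADD CONSTRAINT,
-- remove it as a constraint (the backing index is dropped with it):
ALTER TABLE [HumanResources].[Employee]
DROP CONSTRAINT [AK_Employee_LoginID];

-- If AK_Employee_LoginID was created with CREATE UNIQUE INDEX,
-- remove it as an index instead:
DROP INDEX [AK_Employee_LoginID]
ON [HumanResources].[Employee];
```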
You can have more than one
unique constraint per table. When creating unique constraints, you have
all the standard index-creation options available. These options include
whether the underlying index is clustered, the fill factor, and a myriad of
other index options.
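As an illustration of passing index options through a UNIQUE constraint, the following sketch adds a constraint with an explicit fill factor. The table dbo.UniqueOptionsDemo, its columns, and the option values are assumptions chosen for the example, not recommendations.

```sql
-- Hypothetical demo table; not part of AdventureWorks2008.
CREATE TABLE dbo.UniqueOptionsDemo
(
    DemoID int NOT NULL CONSTRAINT PK_UniqueOptionsDemo PRIMARY KEY,
    Code   varchar(20) NOT NULL
);

-- The WITH clause carries index options for the index that backs the constraint.
ALTER TABLE dbo.UniqueOptionsDemo
ADD CONSTRAINT AK_UniqueOptionsDemo_Code
UNIQUE NONCLUSTERED (Code ASC)
WITH (FILLFACTOR = 80, PAD_INDEX = ON);
```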
The FOREIGN KEY Referential Integrity Constraint
The basic premise of a
relational database is that tables are related. These relationships are
maintained and enforced via referential integrity. FOREIGN KEY constraints are the declarative means for enforcing referential integrity in SQL Server. You implement FOREIGN KEY
constraints by relating one or more columns in a table to the columns
in a primary key or unique index. The columns in the referencing table
can be referred to as foreign key columns. The table with the primary key or unique index can be referred to as the primary table. Figure 2 shows a relationship between the AddressType table and the BusinessEntityAddress table. The foreign key in this example is AddressTypeID on the BusinessEntityAddress table. AddressTypeID on this table is related to the primary key on the AddressType table. The foreign key relationship in this diagram is denoted by the line between these two tables.
Once defined, a foreign key, by default, enforces the relationship between the tables in the following ways:
Values in the
foreign key columns must have a corresponding value in the primary
table. If the new values in the foreign key columns do not exist in the
primary table, the insert or update operation fails.
Values
in the primary key or unique index that are referenced by the foreign
key table cannot be deleted. If an attempt is made to delete a
referenced value in the primary table, the delete fails.
Values
in the primary key or unique index that are referenced by the foreign
key table cannot be modified. If an attempt is made to change a
referenced value in the primary table, the update fails.
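The three default behaviors just listed can be sketched with a pair of hypothetical tables, dbo.ParentDemo and dbo.ChildDemo. These names are assumptions for illustration and are not part of AdventureWorks2008.

```sql
-- Hypothetical demo tables; not part of AdventureWorks2008.
CREATE TABLE dbo.ParentDemo
(
    ParentID int NOT NULL CONSTRAINT PK_ParentDemo PRIMARY KEY
);

CREATE TABLE dbo.ChildDemo
(
    ChildID  int NOT NULL CONSTRAINT PK_ChildDemo PRIMARY KEY,
    ParentID int NOT NULL,
    CONSTRAINT FK_ChildDemo_ParentDemo
        FOREIGN KEY (ParentID) REFERENCES dbo.ParentDemo (ParentID)
);

INSERT INTO dbo.ParentDemo (ParentID) VALUES (1);
INSERT INTO dbo.ChildDemo (ChildID, ParentID) VALUES (10, 1);  -- accepted

-- 1. Fails: there is no ParentID 2 in the primary table.
INSERT INTO dbo.ChildDemo (ChildID, ParentID) VALUES (11, 2);

-- 2. Fails: ParentID 1 is referenced by a row in dbo.ChildDemo.
DELETE FROM dbo.ParentDemo WHERE ParentID = 1;

-- 3. Fails: changing the referenced key would orphan the child row.
UPDATE dbo.ParentDemo SET ParentID = 5 WHERE ParentID = 1;
```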
In the case of the AddressType/BusinessEntityAddress relationship shown in Figure 2, any AddressTypeID used in the BusinessEntityAddress table must have a corresponding value in the AddressType table. Listing 1 shows an INSERT statement on the BusinessEntityAddress table that supplies an AddressTypeID with no matching entry in the AddressType table. The statement fails, and the resulting message is shown after the INSERT
statement. A similar error message is displayed if an attempt is made
to delete or update values in the primary key or unique index in a way
that violates the foreign key constraint.
Listing 1. A Foreign Key Conflict with INSERT
INSERT Person.BusinessEntityAddress
(BusinessEntityID,AddressID, AddressTypeID, rowguid, ModifiedDate)
VALUES (1,249, 9, NEWID(), GETDATE())
/* RESULTS OF INSERT FOLLOW
Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the FOREIGN KEY
constraint "FK_BusinessEntityAddress_AddressType_AddressTypeID".
The conflict occurred in database "AdventureWorks2008",
table "Person.AddressType", column 'AddressTypeID'.
The statement has been terminated.*/
The following example shows the T-SQL needed to create the foreign key relationship between the AddressType and BusinessEntityAddress tables:
ALTER TABLE [Person].[BusinessEntityAddress]
ADD CONSTRAINT [FK_BusinessEntityAddress_AddressType_AddressTypeID]
FOREIGN KEY([AddressTypeID])
REFERENCES [Person].[AddressType]
([AddressTypeID])
When you create a FOREIGN KEY constraint, the related primary key or unique index must exist first. In the case of the AddressType/BusinessEntityAddress relationship, the AddressType table and primary key on AddressTypeID must exist before you can create the FK_BusinessEntityAddress_AddressType_AddressTypeID
foreign key. In addition, the data types of the related columns must be
the same. The related columns in the two tables can actually have
different names, but in practice the columns are usually named the same.
Naming the columns the same makes your database much more intuitive.
Note
In addition to relating two
different tables with a foreign key, you can also relate a table to
itself. These self-referencing relationships are often found in
organization tables or employee tables. For example, you could have an
Employee table with a primary key of EmployeeID. This table could also have a ManagerID column. In this case, ManagerID on the Employee table has a relationship to the primary key index on EmployeeID. The manager is an employee, so it makes sense that the manager should have a valid EmployeeID. A foreign key on the Employee table enforces this relationship and ensures that any ManagerID points to a row in the table with a valid EmployeeID.
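A self-referencing relationship of this kind might be sketched as follows. The table dbo.EmployeeDemo is hypothetical; ManagerID is made nullable here on the assumption that the person at the top of the organization has no manager.

```sql
-- Hypothetical demo table; not part of AdventureWorks2008.
CREATE TABLE dbo.EmployeeDemo
(
    EmployeeID int NOT NULL CONSTRAINT PK_EmployeeDemo PRIMARY KEY,
    ManagerID  int NULL,   -- NULL for an employee with no manager
    CONSTRAINT FK_EmployeeDemo_Manager
        FOREIGN KEY (ManagerID) REFERENCES dbo.EmployeeDemo (EmployeeID)
);

INSERT INTO dbo.EmployeeDemo (EmployeeID, ManagerID) VALUES (1, NULL);  -- accepted
INSERT INTO dbo.EmployeeDemo (EmployeeID, ManagerID) VALUES (2, 1);     -- accepted

-- Fails: there is no employee with EmployeeID 99 to act as manager.
INSERT INTO dbo.EmployeeDemo (EmployeeID, ManagerID) VALUES (3, 99);
```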
Cascading Referential Integrity
Cascading referential
integrity has been available since SQL
Server 2000. This type of integrity allows for updates and deletions on
the primary table to be cascaded to the referencing foreign key tables.
By default, a FOREIGN KEY constraint
prevents updates and deletions to any primary key or unique index
values referenced by a foreign key. With cascading referential
integrity, you can bypass this restriction and are able to define the
type of action you want to occur when the updates and deletions happen.
You define the cascading actions on the FOREIGN KEY constraint, using the ON DELETE and ON UPDATE clauses. The ON DELETE clause defines the cascading action for deletions to the primary table, and the ON UPDATE clause defines the actions for updates. These clauses are used with the CREATE TABLE or ALTER TABLE statements and are part of the REFERENCES clause of these statements.
You can specify the same cascading actions for updates and deletions:
NO ACTION—
This action, the default, causes deletions and updates to the primary
table to fail if the rows are referenced by a foreign key.
CASCADE— This
option causes updates and deletions to cascade to any foreign key
records that refer to the affected rows in the primary table. If the CASCADE option is used with the ON DELETE clause, any records in the foreign key table that refer to the deleted rows in the primary table are also deleted. When CASCADE is used with the ON UPDATE clause, any updates to the primary table records are also made in the related rows of the foreign key table.
SET NULL— This option was new in SQL Server 2005. It is similar to the CASCADE option except that the affected rows in the foreign key table are set to NULL when deletions or updates are performed on the related primary table. The value of NULL
is assigned to every column that is defined as part of the foreign key
and requires that each column in the foreign key allow null values.
SET DEFAULT— This option also was new in SQL Server 2005. It is similar to the CASCADE
option except that the affected rows in the foreign key table are set
to the default values defined on the columns when deletions or updates
are performed on the related primary table. If you want to set this
option, each column in the foreign key must have a default definition
assigned to it, or it must be defined as nullable. If no default
definition is assigned to the column, NULL
is used as the default value. It is imperative that the primary table
have related records for the default or null entries that can result
from the cascading action. For example, if you have a two-column foreign
key, and each column has a default of 1, a corresponding record with the key values of 1 and 1 needs to exist in the primary table, or the cascade action fails. The integrity of the relationship must be maintained.
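To contrast SET NULL with SET DEFAULT, the following sketch uses three hypothetical tables. All names are assumptions for illustration, and RegionID 1 is assumed to be a placeholder row that must exist so the default value has something to reference.

```sql
-- Hypothetical demo tables; not part of AdventureWorks2008.
CREATE TABLE dbo.RegionDemo
(
    RegionID int NOT NULL CONSTRAINT PK_RegionDemo PRIMARY KEY
);

CREATE TABLE dbo.StoreSetNullDemo
(
    StoreID  int NOT NULL CONSTRAINT PK_StoreSetNullDemo PRIMARY KEY,
    RegionID int NULL,   -- must allow nulls for SET NULL
    CONSTRAINT FK_StoreSetNullDemo_RegionDemo
        FOREIGN KEY (RegionID) REFERENCES dbo.RegionDemo (RegionID)
        ON DELETE SET NULL
);

CREATE TABLE dbo.StoreSetDefaultDemo
(
    StoreID  int NOT NULL CONSTRAINT PK_StoreSetDefaultDemo PRIMARY KEY,
    RegionID int NOT NULL
        CONSTRAINT DF_StoreSetDefaultDemo_RegionID DEFAULT (1),
    CONSTRAINT FK_StoreSetDefaultDemo_RegionDemo
        FOREIGN KEY (RegionID) REFERENCES dbo.RegionDemo (RegionID)
        ON DELETE SET DEFAULT
);

INSERT INTO dbo.RegionDemo (RegionID) VALUES (1);  -- placeholder for the default
INSERT INTO dbo.RegionDemo (RegionID) VALUES (2);
INSERT INTO dbo.StoreSetNullDemo (StoreID, RegionID) VALUES (100, 2);
INSERT INTO dbo.StoreSetDefaultDemo (StoreID, RegionID) VALUES (200, 2);

-- Deleting region 2 triggers both cascading actions.
DELETE FROM dbo.RegionDemo WHERE RegionID = 2;

SELECT StoreID, RegionID FROM dbo.StoreSetNullDemo;     -- RegionID is now NULL
SELECT StoreID, RegionID FROM dbo.StoreSetDefaultDemo;  -- RegionID is now 1
```

If the placeholder row with RegionID 1 did not exist, the SET DEFAULT action would have no valid target and the DELETE would fail.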
To illustrate the power of cascading actions, consider the AddressType/BusinessEntityAddress relationship used in previous examples. Let's say you want to remove the associated BusinessEntityAddress records when an AddressType record is deleted. The addition of the ON DELETE CASCADE clause at the bottom of the following foreign key definition achieves this result:
ALTER TABLE [Person].[BusinessEntityAddress]
ADD CONSTRAINT [FK_BusinessEntityAddress_AddressType_AddressTypeID]
FOREIGN KEY([AddressTypeID])
REFERENCES [Person].[AddressType]
([AddressTypeID])
ON DELETE CASCADE
Keep in mind that other
factors affect the successful execution of a cascading deletion. If
other foreign keys exist on the table, and they do not have ON DELETE CASCADE
specified, the cascading actions do not succeed if a foreign key
violation occurs on these tables. In addition, you need to consider the
existence of triggers that may prevent deletions from occurring. Also,
you need to consider that a series of cascading actions can be initiated
by a single DELETE statement. This
happens when you have many related tables, each of which has cascading
actions defined. This approach works fine as long as there are no
circular references that cause one of the tables in the cascading tree
to be affected by a table lower in the tree.
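A chain of cascading actions started by a single DELETE can be sketched with three hypothetical tables, each referencing the one above it. The table and constraint names are assumptions for illustration.

```sql
-- Hypothetical demo tables; not part of AdventureWorks2008.
CREATE TABLE dbo.CategoryDemo
(
    CategoryID int NOT NULL CONSTRAINT PK_CategoryDemo PRIMARY KEY
);

CREATE TABLE dbo.ProductDemo
(
    ProductID  int NOT NULL CONSTRAINT PK_ProductDemo PRIMARY KEY,
    CategoryID int NOT NULL,
    CONSTRAINT FK_ProductDemo_CategoryDemo
        FOREIGN KEY (CategoryID) REFERENCES dbo.CategoryDemo (CategoryID)
        ON DELETE CASCADE
);

CREATE TABLE dbo.ReviewDemo
(
    ReviewID  int NOT NULL CONSTRAINT PK_ReviewDemo PRIMARY KEY,
    ProductID int NOT NULL,
    CONSTRAINT FK_ReviewDemo_ProductDemo
        FOREIGN KEY (ProductID) REFERENCES dbo.ProductDemo (ProductID)
        ON DELETE CASCADE
);

INSERT INTO dbo.CategoryDemo (CategoryID) VALUES (1);
INSERT INTO dbo.ProductDemo (ProductID, CategoryID) VALUES (10, 1);
INSERT INTO dbo.ReviewDemo (ReviewID, ProductID) VALUES (100, 10);

-- One statement removes the category, its product, and the product's review.
DELETE FROM dbo.CategoryDemo WHERE CategoryID = 1;
```

If FK_ReviewDemo_ProductDemo were defined without ON DELETE CASCADE, the cascaded deletion of the product row would violate that foreign key, and the entire DELETE statement would fail.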
If you want to specify the cascading action for updates, you can add an additional ON UPDATE clause, along with the ON DELETE clause. For example, you can change the foreign key in the previous example so that the AddressTypeID values in the related BusinessEntityAddress records are set to NULL
when an update is made to the related key on the primary table. As described
earlier for SET NULL, this requires that the foreign key column allow null
values. This can be accomplished with the following foreign key definition:
ALTER TABLE [Person].[BusinessEntityAddress]
ADD CONSTRAINT [FK_BusinessEntityAddress_AddressType_AddressTypeID]
FOREIGN KEY([AddressTypeID])
REFERENCES [Person].[AddressType]
([AddressTypeID])
ON DELETE CASCADE
ON UPDATE SET NULL
You
can see that cascading referential integrity is a powerful tool.
However, it must be used with caution. Consider the fact that foreign
keys without cascading actions may prevent erroneous actions. For
example, if a DELETE statement is mistakenly executed against the entire AddressType table, the deletion would fail before the records could be deleted because foreign key tables are referencing the AddressType table. This failure would be a good thing. If, however, the ON DELETE CASCADE
clause were used in the foreign key definitions, the erroneous deletion
would succeed, and all the foreign key records would be deleted as
well.
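Because cascading actions remove this safety net, it can be useful to know which foreign keys in a database have them defined. The following query is a sketch that reads the sys.foreign_keys catalog view and lists every foreign key whose delete or update action is something other than NO ACTION.

```sql
-- List foreign keys in the current database that define a cascading action.
SELECT OBJECT_SCHEMA_NAME(fk.parent_object_id) AS ReferencingSchema,
       OBJECT_NAME(fk.parent_object_id)        AS ReferencingTable,
       fk.name                                 AS ForeignKeyName,
       OBJECT_NAME(fk.referenced_object_id)    AS ReferencedTable,
       fk.delete_referential_action_desc       AS OnDelete,
       fk.update_referential_action_desc       AS OnUpdate
FROM sys.foreign_keys AS fk
WHERE fk.delete_referential_action <> 0   -- 0 means NO ACTION
   OR fk.update_referential_action <> 0
ORDER BY ReferencingSchema, ReferencingTable, ForeignKeyName;
```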