SQL Server 2012 : Database Basics (part 3) - Foreign Keys, Optionality

11/8/2013 2:07:19 AM

7. Primary Keys

Perhaps the most important concept of an entity (table) is that it has a primary key — an attribute or set of attributes that can be used to uniquely identify the tuple (row). Every entity must have a primary key; without a primary key, it's not a valid entity.

By definition, a primary key must be unique and must have a value (not null). The simplest primary key is identified by a single column. For example, a database may contain an employee table (entity) whose primary key could be the employees' Social Security number or a system-generated employee identifier.

For some entities, there might be multiple possible primary keys to choose from: employee number, driver's license number, national ID number (SSN). In this case, all the potential primary keys are known as candidate keys. Candidate keys that are not selected as the primary key are then known as alternate keys. It's important to document all the candidate keys because later, at the SQL DDL layer, they need unique constraints.
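As a minimal sketch of how candidate keys might eventually be declared, the following uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the DDL is close to, but not identical to, T-SQL. The table and column names (Employee, EmployeeNumber, DriversLicense, NationalID) are hypothetical. One candidate key becomes the primary key, and each alternate key gets a unique constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        EmployeeNumber  INTEGER PRIMARY KEY,   -- candidate key chosen as the primary key
        DriversLicense  TEXT NOT NULL UNIQUE,  -- alternate key
        NationalID      TEXT NOT NULL UNIQUE,  -- alternate key
        FullName        TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO Employee VALUES (1, 'DL-100', 'ID-100', 'Ann')")

def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# Reuses Ann's driver's license number, so the unique constraint rejects it.
duplicate_alternate_key = try_insert((2, 'DL-100', 'ID-200', 'Bob'))
# All three candidate keys are new, so the row is accepted.
distinct_keys = try_insert((3, 'DL-300', 'ID-300', 'Cy'))
```

Without the unique constraints, only the primary key would be protected, and duplicate license or national ID values could slip in unnoticed.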

At the conceptual diagramming phase, a primary key might be obvious (an employee number, an automobile VIN, a state or region name), but often there is no clearly recognizable, uniquely identifying value for each item in reality. That's OK because that problem can be solved later at the SQL DDL layer.

8. Foreign Keys

When two entities (tables) relate to one another, one entity is typically the primary entity, and the other entity is the secondary entity.

The connection between the two entities is made by replicating the primary key from the primary entity in the secondary entity. The duplicated attributes in the secondary entity are known as a foreign key. Informally this type of relationship is sometimes called a parent-child relationship.

Enforcing the foreign key is referred to as referential integrity. This type of integrity ensures that values in the secondary table are contained within the primary table. By applying referential integrity to your database, you help ensure that queries return accurate and valid result sets.

The classic example of a primary key and foreign key relationship is the order and order details relationship. Each order (primary entity) can have multiple order detail rows (secondary entity). The order's primary key is duplicated in the order detail entity, providing the link between the two entities, as shown in Figure 2.

Figure 2 A one-to-many relationship consists of a primary entity and a secondary entity. The secondary entity's foreign key points to the primary entity's primary key. In this case, the Sales.SalesOrderDetail's SalesOrderID is the foreign key that relates to Sales.SalesOrderHeader's primary key.
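The order and order detail relationship can be sketched in a few lines. This example again uses Python's sqlite3 module rather than SQL Server, and the tables are simplified, schema-less versions of Sales.SalesOrderHeader and Sales.SalesOrderDetail with hypothetical columns (OrderDate, OrderQty). It shows referential integrity rejecting a detail row that points to a nonexistent order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is set
conn.executescript("""
    CREATE TABLE SalesOrderHeader (
        SalesOrderID INTEGER PRIMARY KEY,
        OrderDate    TEXT NOT NULL
    );
    CREATE TABLE SalesOrderDetail (
        SalesOrderDetailID INTEGER PRIMARY KEY,
        -- Foreign key: a copy of the header's primary key.
        SalesOrderID       INTEGER NOT NULL
                           REFERENCES SalesOrderHeader (SalesOrderID),
        OrderQty           INTEGER NOT NULL
    );
    INSERT INTO SalesOrderHeader VALUES (1, '2013-01-15');
    INSERT INTO SalesOrderDetail VALUES (10, 1, 2);
    INSERT INTO SalesOrderDetail VALUES (11, 1, 5);
""")

def orphan_rejected():
    """Try to insert a detail row for order 99, which does not exist."""
    try:
        conn.execute("INSERT INTO SalesOrderDetail VALUES (12, 99, 1)")
        return False
    except sqlite3.IntegrityError:
        return True

# One order (primary entity) owns several detail rows (secondary entity).
detail_count = conn.execute(
    "SELECT COUNT(*) FROM SalesOrderDetail WHERE SalesOrderID = 1"
).fetchone()[0]
orphan_was_rejected = orphan_rejected()
```

The PRAGMA line is specific to SQLite; the point of the sketch is only that the foreign key lets many detail rows share one SalesOrderID while refusing values that have no matching header.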

If the database were not properly normalized, you would see the order information for a specific order repeated for each order detail associated with that order.

9. Cardinality

The cardinality of the relationship describes the number of tuples (rows) on each side of the relationship. Either side of the relationship may be restricted to allow zero, one, or multiple tuples.

The type of key enforces the restriction of multiple tuples. Primary keys are by definition unique and enforce the single-tuple restriction, whereas foreign keys permit multiple tuples.

There are several possible cardinality combinations, as shown in Table 2. Within this section, each of the cardinality possibilities is examined in detail.

Table 2 Common Relationship Cardinalities

Relationship Type	First Entity's Key	Second Entity's Key
One-to-one	Primary entity–primary key–single tuple	Primary entity–primary key–single tuple
One-to-many	Primary entity–primary key–single tuple	Secondary entity–foreign key–multiple tuples
Many-to-many	Multiple tuples	Multiple tuples
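To illustrate how the type of key restricts the number of tuples, here is a small sketch of the first two rows of the table, written with Python's sqlite3 module as a stand-in for SQL Server. The entities (Employee, EmployeeBadge, EmployeePhone) are hypothetical. In the one-to-one case the second table's primary key doubles as its foreign key; in the one-to-many case the foreign key is an ordinary, repeatable column. The many-to-many case is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch for FK enforcement
conn.executescript("""
    CREATE TABLE Employee (
        EmployeeID INTEGER PRIMARY KEY
    );
    -- One-to-one: the primary key is also the foreign key,
    -- so each employee can have at most one badge row.
    CREATE TABLE EmployeeBadge (
        EmployeeID INTEGER PRIMARY KEY REFERENCES Employee (EmployeeID),
        BadgeCode  TEXT NOT NULL
    );
    -- One-to-many: the foreign key is not unique, so it may repeat.
    CREATE TABLE EmployeePhone (
        PhoneID     INTEGER PRIMARY KEY,
        EmployeeID  INTEGER NOT NULL REFERENCES Employee (EmployeeID),
        PhoneNumber TEXT NOT NULL
    );
    INSERT INTO Employee VALUES (1);
    INSERT INTO EmployeeBadge VALUES (1, 'A-1');
    INSERT INTO EmployeePhone VALUES (1, 1, '555-0100');
    INSERT INTO EmployeePhone VALUES (2, 1, '555-0101');
""")

def accepted(sql):
    """Return True if the statement succeeds, False on a constraint violation."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# A second badge for employee 1 violates the primary key (single-tuple restriction).
second_badge_ok = accepted("INSERT INTO EmployeeBadge VALUES (1, 'A-2')")
# A third phone for employee 1 is fine (foreign keys permit multiple tuples).
third_phone_ok = accepted("INSERT INTO EmployeePhone VALUES (3, 1, '555-0102')")
```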

10. Optionality

The second property of the relationship is its optionality. The difference between an optional relationship and a mandatory relationship is critical to the data integrity of the database.

Some relationships are mandatory, or strong. These secondary tuples (rows) require that the foreign key point to a primary key. The secondary tuple would be incomplete or meaningless without the primary entity. For the following examples, it's critical that the relationship be enforced:

An order-line item without an order is meaningless.
An order without a customer is invalid.
In the AdventureWorks2012 sample database, a salesorderdetail without an associated product is a useless detail.

Conversely, some relationships are optional, or weak. The secondary tuple can stand alone without the primary tuple. The object in reality that is represented by the secondary tuple would exist with or without the primary tuple. For example:

A customer is valid with or without a discount code.
In the AdventureWorks2012 sample database, an order may or may not have a sales person. Whether or not the order points to a valid tuple in the sales person entity, it's still a valid order.
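One common way to express this difference, sketched here with Python's sqlite3 module rather than SQL Server, is the nullability of the foreign key column: a mandatory relationship uses a NOT NULL foreign key, and an optional relationship allows null. The simplified tables (Customer, SalesPerson, SalesOrder) are assumptions for illustration and do not match the actual AdventureWorks2012 schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch for FK enforcement
conn.executescript("""
    CREATE TABLE Customer    (CustomerID    INTEGER PRIMARY KEY);
    CREATE TABLE SalesPerson (SalesPersonID INTEGER PRIMARY KEY);
    CREATE TABLE SalesOrder (
        SalesOrderID  INTEGER PRIMARY KEY,
        -- Mandatory: every order must point to a customer.
        CustomerID    INTEGER NOT NULL REFERENCES Customer (CustomerID),
        -- Optional: an order may have no sales person (null allowed).
        SalesPersonID INTEGER REFERENCES SalesPerson (SalesPersonID)
    );
    INSERT INTO Customer VALUES (1);
    INSERT INTO SalesPerson VALUES (7);
""")

def accepted(row):
    """Return True if the order row is accepted, False on a constraint violation."""
    try:
        conn.execute("INSERT INTO SalesOrder VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

order_without_sales_person = accepted((100, 1, None))     # optional side left null
order_without_customer = accepted((101, None, 7))         # mandatory side missing
order_with_unknown_sales_person = accepted((102, 1, 99))  # non-null values must still match
```

Note that an optional foreign key is still enforced whenever it holds a value; only null is exempt from the check.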

Some database developers prefer to avoid optional relationships, so they design all relationships as mandatory, and point tuples that wouldn't need a foreign key value to a surrogate tuple in the primary table. For example, rather than allow nulls in the discount attribute for customers without discounts, a “no discount” tuple is inserted into the discount entity, and every customer without a discount points to that tuple.

There are two reasons to avoid surrogate null tuples (pointing to a “no discount” tuple): the design adds work when work isn't required (additional inserts and foreign key checks), and it's easier to locate a tuple without the relationship by selecting where the column is null. The null value is a standard and useful design element. Ignoring the benefits of nullability creates additional work for both the developer and the database.

From a purist's point of view, a benefit of using the surrogate null tuple is that the “no discount” is explicit and a null value can then actually mean unknown or missing, rather than “no discount.”
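The two designs can be compared side by side. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server; the table names (CustomerNullable, CustomerSurrogate), the DiscountID value 0 for the surrogate tuple, and the sample rows are all hypothetical. Both queries find the customers without a discount, but the surrogate design requires knowing which key value means “no discount.”

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch for FK enforcement
conn.executescript("""
    CREATE TABLE Discount (
        DiscountID INTEGER PRIMARY KEY,
        Label      TEXT NOT NULL
    );
    INSERT INTO Discount VALUES (0, 'No discount');  -- surrogate null tuple
    INSERT INTO Discount VALUES (1, 'Loyalty');

    -- Design A: optional relationship; null means no discount applies.
    CREATE TABLE CustomerNullable (
        CustomerID INTEGER PRIMARY KEY,
        DiscountID INTEGER REFERENCES Discount (DiscountID)
    );
    INSERT INTO CustomerNullable VALUES (1, 1);
    INSERT INTO CustomerNullable VALUES (2, NULL);

    -- Design B: mandatory relationship pointing at the surrogate tuple.
    CREATE TABLE CustomerSurrogate (
        CustomerID INTEGER PRIMARY KEY,
        DiscountID INTEGER NOT NULL REFERENCES Discount (DiscountID)
    );
    INSERT INTO CustomerSurrogate VALUES (1, 1);
    INSERT INTO CustomerSurrogate VALUES (2, 0);
""")

# Design A: the standard null test finds customers without the relationship.
without_discount_a = conn.execute(
    "SELECT CustomerID FROM CustomerNullable WHERE DiscountID IS NULL"
).fetchall()

# Design B: the query must know the surrogate tuple's key value.
without_discount_b = conn.execute(
    "SELECT CustomerID FROM CustomerSurrogate WHERE DiscountID = 0"
).fetchall()
```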

Some rare situations call for a complex optionality based on a condition. Depending on a rule, the relationship may or may not have to be enforced. For example:

If an organization sometimes sells ad hoc items that are not in the item entity, the relationship may, depending on the item, be considered optional. The orderdetail entity can use two attributes for the item. If the ItemID attribute is used, it must point to a valid item entity primary key.
However, if the NonStandardItemDescription attribute is used instead, the ItemID attribute is left null.
A check constraint ensures that, for each row, exactly one of ItemID and NonStandardItemDescription is null.
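The conditional rule above might be sketched as follows, using Python's sqlite3 module as a stand-in for SQL Server. The Item and OrderDetail tables are reduced to the columns the rule needs, and the check constraint assumes the intent is that exactly one of the two item attributes is filled in for each row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch for FK enforcement
conn.executescript("""
    CREATE TABLE Item (ItemID INTEGER PRIMARY KEY);
    CREATE TABLE OrderDetail (
        OrderDetailID              INTEGER PRIMARY KEY,
        ItemID                     INTEGER REFERENCES Item (ItemID),
        NonStandardItemDescription TEXT,
        -- Assumed rule: exactly one of the two item attributes is filled in.
        CHECK (
            (ItemID IS NOT NULL AND NonStandardItemDescription IS NULL)
            OR (ItemID IS NULL AND NonStandardItemDescription IS NOT NULL)
        )
    );
    INSERT INTO Item VALUES (1);
""")

def accepted(row):
    """Return True if the detail row is accepted, False on a constraint violation."""
    try:
        conn.execute("INSERT INTO OrderDetail VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

standard_item = accepted((1, 1, None))                  # points to a valid item
ad_hoc_item = accepted((2, None, "Custom engraving"))   # description only, ItemID null
both_filled = accepted((3, 1, "Custom engraving"))      # rejected by the check
neither_filled = accepted((4, None, None))              # rejected by the check
```

When ItemID is supplied, the foreign key still verifies that it points to a valid item; the check constraint only decides which of the two attributes may be used.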

How the optionality is implemented is up to the SQL DDL layer. The only purpose of the conceptual design layer is to model the organization's objects, their relationships, and their business rules.

Data-Model Diagramming

Data modelers use several methods to graphically work out their data models. The Chen ER diagramming method is popular, and Visio Professional includes it and five others. The Information Engineering E/R diagramming method is rather simple, easy to understand and explain, and works well on a whiteboard, as shown in Figure 3. The cardinality of the relationship is indicated by a single line or by three lines (crow's feet). If the relationship is optional, a circle is placed near the foreign key.

Figure 3 A simple method for diagramming logical schemas.

Another benefit of this simple diagramming method is that it doesn't require an advanced version of Visio. Visio is OK as a starting point, but it doesn't support the modeling life cycle as well as a dedicated modeling tool does. There are several more powerful tools, but the choice is largely a matter of personal preference.
