7. Primary Keys
Perhaps the most important concept of
an entity (table) is that it has a primary key — an attribute or set of
attributes that can be used to uniquely identify the tuple (row). Every
entity must have a primary key; without a primary key, it's not a valid
entity.
By definition, a primary key must be unique and
must have a value (not null). The simplest primary key is identified by
a single column. For example, a database may contain an employee table
(entity) whose primary key could be the employees' Social Security
number or a system-generated employee identifier.
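The two rules above (unique, and never null) can be tried out in a minimal sketch. It assumes Python's built-in sqlite3 module and a hypothetical employee table; the column names and sample values are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite tolerates NULL in a non-integer primary key unless NOT NULL is
# declared, so the rule is spelled out explicitly here.
conn.execute(
    "CREATE TABLE employee ("
    " employee_id TEXT NOT NULL PRIMARY KEY,"
    " full_name TEXT NOT NULL)"
)
conn.execute("INSERT INTO employee VALUES ('E100', 'Ada Lovelace')")


def try_insert(employee_id, full_name):
    """Return True if the row is accepted, False if a key rule rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (employee_id, full_name))
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = try_insert("E100", "Grace Hopper")  # key already in use
null_accepted = try_insert(None, "Alan Turing")          # key is missing
new_accepted = try_insert("E101", "Grace Hopper")        # fresh key value
employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Only the row with a new, non-null key value is stored; the other two attempts are rejected by the primary key.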
For some entities, there might be multiple
possible primary keys to choose from: employee number, driver's license
number, national ID number (SSN). In this case, all the potential
primary keys are known as candidate keys. Candidate keys that are not selected as the primary key are then known as alternate keys. It's important to document all the candidate keys because later, at the SQL DDL layer, they need unique constraints.
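As a rough illustration of why alternate keys need unique constraints, the sketch below (Python's sqlite3 module, with hypothetical column names and values) picks the employee number as the primary key and declares the remaining candidate keys UNIQUE. Declaring them NOT NULL as well is an assumption made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " employee_number INTEGER PRIMARY KEY,"   # candidate key chosen as primary key
    " drivers_license TEXT NOT NULL UNIQUE,"  # alternate key
    " national_id TEXT NOT NULL UNIQUE)"      # alternate key
)
conn.execute("INSERT INTO employee VALUES (1, 'DL-1001', 'ID-9001')")

try:
    # New employee number, but the national ID collides with employee 1.
    conn.execute("INSERT INTO employee VALUES (2, 'DL-1002', 'ID-9001')")
    duplicate_alternate_key_rejected = False
except sqlite3.IntegrityError:
    duplicate_alternate_key_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Without the unique constraint, the second row would be accepted even though it breaks a documented candidate key.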
At the conceptual diagramming phase, a
primary key might be obvious (an employee number, an automobile VIN,
a state or region name), but often there is no clearly
recognizable uniquely identifying value for each item in reality.
That's OK because that problem can be solved later during the SQL DDL
layer.
8. Foreign Keys
When two entities (tables) relate to
one another, one entity is typically the primary entity, and the other
entity is the secondary entity.
The connection between the two entities is made
by replicating the primary key from the primary entity in the secondary
entity. The duplicated attributes in the secondary entity are known as
a foreign key. Informally this type of relationship is sometimes called a parent-child relationship.
Enforcing the foreign key is referred to as referential integrity.
This type of integrity ensures that every foreign key value in the
secondary table matches a primary key value in the primary table. By
applying referential integrity to your database, you help ensure that
queries return accurate and valid result sets.
The classic example of a primary key and foreign key relationship is the order and order details
relationship. Each order (primary entity) can have multiple order
detail rows (secondary entity). The order's primary key is duplicated
in the order detail entity, providing the link between the two
entities, as shown in Figure 2.
If the database was not properly normalized, you
would see the order information for a specific order repeated for each
order detail associated with that order.
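The order and order detail relationship, with referential integrity enforced, might look like the following sketch. It uses Python's sqlite3 module; the table and column names are hypothetical, and the PRAGMA line is specific to SQLite, which leaves foreign key checks off by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite skips FK checks unless enabled
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " order_date TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE order_detail ("
    " order_detail_id INTEGER PRIMARY KEY,"
    " order_id INTEGER NOT NULL REFERENCES orders(order_id),"  # foreign key
    " item TEXT NOT NULL)"
)
conn.execute("INSERT INTO orders VALUES (1, '2012-03-01')")
conn.execute("INSERT INTO order_detail VALUES (1, 1, 'widget')")  # parent exists
conn.execute("INSERT INTO order_detail VALUES (2, 1, 'gadget')")  # same parent

try:
    conn.execute("INSERT INTO order_detail VALUES (3, 99, 'gizmo')")  # no order 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

try:
    conn.execute("DELETE FROM orders WHERE order_id = 1")  # would orphan details
    parent_delete_rejected = False
except sqlite3.IntegrityError:
    parent_delete_rejected = True

# The order date is stored once and joined to each detail row on demand.
joined = conn.execute(
    "SELECT o.order_date, d.item FROM orders AS o"
    " JOIN order_detail AS d ON d.order_id = o.order_id"
    " ORDER BY d.order_detail_id"
).fetchall()
```

The join returns the order date alongside each detail row, but the date itself is stored only once, in the orders table.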
9. Cardinality
The cardinality of the relationship
describes the number of tuples (rows) on each side of the relationship.
Either side of the relationship may be restricted to allow zero, one,
or multiple tuples.
The type of key enforces the restriction of
multiple tuples. Primary keys are by definition unique and enforce the
single-tuple restriction, whereas foreign keys permit multiple tuples.
There are several possible cardinality combinations, as shown in Table 2. Within this section, each of the cardinality possibilities is examined in detail.
Table 2 Common Relationship Cardinalities
Relationship Type | First Entity's Key | Second Entity's Key
One-to-one | Primary entity–primary key–single tuple | Primary entity–primary key–single tuple
One-to-many | Primary entity–primary key–single tuple | Secondary entity–foreign key–multiple tuples
Many-to-many | Multiple tuples | Multiple tuples
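One way the three cardinalities are often realized physically is sketched below with Python's sqlite3 module. Every table name is hypothetical, and the associative table used for the many-to-many case is a common implementation choice assumed for this example, not something the table above prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.executescript("""
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project (project_id INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- One-to-one: the second table's primary key is also its foreign key.
CREATE TABLE employee_badge (
    employee_id INTEGER PRIMARY KEY REFERENCES employee(employee_id),
    badge_code TEXT NOT NULL
);

-- One-to-many: the foreign key is not unique, so rows may share a parent.
CREATE TABLE timesheet (
    timesheet_id INTEGER PRIMARY KEY,
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    week TEXT NOT NULL
);

-- Many-to-many: an associative table carries one foreign key per side.
CREATE TABLE assignment (
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    project_id INTEGER NOT NULL REFERENCES project(project_id),
    PRIMARY KEY (employee_id, project_id)
);
""")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany("INSERT INTO project VALUES (?, ?)", [(10, "Payroll"), (20, "Audit")])
conn.execute("INSERT INTO employee_badge VALUES (1, 'B-001')")
conn.executemany("INSERT INTO timesheet VALUES (?, ?, ?)", [(1, 1, "W01"), (2, 1, "W02")])
conn.executemany("INSERT INTO assignment VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)])

try:
    conn.execute("INSERT INTO employee_badge VALUES (1, 'B-002')")  # second badge
    second_badge_rejected = False
except sqlite3.IntegrityError:
    second_badge_rejected = True

timesheets_for_ada = conn.execute(
    "SELECT COUNT(*) FROM timesheet WHERE employee_id = 1"
).fetchone()[0]
assignment_count = conn.execute("SELECT COUNT(*) FROM assignment").fetchone()[0]
```

The primary key on employee_badge enforces the single-tuple side, while the plain foreign key on timesheet permits multiple tuples per employee.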
10. Optionality
The second property of the relationship is its optionality.
The difference between an optional relationship and a mandatory
relationship is critical to the data integrity of the database.
Some relationships are mandatory, or strong.
These secondary tuples (rows) require that the foreign key point to a
primary key. The secondary tuple would be incomplete or meaningless
without the primary entity. For the following examples, it's critical
that the relationship be enforced:
- An order-line item without an order is meaningless.
- An order without a customer is invalid.
- In the AdventureWorks2012 sample database, a salesorderdetail
without an associated product is a useless detail.
Conversely, some relationships are optional, or weak. The secondary
tuple can stand alone without the primary tuple. The object in reality
that is represented by the secondary tuple would exist with or without
the primary tuple. For example:
- A customer is valid with or without a discount code.
- In the AdventureWorks2012
sample database, an order may or may not have a sales person. Whether
or not the order points to a valid tuple in the sales person entity,
it's still a valid order.
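In SQL terms, the difference between mandatory and optional usually comes down to whether the foreign key column accepts nulls. The sketch below uses Python's sqlite3 module with simplified, hypothetical tables; they are not the actual AdventureWorks2012 schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sales_person (sales_person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    -- Mandatory relationship: every order must point to a customer.
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    -- Optional relationship: the sales person may be left null.
    sales_person_id INTEGER REFERENCES sales_person(sales_person_id)
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Contoso')")


def order_accepted(order_id, customer_id, sales_person_id):
    """Return True if the order row satisfies both relationships."""
    try:
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?)",
            (order_id, customer_id, sales_person_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


no_sales_person_ok = order_accepted(1, 1, None)  # optional side left empty
no_customer_ok = order_accepted(2, None, None)   # mandatory side left empty
bad_sales_person_ok = order_accepted(3, 1, 42)   # points at a missing row
```

An optional foreign key may be null, but when it does hold a value, that value must still match a row in the primary table.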
Some database developers prefer to avoid optional
relationships, so they design all relationships as mandatory, and point
tuples that wouldn't need a foreign key value to a surrogate tuple in
the primary table. For example, rather than allow nulls in the discount
attribute for customers without discounts, a “no discount” tuple is
inserted into the discount entity, and every customer without a discount points to that tuple.
There are two reasons to avoid surrogate null
tuples (pointing to a “no discount” tuple): The design adds work when
work isn't required (additional inserts and foreign key checks), and
it's easier to locate a tuple without the relationship by selecting where the column is null.
The null value is a standard and useful design element. Ignoring the
benefits of nullability creates additional work for both the developer
and the database.
From a purist's point of view, a benefit of using
the surrogate null tuple is that the “no discount” is explicit and a
null value can then actually mean unknown or missing, rather than “no
discount.”
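The two designs can be compared side by side in a small sketch. It assumes Python's sqlite3 module, hypothetical table names, and a surrogate "no discount" row that is given the key 0 purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.executescript("""
CREATE TABLE discount (discount_id INTEGER PRIMARY KEY, label TEXT NOT NULL);
-- Design A: optional relationship, null means the customer has no discount.
CREATE TABLE customer_nullable (
    customer_id INTEGER PRIMARY KEY,
    discount_id INTEGER REFERENCES discount(discount_id)
);
-- Design B: mandatory relationship, pointing at a surrogate row instead.
CREATE TABLE customer_surrogate (
    customer_id INTEGER PRIMARY KEY,
    discount_id INTEGER NOT NULL REFERENCES discount(discount_id)
);
""")
# Design B needs the extra 'no discount' row; design A never references it.
conn.executemany(
    "INSERT INTO discount VALUES (?, ?)",
    [(0, "no discount"), (1, "loyalty")],
)
conn.executemany(
    "INSERT INTO customer_nullable VALUES (?, ?)",
    [(1, None), (2, 1), (3, None)],
)
conn.executemany(
    "INSERT INTO customer_surrogate VALUES (?, ?)",
    [(1, 0), (2, 1), (3, 0)],
)

# Design A: customers without a discount are found with IS NULL.
without_discount_nullable = [
    row[0]
    for row in conn.execute(
        "SELECT customer_id FROM customer_nullable"
        " WHERE discount_id IS NULL ORDER BY customer_id"
    )
]
# Design B: the query must know which key stands for 'no discount'.
without_discount_surrogate = [
    row[0]
    for row in conn.execute(
        "SELECT customer_id FROM customer_surrogate"
        " WHERE discount_id = 0 ORDER BY customer_id"
    )
]
```

Both queries find the same customers. The surrogate design costs one extra row and a foreign key check on every customer insert, while leaving null free to mean unknown or missing.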
Some rare situations call for a complex
optionality based on a condition: depending on a rule, the relationship
may or may not have to be enforced. For example, if an organization
sometimes sells ad hoc items that are not in the item entity, the
relationship may, depending on the item, be considered optional. The
orderdetail entity can use two attributes for the item:
- If the ItemID attribute is used, it must point to a valid item entity primary key.
- If the NonStandardItemDescription attribute is used instead, the ItemID attribute is left null.
- A check constraint ensures that, for each row, exactly one of ItemID and NonStandardItemDescription has a value and the other is null.
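The conditional rule can be sketched with a check constraint. The example uses Python's sqlite3 module and assumes the rule means exactly one of the two attributes is populated per row; the OrderDetailID and Name columns are hypothetical additions needed to make the tables complete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.executescript("""
CREATE TABLE item (ItemID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE orderdetail (
    OrderDetailID INTEGER PRIMARY KEY,
    ItemID INTEGER REFERENCES item(ItemID),
    NonStandardItemDescription TEXT,
    -- Exactly one of the two item attributes may hold a value.
    CHECK (
        (ItemID IS NOT NULL AND NonStandardItemDescription IS NULL)
        OR (ItemID IS NULL AND NonStandardItemDescription IS NOT NULL)
    )
);
""")
conn.execute("INSERT INTO item VALUES (1, 'Standard widget')")


def detail_accepted(detail_id, item_id, description):
    """Return True if the row passes the foreign key and the check constraint."""
    try:
        conn.execute(
            "INSERT INTO orderdetail VALUES (?, ?, ?)",
            (detail_id, item_id, description),
        )
        return True
    except sqlite3.IntegrityError:
        return False


standard_ok = detail_accepted(1, 1, None)              # catalog item
ad_hoc_ok = detail_accepted(2, None, "Custom bracket")  # ad hoc item
both_ok = detail_accepted(3, 1, "Custom bracket")       # both attributes set
neither_ok = detail_accepted(4, None, None)             # neither attribute set
unknown_item_ok = detail_accepted(5, 99, None)          # ItemID not in item
```

The foreign key still applies whenever ItemID is used, so the relationship is enforced for standard items and skipped only for ad hoc ones.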
How the optionality is implemented is up to the
SQL DDL layer. The only purpose of the conceptual design layer is to
model the organization's objects, their relationships, and their
business rules.
Data-Model Diagramming
Data modelers use several methods to
graphically work out their data models. The Chen ER diagramming method
is popular, and Visio Professional includes it and five others.
The Information Engineering E/R diagramming method is rather simple,
easy to understand and explain, and works well on a whiteboard, as shown in Figure 3.
The cardinality of the relationship is indicated by a single line (one)
or by three lines, known as crow's feet (many). If the relationship is
optional, a circle is placed near the foreign key.
Another benefit of this
simple diagramming method is that it doesn't require an advanced
version of Visio. Visio is OK as a starting point, but it doesn't
support the modeling life cycle the way a dedicated modeling tool does.
Several more powerful tools are available, and the choice among them
is largely a matter of personal preference.