Microsoft Access 2010 : Introduction to Relational Database Design (part 2) - Normalization and Normal Forms

4/18/2014 3:10:24 AM

Normalization and Normal Forms

Some of the most difficult decisions that you face as a developer are what tables to create and what fields to place in each table, and how to relate the tables that you create. Normalization is the process of applying a series of rules to ensure that a database achieves optimal structure. Normal forms are a progression of these rules. Each successive normal form achieves a better database design than the previous form. Although there are several levels of normal forms, it is generally sufficient to apply only the first three levels of normal forms. The following sections describe the first three levels of normal forms.

First Normal Form

To achieve first normal form, all columns in a table must be atomic. This means, for example, that you cannot store first name and last name in the same field. The reason for this rule is that data becomes very difficult to manipulate and retrieve if you store multiple values in a single field. Let’s use the full name as an example. It would be impossible to sort by first name or last name independently if you stored both values in the same field. Furthermore, you or the user would have to perform extra work to extract just the first name or just the last name from the field.

Another requirement for first normal form is that the table must not contain repeating values. An example of repeating values is a scenario in which Item1, Quantity1, Item2, Quantity2, Item3, and Quantity3 fields are all found within the Orders table (see Figure 1). This design introduces several problems. What if the user wants to add a fourth item to the order? Furthermore, finding the total ordered for a product requires searching several columns. In fact, all numeric and statistical calculations on the table are extremely cumbersome. Repeating groups make it difficult to summarize and manipulate table data. The alternative, shown in Figure 2, achieves first normal form. Notice that each item ordered is located in a separate row. All fields are atomic, and the table contains no repeating groups.

Figure 1. A table that contains repeating groups.

Figure 2. A table achieves first normal form.

Second Normal Form

For a table to achieve second normal form, all nonkey columns must be fully dependent on all the fields that make up the primary key. This rule only applies to tables that have a composite key: a key made up of two or more fields. For example, the table shown in Figure 2 has a primary key made up of the OrderID and CustomerID. The combination of those two keys ensures that the primary key is unique for every row in the table, which is the purpose of the primary key.

However, this table is not in second normal form because some of the information in the table depends only on part of the primary key. For instance, the CustInfo field depends only on the CustomerId field: If you were to change the CustomerId field you’d also have to change the CustInfo field. In real life, of course, there is every possibility that the CustomerId field would be changed and the CustInfo would not, leading to problems in the application.

To achieve second normal form, you must break this data into two tables—an order table and a customer table. The customer table would contain the CustomerId field from the primary key and the fields that depend on it (the CustInfo field, in this case); the order table would consist of the OrderId field and the fields that depend upon it. The order table would still have the CustomerId field (so you’d know which customer to ship the order to) but wouldn’t have any other customer information (all other customer information would be in the customer table). If you assign the order to another customer, you’d only have to change the CustomerId field.

The process of breaking the data into two tables is called decomposition. Decomposition is considered to be nonloss decomposition because no data is lost during the decomposition process. After you separate the data into two tables, you can easily bring the data back together by joining the two tables via a query. Figure 3 shows the data separated into two tables. These two tables achieve second normal form for two reasons. First, neither table has a composite key—their primary keys now have only a single field (second normal form only applies to tables with a composite primary key). Second, the fields in each table depend on the whole of the primary key rather than on just part of the key.

Figure 3. Tables that achieve second normal form.

Third Normal Form

To attain third normal form, a table must meet all the requirements for first and second normal forms, and all nonkey columns dependent only on the primary key field and not dependent on each other—all the fields are independent of each other. This means that you must eliminate any calculations, and you must break out the data into lookup tables. Lookup tables include tables such as Inventory tables, Course tables, State tables, and any other table where we look up a set of values from which we select the entry that we store in the foreign key field. For example, from our Customer table, we look up within the set of states in the state table to select the state associated with the customer.

An example of a calculation stored in a table is the product of price multiplied by quantity; the extended price is dependent on two other fields in the table—the price and the quantity. If either of those fields is changed, then the extended price would also have to be changed. As with the example in second normal form, there is every possibility of changing one or two of these fields without changing the third one, leading to problems in the application. Instead of storing the result of this calculation in the table, you would generate the calculation in a query or in the control source of a control on a form or a report.

The example in Figure 3 does not achieve third normal form because the description of the inventory items is stored in the Order Details table. If the description changes, all rows with that inventory item need to be modified. The Order Details table, shown in Figure 4, shows the item descriptions broken into an Inventory table. This design achieves third normal form. We have moved the description of the inventory items to an Inventory table, and ItemID is stored in the Order Details table. All fields are mutually independent. You can modify the description of an inventory item in one place.

Figure 4. A table (on the right) that achieves third normal form.

Denormalization: Purposely Violating the Rules

Although a developer’s goal is normalization, sometimes it makes sense to deviate from normal forms. This process is called denormalization. The primary reason for applying denormalization is to enhance performance.

An example of when denormalization might be the preferred tact could involve an open invoices table and a summarized accounting table. It might be impractical to calculate summarized accounting information for a customer when you need it. Instead, you can maintain the summary calculations in a summarized accounting table so that you can easily retrieve them as needed. Although the upside of this scenario is improved performance, the downside is that you must update the summary table whenever you make changes to the open invoices. This imposes a definite trade-off between performance and maintainability. You must decide whether the trade-off is worth it.

If you decide to denormalize, you should document your decision. You should make sure that you make the necessary application adjustments to ensure that you properly maintain the denormalized fields. Finally, you need to test to ensure that the denormalization process actually improves performance.

Integrity Rules

Although integrity rules are not part of normal forms, they are definitely part of the database design process. Integrity rules are broken into two categories: overall integrity rules and database-specific integrity rules.

Overall Integrity Rules

The two types of overall integrity rules are referential integrity rules and entity integrity rules. Referential integrity rules dictate that a database does not contain any orphan foreign key values. This means that

Child rows cannot be added for parent rows that do not exist. In other words, an order cannot be added for a nonexistent customer.
A primary key value cannot be modified if the value is used as a foreign key in a child table. This means that a CustomerID in the customers table cannot be changed if the Orders table contains rows with that CustomerID.
A parent row cannot be deleted if child rows have that foreign key value. For example, a customer cannot be deleted if the customer has orders in the Orders table.

Entity integrity dictates that the primary key value cannot be Null. This rule applies not only to single-column primary keys, but also to multicolumn primary keys. In fact, in a multicolumn primary key, no field in the primary key can be Null. This makes sense because if any part of the primary key can be Null, the primary key can no longer act as a unique identifier for the row. Fortunately, the Jet Engine does not allow a field in a primary key to be Null.

Database-Specific Integrity Rules

Database-specific integrity rules are not applicable to all databases, but are, instead, dictated by business rules that apply to a specific application. Database-specific rules are as important as overall integrity rules. They ensure that the user enters only valid data into a database. An example of a database-specific integrity rule is requiring the delivery date for an order to fall after the order date.

Others

- Microsoft Access 2010 : Introduction to Relational Database Design (part 1) - Rules of Relational Database Design

- Developing Custom Microsoft Visio 2010 Solutions : Creating SmartShapes with the ShapeSheet (part 6) - Adding Right-Click Actions to the SmartShape

- Developing Custom Microsoft Visio 2010 Solutions : Creating SmartShapes with the ShapeSheet (part 5) - Modifying the Text Block Using the ShapeSheet

- Developing Custom Microsoft Visio 2010 Solutions : Creating SmartShapes with the ShapeSheet (part 4) - Linking Subshape Text to Shape Data Fields

- Developing Custom Microsoft Visio 2010 Solutions : Creating SmartShapes with the ShapeSheet (part 3) - Controlling Grouped Shapes with the ShapeSheet

- Developing Custom Microsoft Visio 2010 Solutions : Creating SmartShapes with the ShapeSheet (part 2) - Creating Smart Geometry in the ShapeSheet

- Developing Custom Microsoft Visio 2010 Solutions : Creating SmartShapes with the ShapeSheet (part 1) - Introducing the ShapeSheet

- Developing Custom Microsoft Visio 2010 Solutions : Introducing the Notes Shape, Using the Developer Ribbon Tab

- Microsoft Excel 2010 : Working with Graphics - Inserting Clip Art, Inserting a Picture from File

- Microsoft Excel 2010 : Working with Graphics - Using Drawing Tools