SQL Server: Common Problems with Data Integrity - Enforcing Data Integrity in the Application Layer

5/25/2013 9:40:01 PM

In this example, consider the Boxes table, shown in Listing 1, which our application needs to populate.

Listing 1. Creating the Boxes table, which is populated by our application.

Our application has already loaded some data into our table, as represented by the script shown in Listing 2.

Listing 2. Loading some existing data into the Boxes table.

However, suppose that we then develop a new version of our application, in which we have started to enforce the following rule when inserting rows into the Boxes table:

The height of a box must be less than, or equal to, the width; and the width must be less than, or equal to, the length.

At some later point, we are asked to develop a query that returns all the boxes with at least one dimension that is greater than 4 inches. With our new business rule in place we know (or at least we think we know) that the longest dimension of any box is the length, so all we have to do in our query is check for boxes with a length of more than 4 inches. Listing 3 meets these requirements.

Listing 3. A query to retrieve all boxes with at least one dimension greater than 4 inches.

Unfortunately, we have failed to ensure that our existing data meets our business rule. This query will not return the existing row, even though its largest dimension is 5 inches.
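The failure can be reproduced in a minimal sketch. The listings themselves are not reproduced in this text, so the column names and the sample row below are assumptions chosen only for illustration, and an in-memory SQLite database (via Python's standard sqlite3 module) stands in for SQL Server.

```python
import sqlite3

# Hypothetical stand-in for the Boxes table; the column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Boxes (
        Label          TEXT PRIMARY KEY,
        LengthInInches REAL NOT NULL,
        WidthInInches  REAL NOT NULL,
        HeightInInches REAL NOT NULL
    )
    """
)

# Illustrative row loaded before the rule existed: its height (5) exceeds
# its width (4), which exceeds its length (3).
conn.execute("INSERT INTO Boxes VALUES ('Legacy box', 3, 4, 5)")

# The query that trusts the rule and therefore checks only the length.
length_only = conn.execute(
    "SELECT Label FROM Boxes WHERE LengthInInches > 4"
).fetchall()

print(length_only)  # [] : the box with a 5-inch dimension is missed
```

The query is correct only for rows that obey the rule, and nothing in the database guarantees that every row does.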

As usual, we can either eliminate our assumption, which will involve writing a more complex query that does not rely on it, or we can clean up our data, assume that it will stay clean, and leave our query alone. Unfortunately, when data integrity rules are enforced only in the application, the assumption that the data will "stay clean" is a dangerous one. Our application may have bugs and, in some cases, may fail to enforce the rule. Some clients may continue to run the old version of the application, which does not enforce the new business rule at all. Some data may be loaded by means other than the application, such as through SSMS, thereby bypassing the rule enforcement altogether. All too many developers completely overlook these possibilities and assume that enforcing business rules only in the application is safe. In reality, data integrity logic housed in the application layer is frequently bypassed.
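The first option, a query that does not rely on the assumption, might look like the following sketch. As before, the schema and rows are hypothetical, and SQLite (through Python's sqlite3 module) is used only so that the example runs on its own; the WHERE clause would carry over to SQL Server largely unchanged.

```python
import sqlite3

# Hypothetical Boxes table; column names and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Boxes (
        Label          TEXT PRIMARY KEY,
        LengthInInches REAL NOT NULL,
        WidthInInches  REAL NOT NULL,
        HeightInInches REAL NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO Boxes VALUES (?, ?, ?, ?)",
    [
        ("Legacy box", 3, 4, 5),  # violates the rule; largest dimension is the height
        ("New box", 6, 2, 1),     # obeys the rule; largest dimension is the length
        ("Small box", 3, 2, 1),   # obeys the rule; no dimension exceeds 4
    ],
)

# Check every dimension instead of trusting that the length is the largest.
any_dimension = conn.execute(
    """
    SELECT Label
    FROM Boxes
    WHERE LengthInInches > 4
       OR WidthInInches  > 4
       OR HeightInInches > 4
    ORDER BY Label
    """
).fetchall()

print(any_dimension)  # [('Legacy box',), ('New box',)]
```

The price of dropping the assumption is a longer predicate, but the result no longer depends on how the rows were loaded.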

As a result, it is quite possible that we will have data in the Boxes table that does not meet our business rule, and that we will have to repeat any data clean-up process many times. Some shops run such data clean-ups weekly or even daily. In short, although we can use our applications to enforce our data integrity rules, and although this may seem to be the fastest way to get things done in the short term, it is an approach that is inefficient in the long run.
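A recurring clean-up of this kind typically starts by finding the offending rows. The sketch below shows one way such an audit could be written; the table definition, the row labels, and the idea of a row loaded by hand are all assumptions made for illustration, with SQLite again standing in for SQL Server.

```python
import sqlite3

# Hypothetical Boxes table; column names and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Boxes (
        Label          TEXT PRIMARY KEY,
        LengthInInches REAL NOT NULL,
        WidthInInches  REAL NOT NULL,
        HeightInInches REAL NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO Boxes VALUES (?, ?, ?, ?)",
    [
        ("Legacy box", 3, 4, 5),      # loaded before the rule existed
        ("New box", 6, 2, 1),         # loaded by the new application version
        ("Loaded by hand", 2, 7, 1),  # inserted directly, bypassing the application
    ],
)

# Audit: rows that break "height <= width <= length".
violations = conn.execute(
    """
    SELECT Label
    FROM Boxes
    WHERE HeightInInches > WidthInInches
       OR WidthInInches  > LengthInInches
    ORDER BY Label
    """
).fetchall()

print(violations)  # [('Legacy box',), ('Loaded by hand',)]
```

Because nothing stops new violations from arriving after each run, an audit like this has to be scheduled repeatedly, which is the inefficiency described above.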

Most of the arguments covered here also apply to enforcing data integrity logic in stored procedures, unless you are able to implement a design in which all access goes through your stored procedure layer and direct table access is forbidden.

Over the following sections, we'll discuss how to use constraints and triggers, which are usually the preferred ways to protect data integrity.
