SQL Server : Enforcing Data Integrity in Constraints (part 4) - Problems with UDFs wrapped in CHECK constraints

5/25/2013 10:00:41 PM

4. Problems with UDFs wrapped in CHECK constraints

Some complex business rules are difficult or impossible to implement via regular constraints. In such cases, it seems intuitive to develop a scalar UDF and wrap it in a CHECK constraint.

For example, suppose that we need to enforce a data integrity rule that states:

We can have any number of NULLs in the Barcode column, but the NOT NULL values must be unique.

Clearly, we cannot use a UNIQUE index or constraint in this situation, because it would only allow a single NULL value, and we need to support multiple NULL values in this column.

The behavior of UNIQUE in SQL Server is not ANSI standard

ANSI specifies that a UNIQUE constraint should enforce uniqueness for non-NULL values only. Microsoft's implementation of the UNIQUE constraint deviates from this standard definition.

To set up the example, we just need to add a Barcode column to our Items table, as shown in Listing 24.

Listing 24. Adding a Barcode column to the Items table.
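
The body of this listing is not reproduced here. As a rough sketch of what such a change could look like, assuming the table lives in the dbo schema and that VARCHAR(20) is wide enough for a barcode (both are assumptions, not taken from the original listing):

```sql
-- Sketch only: the dbo schema and the VARCHAR(20) type are assumptions.
-- The column has to be nullable, because the business rule allows
-- any number of NULL barcodes.
ALTER TABLE dbo.Items
  ADD Barcode VARCHAR(20) NULL;
```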

To enforce our business rule, it is technically possible to develop a scalar UDF and invoke it from a CHECK constraint, as demonstrated in Listing 25.

Listing 25. Creating GetBarcodeCount, a scalar UDF, and invoking it from a CHECK constraint.
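
The original listing is not available here, so the following is only a plausible sketch of the technique. The names GetBarcodeCount and UNQ_Items_Barcode come from the text; the parameter type and the body of the function are assumptions.

```sql
-- Sketch only: the VARCHAR(20) parameter type is an assumption.
CREATE FUNCTION dbo.GetBarcodeCount
  ( @Barcode VARCHAR(20) )
RETURNS INT
AS
  BEGIN
    DECLARE @barcodeCount INT;
    -- With a NULL parameter the comparison is never true,
    -- so rows with a NULL barcode always get a count of 0.
    SELECT @barcodeCount = COUNT(*)
    FROM   dbo.Items
    WHERE  Barcode = @Barcode;
    RETURN @barcodeCount;
  END;
GO

-- The row being validated is already visible to the UDF,
-- so a unique NOT NULL barcode yields a count of exactly 1.
ALTER TABLE dbo.Items
  ADD CONSTRAINT UNQ_Items_Barcode
  CHECK ( dbo.GetBarcodeCount(Barcode) < 2 );
```

The check is written as "fewer than 2" rather than "equal to 1" so that NULL barcodes, for which the count is 0, pass as well.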

This solution looks intuitive and it works fine for INSERTs. Listing 26 verifies that we can INSERT more than one NULL barcode.

Listing 26. The CHECK constraint UNQ_Items_Barcode allows us to insert more than one row with a NULL barcode.

Listing 27 verifies that we can INSERT items with NOT NULL barcodes, as long as we do not INSERT duplicates.

Listing 27. UNQ_Items_Barcode allows us to insert more rows with NOT NULL barcodes, as long as there are no duplicate barcodes.

Finally, Listing 28 verifies that we cannot INSERT a duplicate NOT NULL barcode.

Listing 28. UNQ_Items_Barcode prevents duplicate NOT NULL barcodes.
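
The three INSERT tests described above can be sketched as follows. The item labels are hypothetical, and the script assumes that ItemLabel is the only other required column in dbo.Items; if your table has further NOT NULL columns, they need values too.

```sql
-- Sketch only: item labels are hypothetical, and ItemLabel is assumed
-- to be the only other required column.
DELETE FROM dbo.Items;

-- More than one NULL barcode: both inserts should succeed.
INSERT INTO dbo.Items (ItemLabel, Barcode) VALUES ('Item1', NULL);
INSERT INTO dbo.Items (ItemLabel, Barcode) VALUES ('Item2', NULL);

-- Distinct NOT NULL barcodes: both inserts should succeed.
INSERT INTO dbo.Items (ItemLabel, Barcode) VALUES ('Item3', '12345678');
INSERT INTO dbo.Items (ItemLabel, Barcode) VALUES ('Item4', '12345679');

-- Duplicate NOT NULL barcode: this insert should be rejected
-- with a CHECK constraint violation on UNQ_Items_Barcode.
INSERT INTO dbo.Items (ItemLabel, Barcode) VALUES ('Item5', '12345679');
```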

So, as long as we only insert rows, the CHECK constraint UNQ_Items_Barcode works. Similarly, we can test it for a single-row UPDATE. The constraint allows a single-row UPDATE if there is no collision, as shown in Listing 29.

Listing 29. The check constraint UNQ_Items_Barcode allows us to modify a NOT NULL barcode, as long as there is no collision.

Finally, Listing 30 shows that the constraint prevents a single-row UPDATE if it would result in a collision, as expected.

Listing 30. The check constraint UNQ_Items_Barcode does not allow modification of a NOT NULL barcode if it would result in a collision.
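
A sketch of the two single-row UPDATE tests, assuming the table currently holds one row with barcode 12345678 and one with barcode 12345679 (the value 12345680 is a hypothetical unused barcode):

```sql
-- Sketch only: assumes rows with barcodes '12345678' and '12345679' exist
-- and that '12345680' is not in use.

-- No collision: the constraint should allow this change.
UPDATE dbo.Items
SET    Barcode = '12345680'
WHERE  Barcode = '12345679';

-- Put the original value back, so that later tests start from the same data.
UPDATE dbo.Items
SET    Barcode = '12345679'
WHERE  Barcode = '12345680';

-- Collision: '12345678' is already taken, so this should be rejected.
UPDATE dbo.Items
SET    Barcode = '12345678'
WHERE  Barcode = '12345679';
```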

Apparently our CHECK constraint meets our requirements, correct? Not exactly. Unfortunately, the CHECK constraint may prevent a perfectly valid UPDATE, if that UPDATE modifies more than one row at a time.

In fact, this technique has the following three problems:

such constraints may produce false negatives; they may prohibit a valid update
such constraints may produce false positives; they may allow an invalid modification
such constraints are very slow.

False negatives: failure during multi-row updates

A valid UPDATE can fail to validate against a scalar UDF wrapped in a CHECK constraint. To demonstrate this, we'll attempt to swap two NOT NULL barcodes that are already saved into our table and are clearly unique, as shown in Listing 31. Unfortunately, the UPDATE fails with exactly the same error message as we saw in Listing 30.

Listing 31. The failed attempt to swap two unique Barcode items.
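
One way to write such a swap as a single statement is sketched below; the exact statement used in the original listing is not available, so treat this as an illustration only. It assumes the two rows hold barcodes 12345678 and 12345679.

```sql
-- Sketch only: swaps the two barcodes in one multi-row UPDATE.
-- The end state has no duplicates, yet with the UDF-based CHECK
-- constraint in place this statement is expected to fail.
UPDATE dbo.Items
SET    Barcode = CASE Barcode
                   WHEN '12345678' THEN '12345679'
                   ELSE '12345678'
                 END
WHERE  Barcode IN ('12345678', '12345679');
```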

Let us verify that this UPDATE does not result in a collision. To accomplish that, we'll have to disable the constraint so that the UPDATE can complete, as shown in Listing 32.

Listing 32. Disabling the constraint UNQ_Items_Barcode so that the update completes.
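
Disabling a CHECK constraint uses the NOCHECK CONSTRAINT clause of ALTER TABLE. A minimal sketch, assuming the dbo schema and the same two barcodes as before:

```sql
-- Disable the constraint; it stays defined but is no longer enforced.
ALTER TABLE dbo.Items
  NOCHECK CONSTRAINT UNQ_Items_Barcode;

-- With the constraint disabled, the swap should now complete.
UPDATE dbo.Items
SET    Barcode = CASE Barcode
                   WHEN '12345678' THEN '12345679'
                   ELSE '12345678'
                 END
WHERE  Barcode IN ('12345678', '12345679');
```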

Listing 33 verifies that we do not have duplicate NOT NULL barcodes.

Listing 33. Verifying that we do not have duplicate NOT NULL barcodes.
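
A straightforward way to run this verification is to group by barcode and look for groups with more than one row. This is a sketch rather than the original listing:

```sql
-- Returns one row per duplicated NOT NULL barcode;
-- an empty result means that the data satisfies the business rule.
SELECT   Barcode,
         COUNT(*) AS NumRows
FROM     dbo.Items
WHERE    Barcode IS NOT NULL
GROUP BY Barcode
HAVING   COUNT(*) > 1;
```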

We can re-enable the constraint and make sure that it is trusted, as shown in Listing 34.

Listing 34. Re-enabling the constraint and making sure that it is trusted.
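
Re-enabling with the WITH CHECK option makes SQL Server validate all existing rows, which is what marks the constraint as trusted. A sketch, using the sys.check_constraints catalog view to confirm the result:

```sql
-- WITH CHECK validates existing rows; CHECK CONSTRAINT re-enables enforcement.
ALTER TABLE dbo.Items
  WITH CHECK CHECK CONSTRAINT UNQ_Items_Barcode;

-- Both flags should be 0 if the constraint is enabled and trusted.
SELECT name,
       is_disabled,
       is_not_trusted
FROM   sys.check_constraints
WHERE  name = 'UNQ_Items_Barcode'
  AND  parent_object_id = OBJECT_ID('dbo.Items');
```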

Clearly, the CHECK constraint recognizes that, after the UPDATE, all the data in the Items table is valid; otherwise the ALTER TABLE command would have failed and the constraint would not be trusted.

So, why did the constraint prevent a perfectly correct UPDATE from completing? The reason, I believe, is as follows: CHECK constraints evaluate earlier than other types of constraint. As soon as a single row is modified, the CHECK constraint, UNQ_Items_Barcode, verifies that the modified row is valid. This verification occurs before other rows are modified. In this particular case, two rows need to be modified. We do not know which row is modified first but suppose, for the sake of argument, that it is the row with barcode 12345679. When this row is modified, the new barcode for that row is 12345678. Immediately, the CHECK constraint, UNQ_Items_Barcode, invokes the scalar UDF, dbo.GetBarcodeCount, which returns 2, because there is another, as yet unmodified row with the same barcode, 12345678.

NOTE

In this particular case we are discussing an update that touches a very small table and modifies only two rows. As such, we are not considering the possibility that this update will execute on several processors in parallel.

As a result, our CHECK constraint provides a false negative; it erroneously prohibited a perfectly valid multi-row update. Note that the behavior described here is arguably a bug in SQL Server. As such, it could be fixed in future versions of SQL Server.

False positives: allowing an invalid modification

With this technique, a more common problem than the false negative is the false positive, i.e. allowing an invalid modification. This problem occurs because people forget that CHECK constraints only fire if the columns they protect are modified. To demonstrate this, we need to change the implementation of our scalar UDF and rewrap it in a CHECK constraint, as shown in Listing 35. Before the change, the UDF took Barcode as a parameter; now it takes ItemLabel.

Listing 35. Modifying the GetBarcodeCount scalar UDF and CHECK constraint.
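
The original listing is not reproduced here. The sketch below shows one way the function could be rewritten to take ItemLabel; the VARCHAR(5) parameter type and the self-join are assumptions. The constraint has to be dropped first, because a function cannot be altered while a CHECK constraint references it.

```sql
-- Sketch only: the VARCHAR(5) type for ItemLabel is an assumption.
ALTER TABLE dbo.Items
  DROP CONSTRAINT UNQ_Items_Barcode;
GO

ALTER FUNCTION dbo.GetBarcodeCount
  ( @ItemLabel VARCHAR(5) )
RETURNS INT
AS
  BEGIN
    DECLARE @barcodeCount INT;
    -- Count the rows that share a barcode with the given item.
    -- A NULL barcode matches nothing (count 0); a unique one matches itself (count 1).
    SELECT @barcodeCount = COUNT(*)
    FROM   dbo.Items AS i1
           INNER JOIN dbo.Items AS i2
             ON i1.Barcode = i2.Barcode
    WHERE  i1.ItemLabel = @ItemLabel;
    RETURN @barcodeCount;
  END;
GO

-- The constraint now references ItemLabel only, not Barcode.
ALTER TABLE dbo.Items
  ADD CONSTRAINT UNQ_Items_Barcode
  CHECK ( dbo.GetBarcodeCount(ItemLabel) < 2 );
```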

This new implementation looks equivalent to the previous one. To test it, simply rerun Listings 26 (including the initial DELETE), 27, and 28; they should all work exactly as before. However, this new constraint allows an UPDATE that results in a duplicate barcode.

Listing 36. An invalid UPDATE succeeds, resulting in a duplicate barcode.

What happened? Why did the constraint not prevent the duplicate? If we fire up Profiler, and set it to track individual statements, we can see that the UDF was not executed at all. From the optimizer's point of view, this makes perfect sense: apparently this CHECK constraint only uses ItemLabel, so there is no point invoking the constraint if ItemLabel has not been changed.

Note that, as usual, there is no guarantee that your optimizer will make the same choice as mine did. This means that Listings 36 and 37 may, or may not, work on your server exactly as they worked on mine.

Listing 37 tricks the optimizer into thinking that ItemLabel has been changed. This time, the CHECK constraint is invoked and prevents a duplicate.

Listing 37. A slightly different update fails, as it should.
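
The two updates discussed above might look roughly like the sketch below. The original listings are not available, so the statements, the hypothetical label 'Item4', and in particular the `ItemLabel + ''` expression used to make the statement touch ItemLabel are assumptions; as noted, the outcome depends on the choices your optimizer makes.

```sql
-- Sketch only: assumes the ItemLabel-based constraint is in place and that
-- row 'Item4' (a hypothetical label) holds barcode '12345679'
-- while another row holds '12345678'.

-- Only Barcode is assigned, so the check may be skipped entirely;
-- if it is, this UPDATE succeeds and leaves a duplicate barcode.
UPDATE dbo.Items
SET    Barcode = '12345678'
WHERE  ItemLabel = 'Item4';

-- Restore a valid state before the second attempt.
UPDATE dbo.Items
SET    Barcode = '12345679'
WHERE  ItemLabel = 'Item4';

-- Assigning ItemLabel as well (to an unchanged value) makes the statement
-- touch the column that the constraint references, so the check is
-- expected to run and reject the duplicate.
UPDATE dbo.Items
SET    Barcode   = '12345678',
       ItemLabel = ItemLabel + ''
WHERE  ItemLabel = 'Item4';
```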

As we have seen, UDFs wrapped in CHECK constraints can give us both false positives and false negatives. Fortunately, there are safer and better approaches, described in the following two sections.

The unique filtered index alternative (SQL Server 2008 and later)

In SQL Server 2008 and later, a filtered index is a perfect solution for this problem. Listing 38 drops our CHECK constraint and replaces it with a filtered index.

Listing 38. Creating the UNQ_Items_Barcode filtered index.
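
A sketch of this replacement, assuming the dbo schema. The index reuses the name UNQ_Items_Barcode from the caption, which is why the constraint is dropped first.

```sql
-- Remove the UDF-based CHECK constraint.
ALTER TABLE dbo.Items
  DROP CONSTRAINT UNQ_Items_Barcode;
GO

-- Uniqueness is enforced only for rows that satisfy the filter,
-- so any number of NULL barcodes is allowed.
-- Creation fails if duplicate NOT NULL barcodes already exist.
CREATE UNIQUE NONCLUSTERED INDEX UNQ_Items_Barcode
  ON dbo.Items ( Barcode )
  WHERE Barcode IS NOT NULL;
```

Because the index is maintained by the engine for the statement as a whole, it does not depend on the order in which individual rows are modified, nor on which columns an UPDATE happens to assign.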

To verify that the filtered index works, we can empty the Items table and rerun all the steps which we took to test our CHECK constraint, that is, all the scripts from Listing 26 to Listing 30. We can also rerun the scenarios where we were getting false positives and false negatives, and verify that our unique filtered index works as expected.

Before moving on, drop the filtered index, so that it does not interfere with the forthcoming examples.

Listing 39. Dropping the filtered index.
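
Assuming the index was created on dbo.Items under the name UNQ_Items_Barcode, dropping it is a one-line statement:

```sql
DROP INDEX UNQ_Items_Barcode ON dbo.Items;
```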

The indexed view alternative

In versions prior to SQL Server 2008, filtered indexes are not available, but we can use an indexed view to accomplish the same goal.

Listing 40. Creating an indexed view.
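
The listing body is missing here, so the following is only a sketch of the general approach: a schema-bound view that exposes the NOT NULL barcodes, with a unique clustered index on it. The view name Items_NotNullBarcodes is hypothetical.

```sql
-- Sketch only: the view name is hypothetical.
-- SCHEMABINDING and two-part object names are required for indexed views.
CREATE VIEW dbo.Items_NotNullBarcodes
WITH SCHEMABINDING
AS
  SELECT Barcode
  FROM   dbo.Items
  WHERE  Barcode IS NOT NULL;
GO

-- The unique clustered index materializes the view and
-- rejects any modification that would duplicate a NOT NULL barcode.
CREATE UNIQUE CLUSTERED INDEX UNQ_Items_Barcode
  ON dbo.Items_NotNullBarcodes ( Barcode );
```

Indexed views also require certain session settings (for example, QUOTED_IDENTIFIER and ANSI_NULLS set to ON) both when the index is created and when the underlying table is modified.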
