4. Problems with UDFs wrapped in CHECK constraints
Some complex business rules are difficult or
impossible to implement via regular constraints. In such cases, it seems
intuitive to develop a scalar UDF and wrap it in a CHECK constraint.
For example, suppose that we need to enforce a data integrity rule that states:
We can have any number of NULLs in the Barcode column, but the NOT NULL values must be unique.
Clearly, we cannot use a UNIQUE index or constraint in this situation, because it would only allow a single NULL value, and we need to support multiple NULL values in this column.
The behavior of UNIQUE in SQL Server is not ANSI standard
ANSI specifies that a UNIQUE constraint should enforce uniqueness for non-NULL values only. Microsoft's implementation of the UNIQUE constraint deviates from this standard definition.
To set up the example, we just need to add a Barcode column to our Items table, as shown in Listing 24.
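As a minimal sketch of this step, the column could be added as follows. The dbo schema and the VARCHAR(20) type are assumptions, since the definition of the Items table is not reproduced here.

```sql
-- Sketch only: the dbo schema and VARCHAR(20) are assumptions;
-- adjust them to match your own Items table.
ALTER TABLE dbo.Items
    ADD Barcode VARCHAR(20) NULL ;
```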
To enforce our business rule, it is technically possible to develop a scalar UDF and invoke it from a CHECK constraint, as demonstrated in Listing 25.
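A UDF and constraint of this kind might look roughly like the following sketch. The names dbo.GetBarcodeCount and UNQ_Items_Barcode are the ones used in this section. The parameter type, and the exact body of the function, are assumptions.

```sql
-- Sketch only: assumes dbo.Items has a nullable Barcode column;
-- VARCHAR(20) is an assumed type.
CREATE FUNCTION dbo.GetBarcodeCount ( @Barcode VARCHAR(20) )
RETURNS INT
AS
BEGIN
    DECLARE @barcodeCount INT ;
    -- A NULL @Barcode matches no rows, so NULLs always yield 0.
    SELECT  @barcodeCount = COUNT(*)
    FROM    dbo.Items
    WHERE   Barcode = @Barcode ;
    RETURN @barcodeCount ;
END ;
GO

-- The row being checked is already counted, so a unique barcode yields 1.
ALTER TABLE dbo.Items
    ADD CONSTRAINT UNQ_Items_Barcode
    CHECK ( dbo.GetBarcodeCount(Barcode) < 2 ) ;
GO
```

Because the comparison Barcode = @Barcode never evaluates to true for NULL, any number of NULL barcodes passes the check.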
This solution looks intuitive and it works fine for INSERTs. Listing 26 verifies that we can INSERT more than one NULL barcode.
Listing 27 verifies that we can INSERT items with NOT NULL barcodes, as long as we do not INSERT duplicates.
Finally, Listing 28 verifies that we cannot INSERT a duplicate NOT NULL barcode.
So, as long as we only insert rows, the CHECK constraint UNQ_Items_Barcode works. Similarly, we can test it for a single-row UPDATE. The constraint allows a single-row UPDATE if there is no collision, as shown in Listing 29.
Finally, Listing 30 shows that the constraint prevents a single-row UPDATE if it would result in a collision, as expected.
Apparently our CHECK constraint meets our requirements, correct? Not exactly. Unfortunately, the CHECK constraint may prevent a perfectly valid UPDATE, if that UPDATE modifies more than one row at a time.
In fact, this technique has the following three problems:
such constraints may produce false negatives; they may prohibit a valid update
such constraints may produce false positives; they may allow an invalid modification
such constraints are very slow.
False negatives: failure during multi-row updates
A valid UPDATE can fail to validate against a scalar UDF wrapped in a CHECK constraint. To demonstrate this, we'll attempt to swap two NOT NULL barcodes that are already saved into our table and are clearly unique, as shown in Listing 31. Unfortunately, the UPDATE fails with exactly the same error message as we saw in Listing 30.
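A swap of this kind could be written as a single statement along the following lines. This is a sketch that assumes Barcode is a character column and that the two barcodes discussed in this section, 12345678 and 12345679, are already in the table.

```sql
-- Sketch only: swaps two existing barcodes in one multi-row UPDATE.
-- The final state is valid, yet the UDF-based CHECK constraint may reject it.
UPDATE  dbo.Items
SET     Barcode = CASE WHEN Barcode = '12345678' THEN '12345679'
                       ELSE '12345678'
                  END
WHERE   Barcode IN ( '12345678', '12345679' ) ;
```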
Let us verify that this UPDATE does not result in a collision. To accomplish that, we'll have to disable the constraint so that the UPDATE can complete, as shown in Listing 32.
Listing 33 verifies that we do not have duplicate NOT NULL barcodes.
We can re-enable the constraint and make sure that it is trusted, as shown in Listing 34.
Clearly, the CHECK constraint recognizes that, after the UPDATE, all the data in the Items table is valid; otherwise the ALTER TABLE command would have failed and the constraint would not be trusted.
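The disable, verify, and re-enable steps can be sketched as follows. The statements use standard T-SQL syntax, but the script as a whole is an illustration rather than a copy of the original listings.

```sql
-- 1. Disable the constraint so that the swap can complete.
ALTER TABLE dbo.Items NOCHECK CONSTRAINT UNQ_Items_Barcode ;
GO
-- (rerun the multi-row UPDATE that swaps the barcodes here)

-- 2. Look for duplicate NOT NULL barcodes; an empty result means no collision.
SELECT  Barcode ,
        COUNT(*) AS NumItems
FROM    dbo.Items
WHERE   Barcode IS NOT NULL
GROUP BY Barcode
HAVING  COUNT(*) > 1 ;
GO

-- 3. Re-enable the constraint. WITH CHECK revalidates the existing rows,
--    which is what makes the constraint trusted again.
ALTER TABLE dbo.Items WITH CHECK CHECK CONSTRAINT UNQ_Items_Barcode ;
GO

-- 4. Confirm the state: is_not_trusted = 0 means the constraint is trusted.
SELECT  name ,
        is_disabled ,
        is_not_trusted
FROM    sys.check_constraints
WHERE   name = 'UNQ_Items_Barcode' ;
```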
So, why did the constraint prevent a perfectly correct UPDATE from completing? The reason, I believe, is as follows: CHECK constraints evaluate earlier than other types of constraint. As soon as a single row is modified, the CHECK constraint, UNQ_Items_Barcode,
verifies that the modified row is valid. This verification occurs
before other rows are modified. In this particular case, two rows need
to be modified. We do not know which row is modified first but suppose,
for the sake of argument, that it is the row with barcode 12345679. When
this row is modified, the new barcode for that row is 12345678.
Immediately, the CHECK constraint, UNQ_Items_Barcode, invokes the scalar UDF, dbo.GetBarcodeCount, which returns 2, because there is another, as yet unmodified row with the same barcode, 12345678.
NOTE
In this particular case
we are discussing an update that touches a very small table and
modifies only two rows. As such, we are not considering the possibility
that this update will execute on several processors in parallel.
As a result, our CHECK constraint provides a
false negative; it erroneously prohibited a perfectly valid multi-row
update. Note that the behavior described here is arguably a bug in SQL
Server. As such, it could be fixed in future versions of SQL Server.
False positives: allowing an invalid modification
With this technique, a more common problem than the
false negative is the false positive, i.e. allowing an invalid
modification. This problem occurs because people forget that CHECK
constraints only fire if the columns they protect are modified. To
demonstrate this, we need to change the implementation of our scalar UDF
and rewrap it in a CHECK constraint, as shown in Listing 35. Before the change, the UDF took Barcode as a parameter; now it takes ItemLabel.
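One possible shape for the changed UDF is sketched below. It assumes that ItemLabel uniquely identifies a row in Items and that it is a VARCHAR(30) column. Both are assumptions, as is the self-join used to count matching barcodes.

```sql
-- Sketch only. The constraint is dropped first, because SQL Server will not
-- alter a function that is referenced by a CHECK constraint.
ALTER TABLE dbo.Items DROP CONSTRAINT UNQ_Items_Barcode ;
GO

-- VARCHAR(30) is an assumed type for ItemLabel.
ALTER FUNCTION dbo.GetBarcodeCount ( @ItemLabel VARCHAR(30) )
RETURNS INT
AS
BEGIN
    DECLARE @barcodeCount INT ;
    -- Count the rows that share a barcode with the given item.
    -- An item with a NULL barcode joins to nothing, so the count is 0.
    SELECT  @barcodeCount = COUNT(*)
    FROM    dbo.Items AS i1
            INNER JOIN dbo.Items AS i2 ON i1.Barcode = i2.Barcode
    WHERE   i1.ItemLabel = @ItemLabel ;
    RETURN @barcodeCount ;
END ;
GO

-- The constraint now references ItemLabel only, not Barcode.
ALTER TABLE dbo.Items
    ADD CONSTRAINT UNQ_Items_Barcode
    CHECK ( dbo.GetBarcodeCount(ItemLabel) < 2 ) ;
GO
```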
This new implementation looks equivalent to the previous one. To test it, simply rerun Listings 26 (including the initial DELETE), 27, and 28; they should all work exactly as before. However, this new constraint allows an UPDATE that results in a duplicate barcode.
What happened? Why did the constraint not prevent the
duplicate? If we fire up Profiler, and set it to track individual
statements, we can see that the UDF was not executed at all. From the
optimizer's point of view, this makes perfect sense: apparently this CHECK constraint only uses ItemLabel, so there is no point invoking the constraint if ItemLabel has not been changed.
Note that, as usual, there is no guarantee that your optimizer will make the same choice as mine did. This means that Listings 36 and 37 may, or may not, work on your server exactly as they worked on mine.
Listing 37 tricks the optimizer into thinking that ItemLabel has been changed. This time, the CHECK constraint is invoked and prevents a duplicate.
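One way to write such an UPDATE is to assign ItemLabel in the same statement, so that the column the constraint references appears in the SET clause. The sketch below uses hypothetical placeholder values, and the self-assignment ItemLabel = ItemLabel + '' is an assumption about how the trick could be written. Whether the optimizer then invokes the constraint is, as noted above, not guaranteed.

```sql
-- Sketch only: both values are hypothetical placeholders.
DECLARE @ItemLabel VARCHAR(30) ;
DECLARE @ExistingBarcode VARCHAR(20) ;
SET @ItemLabel = 'Some existing item' ;   -- label of the row to change
SET @ExistingBarcode = '12345678' ;       -- barcode already used by another row

-- Assigning ItemLabel to itself leaves the data unchanged, but puts the
-- column referenced by the CHECK constraint into the SET clause.
UPDATE  dbo.Items
SET     Barcode = @ExistingBarcode ,
        ItemLabel = ItemLabel + ''
WHERE   ItemLabel = @ItemLabel ;
```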
As we have seen, UDFs wrapped in CHECK
constraints can give us both false positives and false negatives.
Fortunately, there are safer and better approaches, described in the
following two sections.
The unique filtered index alternative (SQL Server 2008 and later)
In SQL Server 2008 and later, a filtered index is a perfect solution for this problem. Listing 38 drops our CHECK constraint and replaces it with a filtered index.
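A minimal sketch of this replacement follows. The index name is an assumption; the WHERE clause restricts uniqueness to the NOT NULL barcodes.

```sql
-- Drop the UDF-based CHECK constraint.
ALTER TABLE dbo.Items DROP CONSTRAINT UNQ_Items_Barcode ;
GO

-- Uniqueness is enforced only for rows that satisfy the filter,
-- so any number of NULL barcodes is allowed.
-- The index name is hypothetical.
CREATE UNIQUE NONCLUSTERED INDEX UNQ_Items_Barcode_NotNull
    ON dbo.Items ( Barcode )
    WHERE Barcode IS NOT NULL ;
GO
```

Unlike the UDF, the index is checked against the final state of the statement, so a multi-row swap of two barcodes does not trip over an intermediate state.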
To verify that the filtered index works, we can empty the Items table and rerun all the steps that we took to test our CHECK constraint, that is, all the scripts from Listing 26 to Listing 30.
We can also rerun the scenarios where we were getting false positives
and false negatives, and verify that our unique filtered index works as
expected.
Before moving on, drop the filtered index, so that it does not interfere with the forthcoming examples.
The indexed view alternative
Prior to SQL Server 2008, we cannot use filtered indexes, but we can use an indexed view to accomplish the same goal.
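As a rough sketch of the idea, we can create a schema-bound view that exposes only the NOT NULL barcodes and put a unique clustered index on it. The view and index names below are hypothetical. Note also that indexed views require particular session settings, such as QUOTED_IDENTIFIER and ANSI_NULLS being ON.

```sql
-- Sketch only: view and index names are hypothetical.
-- SCHEMABINDING and two-part object names are required for indexed views.
CREATE VIEW dbo.Items_NotNullBarcodes
WITH SCHEMABINDING
AS
SELECT  Barcode
FROM    dbo.Items
WHERE   Barcode IS NOT NULL ;
GO

-- The unique clustered index materializes the view and rejects
-- any modification that would produce a duplicate NOT NULL barcode.
CREATE UNIQUE CLUSTERED INDEX UNQ_Items_NotNullBarcodes
    ON dbo.Items_NotNullBarcodes ( Barcode ) ;
GO
```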