SQL Server : Common Problems with Data Integrity - Enforcing Data Integrity Using Triggers (part 1) - Problems with multi-row modifications

6/11/2013 7:36:24 PM

Constraints are robust but, as we've discussed, they are often not suitable for implementing more complex data integrity rules. When such requirements arise, many developers turn to triggers. Triggers allow a lot of flexibility; we can tuck pretty much any code into the body of a trigger. Also, in most cases (though not all, as we will see) triggers automatically fire when we modify data.

However, triggers are limited in what they can achieve, and they are hard to code correctly, which makes them prone to subtle bugs. As such, they are the cause of many common data integrity issues. Some of the typical data integrity problems related to triggers are as follows:

some triggers falsely assume that only one row at a time is inserted/updated/deleted
some triggers falsely assume that the primary key columns will never be modified
under some circumstances, triggers do not fire
triggers may undo changes made by other triggers
some triggers do not work under snapshot isolation levels.

Some of these problems can be fixed by improving the triggers. However, not all of these problems mean that the trigger was poorly coded – some are inherent limitations of triggers in general. For example, in some cases the database engine does not fire a trigger, and there is nothing we can change in the trigger to fix that problem.

We'll discuss each of these problems in detail over the coming sections.

Problems with multi-row modifications

In the following example, our goal is to record in a "change log" table any updates made to an item's Barcode. Listing 1 creates the change log table, ItemBarcodeChangeLog. Note that the table deliberately has no FOREIGN KEY constraint, because the change log has to be kept even after an item has been removed.

Listing 1. Creating a table to log changes in the Barcode column of the Items table.
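A minimal sketch of what such a log table could look like. Only the table names and the OldBarcode and NewBarcode columns come from the text; the ItemLabel key column, the ModificationDateTime column, all data types, and the cut-down dbo.Items table are assumptions made for illustration. GO is the batch separator used by tools such as SSMS and sqlcmd.

```sql
-- Hypothetical, cut-down Items table: only the columns this example needs.
-- ItemLabel as the key column is an assumption, not taken from the text.
IF OBJECT_ID('dbo.Items', 'U') IS NULL
    CREATE TABLE dbo.Items
        (
          ItemLabel VARCHAR(30) NOT NULL PRIMARY KEY,
          Barcode   VARCHAR(20) NOT NULL
        );
GO

-- Change log for Barcode updates. There is deliberately no FOREIGN KEY
-- to dbo.Items, so log rows survive the deletion of the item.
IF OBJECT_ID('dbo.ItemBarcodeChangeLog', 'U') IS NULL
    CREATE TABLE dbo.ItemBarcodeChangeLog
        (
          ItemLabel            VARCHAR(30) NOT NULL,
          ModificationDateTime DATETIME    NOT NULL,  -- assumed column
          OldBarcode           VARCHAR(20) NULL,
          NewBarcode           VARCHAR(20) NULL
        );
GO
```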

The FOR UPDATE trigger shown in Listing 2 is designed to populate the ItemBarcodeChangeLog table whenever a barcode is updated. When an UPDATE statement runs against the Items table, the trigger reads the Barcode value as it existed before the update, from the deleted virtual table, and stores it in a variable. It then reads the post-update Barcode value from the inserted virtual table and compares the two values. If the values are different, it logs the change in ItemBarcodeChangeLog. I have added a lot of debugging output to make it easier to understand how the trigger works.

Listing 2. The Items_LogBarcodeChange trigger logs changes made to the Barcode column of the Items table.
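The variable-based approach described above might be sketched roughly as follows. The trigger and table names come from the text; the ItemLabel key column, the ModificationDateTime column, and the data types are hypothetical, and the debugging output is left out for brevity. The first batch creates the assumed tables only if they are missing.

```sql
-- Hypothetical minimal schema, created only if it does not exist yet.
IF OBJECT_ID('dbo.Items', 'U') IS NULL
    CREATE TABLE dbo.Items
        ( ItemLabel VARCHAR(30) NOT NULL PRIMARY KEY,
          Barcode   VARCHAR(20) NOT NULL );
IF OBJECT_ID('dbo.ItemBarcodeChangeLog', 'U') IS NULL
    CREATE TABLE dbo.ItemBarcodeChangeLog
        ( ItemLabel            VARCHAR(30) NOT NULL,
          ModificationDateTime DATETIME    NOT NULL,
          OldBarcode           VARCHAR(20) NULL,
          NewBarcode           VARCHAR(20) NULL );
IF OBJECT_ID('dbo.Items_LogBarcodeChange', 'TR') IS NOT NULL
    DROP TRIGGER dbo.Items_LogBarcodeChange;
GO

CREATE TRIGGER dbo.Items_LogBarcodeChange ON dbo.Items
    FOR UPDATE
AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @ItemLabel  VARCHAR(30),
                @OldBarcode VARCHAR(20),
                @NewBarcode VARCHAR(20);

        -- Pre-update value, read from the deleted virtual table.
        -- This implicitly assumes that deleted holds exactly one row.
        SELECT  @ItemLabel = ItemLabel,
                @OldBarcode = Barcode
        FROM    deleted;

        -- Post-update value, read from the inserted virtual table.
        SELECT  @NewBarcode = Barcode
        FROM    inserted;

        -- Log the change only if the barcode actually differs.
        IF @OldBarcode <> @NewBarcode
            INSERT  INTO dbo.ItemBarcodeChangeLog
                    ( ItemLabel, ModificationDateTime, OldBarcode, NewBarcode )
            VALUES  ( @ItemLabel, CURRENT_TIMESTAMP, @OldBarcode, @NewBarcode );
    END;
GO
```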

Listing 3 demonstrates how this trigger works when we perform a single-row update.

Listing 3. One row is modified and our trigger logs the change.

Our trigger works for single-row updates, but how does it handle multi-row updates? Listing 4 empties the change log table, adds one more row to the Items table, updates two rows in the Items table, and then interrogates the log table, dbo.ItemBarcodeChangeLog, to see what has been saved.

Listing 4. Trigger fails to record all changes when two rows are updated.

Our trigger does not handle the multi-row update properly; it silently inserts only one row into the log table. Note that I say "inserts only one row," rather than "logs only one change." The difference is important: if we modify two or more rows, there is no guarantee that our trigger will record the OldBarcode and NewBarcode values associated with a single modified row. When we update more than one row, both the inserted and deleted virtual tables have more than one row, as shown by the debugging output in Listing 4.

The SELECT that populates the OldBarcode variable in our trigger will arbitrarily pick one of the two values, 123457 or 234567, listed in the "debugging output: data before update" section; which one it picks depends on the order in which the engine happens to read the rows, and that order is not guaranteed. The SELECT that populates NewBarcode works in the same way; it can choose either 1234579 or 2345679. In this case, the OldBarcode and NewBarcode happen to come from the same modified row, so the trigger appears to log only one of the updates, albeit correctly. This was just chance; it could equally well have taken the OldBarcode from one row and the NewBarcode from the other, producing a single, erroneous log record.
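The variable-assignment behavior can be seen in isolation, without any trigger. The sketch below uses a table variable as a stand-in for the deleted virtual table; the barcode values are the ones quoted above, while the item labels and column types are made up for illustration.

```sql
-- Stand-in for the deleted virtual table after a two-row update.
DECLARE @deleted TABLE
    ( ItemLabel VARCHAR(30) NOT NULL,   -- hypothetical key column
      Barcode   VARCHAR(20) NOT NULL );

INSERT  INTO @deleted ( ItemLabel, Barcode )
VALUES  ( 'Item A', '123457' ),
        ( 'Item B', '234567' );

DECLARE @OldBarcode VARCHAR(20);

-- The assignment is evaluated for each row the query returns, so the
-- variable ends up holding the value from whichever row is read last.
-- Without an ORDER BY, that row order is not guaranteed.
SELECT  @OldBarcode = Barcode
FROM    @deleted;

-- Returns a single value, either 123457 or 234567.
SELECT  @OldBarcode AS OldBarcode;
```

No error or warning is raised when the query returns more than one row, which is why this kind of bug stays silent.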

In short, the logic used in this trigger does not work for multi-row updates; it contains a "hidden" assumption that only one row at a time will be updated. Since enforcing that assumption does not seem feasible in this situation, we need to rewrite the trigger from scratch to remove it, as shown in Listing 5. This time, rather than storing the old and new values in variables, we use the inserted and deleted virtual tables directly, populating the change log table via a set-based query that joins those virtual tables and correctly handles multi-row updates.

Listing 5. Altering our trigger so that it properly handles multi-row updates.
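A set-based version along these lines might look like the sketch below. As before, the ItemLabel key column, the ModificationDateTime column, and the data types are hypothetical. The trigger is dropped and recreated, rather than altered, so that the script runs on its own. Note that the join assumes the update does not change the key column.

```sql
-- Hypothetical minimal schema, created only if it does not exist yet.
IF OBJECT_ID('dbo.Items', 'U') IS NULL
    CREATE TABLE dbo.Items
        ( ItemLabel VARCHAR(30) NOT NULL PRIMARY KEY,
          Barcode   VARCHAR(20) NOT NULL );
IF OBJECT_ID('dbo.ItemBarcodeChangeLog', 'U') IS NULL
    CREATE TABLE dbo.ItemBarcodeChangeLog
        ( ItemLabel            VARCHAR(30) NOT NULL,
          ModificationDateTime DATETIME    NOT NULL,
          OldBarcode           VARCHAR(20) NULL,
          NewBarcode           VARCHAR(20) NULL );
IF OBJECT_ID('dbo.Items_LogBarcodeChange', 'TR') IS NOT NULL
    DROP TRIGGER dbo.Items_LogBarcodeChange;
GO

CREATE TRIGGER dbo.Items_LogBarcodeChange ON dbo.Items
    FOR UPDATE
AS
    BEGIN
        SET NOCOUNT ON;

        -- One log row per updated item whose barcode actually changed.
        -- Old values come from deleted, new values from inserted,
        -- matched on the (assumed, unchanged) key column.
        INSERT  INTO dbo.ItemBarcodeChangeLog
                ( ItemLabel, ModificationDateTime, OldBarcode, NewBarcode )
                SELECT  d.ItemLabel,
                        CURRENT_TIMESTAMP,
                        d.Barcode,
                        i.Barcode
                FROM    inserted AS i
                        INNER JOIN deleted AS d
                            ON i.ItemLabel = d.ItemLabel
                WHERE   i.Barcode <> d.Barcode;
    END;
GO
```

With two or more rows in dbo.Items, a multi-row statement such as `UPDATE dbo.Items SET Barcode = Barcode + '9';` should then produce one row in dbo.ItemBarcodeChangeLog for each item whose barcode changed.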

Rerunning Listing 4 verifies that our altered trigger now handles multi-row updates.

Listing 6. Our altered trigger properly handles multi-row updates.

The first lesson here is that, when developing triggers, the defensive programmer should always use proper set-based logic, rather than iterative logic.
