SQL Server 2008 R2: Optimistic Locking

1/15/2013 6:40:48 PM

In many applications, clients need to fetch data to browse through it, make modifications to one or more rows, and then post the changes back to the SQL Server database. These human-speed operations are slow compared to machine-speed operations, and the time lag between the fetch and the post can be significant. (Consider a user who goes to lunch after retrieving the data.)

For these applications, you would not want to use normal locking schemes such as the SERIALIZABLE isolation level or the HOLDLOCK hint to lock the data so it can’t be changed between the time the user retrieves it and the time he or she applies any updates. Doing so would violate one of the key rules for minimizing locking contention and deadlocks: do not allow user interaction within transactions. You would also lose all control over the duration of the transaction. In a multiuser OLTP environment, holding shared locks indefinitely could significantly degrade concurrency and overall application performance because of blocking and lock contention.

On the other hand, if locks are not held on the rows being read, another process could update a row between the time it was initially read and the time the update is posted. When the first process then applies its update, it overwrites the changes made by the other process, resulting in a lost update.
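To make the failure concrete, here is a small Python sketch using the standard-library sqlite3 module, with an in-memory SQLite database standing in for SQL Server (an assumption made only so the example runs anywhere). Two hypothetical clients share one connection for brevity; each reads the same row and later writes its change back without checking whether the row changed in between. The helper names read_value and blind_update are illustrative, not part of any API.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_table (id INTEGER PRIMARY KEY, data_field_1 TEXT)")
conn.execute("INSERT INTO data_table (id, data_field_1) VALUES (1, 'foo')")
conn.commit()


def read_value(row_id):
    """Read the row; no lock is held once the SELECT has finished."""
    row = conn.execute(
        "SELECT data_field_1 FROM data_table WHERE id = ?", (row_id,)
    ).fetchone()
    return row[0]


def blind_update(row_id, new_value):
    """Post a change without checking whether the row was modified since it was read."""
    conn.execute(
        "UPDATE data_table SET data_field_1 = ? WHERE id = ?", (new_value, row_id)
    )
    conn.commit()


# Two hypothetical clients fetch the same row.
seen_by_a = read_value(1)
seen_by_b = read_value(1)

# Client B posts first; client A, still working from its stale copy, posts second.
blind_update(1, seen_by_b + " (edited by B)")
blind_update(1, seen_by_a + " (edited by A)")

final_value = read_value(1)
print(final_value)  # foo (edited by A); B's change was silently overwritten
```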

So how do you implement such an application? How do you allow users to retrieve information without holding locks on the data and still ensure that lost updates do not occur?

Optimistic locking is a technique used in situations in which reading data and modifying it are widely separated in time. It helps a client avoid overwriting another client’s changes to a row without holding locks in the database.

One approach for implementing optimistic locking is to use the rowversion data type. Another approach is to take advantage of the optimistic concurrency features of snapshot isolation.

Optimistic Locking Using the rowversion Data Type

SQL Server 2008 provides a special data type called rowversion that applications can use for optimistic locking. Its purpose is to serve as a version number in optimistic locking schemes. SQL Server automatically generates a new value for a rowversion column whenever a row containing such a column is inserted or updated. The rowversion data type is an 8-byte binary data type. Other than being guaranteed unique within the database and monotonically increasing, the value has no meaning; you cannot look at the individual bytes and make any sense of them.

Note

In previous versions of SQL Server, the rowversion data type was also referred to as the timestamp data type. While this data type synonym still exists in SQL Server 2008, it has been deprecated and the rowversion data type name should be used instead to ensure future compatibility.
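SQLite has no rowversion type, so the following Python sketch only emulates the behavior described above, under stated assumptions: a hypothetical one-row db_counter table plays the role of SQL Server’s database-wide counter, and triggers stamp each inserted or updated row with the next counter value. The emulated versioncol is an integer rather than an 8-byte binary value, but it shows the two properties optimistic locking relies on: the value changes automatically on every modification, and it is never reused within the database.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical database-wide counter, standing in for SQL Server's internal one.
    CREATE TABLE db_counter (counter_value INTEGER NOT NULL);
    INSERT INTO db_counter (counter_value) VALUES (0);

    CREATE TABLE data_table (
        id INTEGER PRIMARY KEY,
        data_field_1 TEXT,
        versioncol INTEGER              -- emulated rowversion column
    );

    -- Stamp each new row with the next counter value.
    CREATE TRIGGER stamp_on_insert AFTER INSERT ON data_table
    BEGIN
        UPDATE db_counter SET counter_value = counter_value + 1;
        UPDATE data_table SET versioncol = (SELECT counter_value FROM db_counter)
            WHERE id = NEW.id;
    END;

    -- Re-stamp a row whenever its data changes. Because only data_field_1 is listed,
    -- the triggers' own writes to versioncol never fire this trigger.
    CREATE TRIGGER stamp_on_update AFTER UPDATE OF data_field_1 ON data_table
    BEGIN
        UPDATE db_counter SET counter_value = counter_value + 1;
        UPDATE data_table SET versioncol = (SELECT counter_value FROM db_counter)
            WHERE id = NEW.id;
    END;
""")


def version_of(row_id):
    """Return the emulated rowversion value currently stored for a row."""
    return conn.execute(
        "SELECT versioncol FROM data_table WHERE id = ?", (row_id,)
    ).fetchone()[0]


conn.execute("INSERT INTO data_table (data_field_1) VALUES ('foo')")  # becomes id 1
conn.execute("INSERT INTO data_table (data_field_1) VALUES ('baz')")  # becomes id 2
before = version_of(1)
conn.execute("UPDATE data_table SET data_field_1 = 'bar' WHERE id = 1")
after = version_of(1)
conn.commit()

print(before, version_of(2), after)  # 1 2 3
```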

In an application that uses optimistic locking, the client reads one or more records from the table, making sure to retrieve the primary key and the current value of the rowversion column for each row, along with any other desired data columns. Because the query is not run within an explicit transaction, any locks acquired for the SELECT are released after the data has been read. Later, when the client wants to update a row, it must ensure that no other client has changed that row in the meantime. To do so, the UPDATE statement includes a WHERE clause that compares the rowversion value retrieved by the original query with the current rowversion value of the record in the database. If the values match (that is, if the value that was read is the same as the value currently in the database), the row has not changed since it was retrieved, and the application’s change can proceed. If the values do not match, the row has been changed since it was retrieved: the row the application is attempting to modify is no longer the row that exists in the database, so the update must not be allowed, to avoid a lost update.

To ensure that the client application does not overwrite the changes made by another process, the client needs to prepare the T-SQL UPDATE statement in a special way, using the rowversion column as a versioning marker. The following pseudocode represents the general structure of such an update:

UPDATE theTable
  SET theChangedColumns = theirNewValues
  WHERE primaryKeyColumns = theirOldValues
    AND rowversionColumn = itsOldValue

Because the WHERE clause includes the primary key, the UPDATE can apply to at most one row; it cannot apply to more than one row because the primary key is unique. The second part of the WHERE clause provides the optimistic “locking.” If another client has updated the row, the rowversion column no longer holds its old value (remember that the server changes the rowversion value automatically with each update), so the WHERE clause matches no rows. The client therefore needs to check whether any rows were updated: if the number of rows affected by the UPDATE statement is zero, the row has been modified since it was originally retrieved, and the application can reread the data or take whatever recovery action it deems appropriate. This approach has one problem: how does the application know whether no row matched because the rowversion changed, because the primary key changed, or because the row was deleted altogether?
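The pseudocode and the row-count check can be sketched from the client side in Python. This is a minimal sketch, assuming an in-memory SQLite database in place of SQL Server and an integer versioncol that a trigger increments on every update in place of a real rowversion column; read_for_edit and conditional_update are hypothetical helper names. Two clients read the same row: the first conditional UPDATE affects one row, and the second affects zero, which is the client’s signal that the row changed after it was read.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data_table (
        id INTEGER PRIMARY KEY,
        data_field_1 TEXT,
        versioncol INTEGER NOT NULL DEFAULT 1   -- emulated rowversion column
    );
    -- Emulate "the server changes the rowversion value automatically with each update".
    CREATE TRIGGER bump_version AFTER UPDATE OF data_field_1 ON data_table
    BEGIN
        UPDATE data_table SET versioncol = versioncol + 1 WHERE id = NEW.id;
    END;
    INSERT INTO data_table (id, data_field_1) VALUES (1, 'foo');
""")


def read_for_edit(row_id):
    """Fetch the primary key, the data, and the current version; no lock is kept."""
    return conn.execute(
        "SELECT id, data_field_1, versioncol FROM data_table WHERE id = ?", (row_id,)
    ).fetchone()


def conditional_update(row_id, new_value, old_version):
    """Apply the change only if the version still matches; return the rows affected."""
    cur = conn.execute(
        "UPDATE data_table SET data_field_1 = ? WHERE id = ? AND versioncol = ?",
        (new_value, row_id, old_version),
    )
    affected = cur.rowcount  # trigger changes are not counted, only the UPDATE itself
    conn.commit()
    return affected          # 1 = applied, 0 = row changed (or vanished) since it was read


_, _, version_a = read_for_edit(1)   # client A reads version 1
_, _, version_b = read_for_edit(1)   # client B reads version 1

rows_b = conditional_update(1, "bar", version_b)    # B's update applies; version becomes 2
rows_a = conditional_update(1, "fubar", version_a)  # A's stale version no longer matches

print(rows_b, rows_a, read_for_edit(1))  # 1 0 (1, 'bar', 2)
```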

In SQL Server 2000, there was an undocumented tsequal() function (which was documented in prior releases) that could be used in a WHERE clause to compare the rowversion value retrieved by the client application with the rowversion value in the database. If the rowversion values matched, the update would proceed. If not, the update would fail, with error message 532, to indicate that the row had been modified. Unfortunately, this function is no longer provided in SQL Server 2005 and later releases. Any attempt to use it now results in a syntax error. As an alternative, you can programmatically check whether the update modified any rows, and if not, you can check whether the row still exists and return the appropriate message. Listing 1 provides an example of a stored procedure that implements this strategy.

Listing 1. An Example of a Procedure for Optimistic Locking

create proc optimistic_update
      @id int, -- provide the primary key for the record
      @data_field_1 varchar(10), -- provide the data value to be updated
      @rowversion rowversion -- pass in the rowversion value retrieved with
                           -- the initial data retrieval
as
-- Attempt to modify the record
update data_table
  set data_field_1 = @data_field_1
  where id = @id
    and versioncol = @rowversion
-- Check to see if no rows updated
IF @@ROWCOUNT=0
BEGIN
  if exists (SELECT * FROM data_table WHERE id=@id)
  -- The row exists but the rowversions don't match
  begin
     raiserror ('The row with id "%d" has been updated since it was read',
                 10, 1, @id)
     return -101
  end
  else -- the row has been deleted
  begin
     raiserror ('The row with id "%d" has been deleted since it was read',
                  10, 2, @id)
     return -102
  end
end
ELSE
  PRINT 'Data Updated'
return 0

Using this approach, if the update doesn’t modify any rows, the application receives a message and a nonzero return code (-101 or -102), so it knows for certain whether the update failed because the rowversion value didn’t match or because the row was deleted. If the row is found and the rowversion values match, the update proceeds normally and the procedure returns 0.
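Listing 1 runs inside SQL Server, but its decision logic is easy to mirror in client code for experimentation. The Python sketch below is such a port, assuming an in-memory SQLite database in place of SQL Server and a trigger-maintained integer versioncol in place of a rowversion column. It returns the same codes as the procedure: 0 when the update is applied, -101 when the row still exists but its version no longer matches, and -102 when the row has been deleted.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data_table (
        id INTEGER PRIMARY KEY,
        data_field_1 TEXT,
        versioncol INTEGER NOT NULL DEFAULT 1   -- emulated rowversion column
    );
    CREATE TRIGGER bump_version AFTER UPDATE OF data_field_1 ON data_table
    BEGIN
        UPDATE data_table SET versioncol = versioncol + 1 WHERE id = NEW.id;
    END;
    INSERT INTO data_table (id, data_field_1) VALUES (1, 'foo'), (2, 'baz');
""")

UPDATED, ROW_CHANGED, ROW_DELETED = 0, -101, -102   # same codes as Listing 1


def optimistic_update(row_id, new_value, old_version):
    """Mirror of Listing 1: try the conditional update, then explain a miss."""
    cur = conn.execute(
        "UPDATE data_table SET data_field_1 = ? WHERE id = ? AND versioncol = ?",
        (new_value, row_id, old_version),
    )
    affected = cur.rowcount
    conn.commit()
    if affected == 1:
        return UPDATED
    # No row matched: distinguish "version changed" from "row deleted".
    still_there = conn.execute(
        "SELECT 1 FROM data_table WHERE id = ?", (row_id,)
    ).fetchone()
    return ROW_CHANGED if still_there else ROW_DELETED


first = optimistic_update(1, "bar", 1)      # version 1 matches, so the update applies
stale = optimistic_update(1, "fubar", 1)    # the row is now at version 2
conn.execute("DELETE FROM data_table WHERE id = 2")
conn.commit()
gone = optimistic_update(2, "qux", 1)       # row 2 no longer exists

print(first, stale, gone)  # 0 -101 -102
```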

Optimistic Locking with Snapshot Isolation

SQL Server 2008’s Snapshot Isolation mode provides another mechanism for implementing optimistic locking through its automatic row versioning. When a transaction running at the SNAPSHOT isolation level reads data, no locks are acquired or held on the data rows; the transaction reads the committed version of the data as it existed when the transaction started. Because no locks are held, the read causes no blocking, and another process can modify the data after it has been read. If another process does modify a row read by the first process, a new version of the row is generated. If the original process then attempts to update that row, SQL Server detects that the row has changed since the transaction’s snapshot was taken and prevents the modification, thereby avoiding the lost update. The attempt fails with the following error message:

Msg 3960, Level 16, State 4, Line 2
Snapshot isolation transaction aborted due to update conflict. You cannot use
 snapshot isolation to access table 'dbo.data_table' directly or indirectly in
 database 'bigpubs2008' to update, delete, or insert the row that has been modified
 or deleted by another transaction. Retry the transaction or change the isolation
 level for the update/delete statement.

To see how this works, you can create the following table:

use bigpubs2008
go
--The first statement is used to disable any previously created
--DDL triggers in the database which would prevent creating a new table.
DISABLE TRIGGER ALL ON DATABASE
go
create table data_table
   (id int identity primary key,
    data_field_1 varchar(10),
    versioncol rowversion)
go
insert data_table (data_field_1) values ('foo')
go

Next, you need to ensure that bigpubs2008 is configured to allow snapshot isolation:

ALTER DATABASE bigpubs2008 SET ALLOW_SNAPSHOT_ISOLATION ON

In one user session, you execute the following SQL statements:

SET TRANSACTION ISOLATION LEVEL SNAPSHOT
go
begin tran
select * from data_table
go

id          data_field_1 versioncol
----------- ------------ ------------------
1           foo          0x0000000000000BC4

Now, in another user session, you execute the following UPDATE statement:

update data_table set data_field_1 = 'bar'
  where id = 1

Then you go back to the original session and attempt the following update:

update data_table set data_field_1 = 'fubar'
  where id = 1
go

Msg 3960, Level 16, State 4, Line 2
Snapshot isolation transaction aborted due to update conflict. You cannot use
 snapshot isolation to access table 'dbo.data_table' directly or indirectly in
 database 'bigpubs2008' to update, delete, or insert the row that has been modified
 or deleted by another transaction. Retry the transaction or change the isolation
 level for the update/delete statement.

Note that for the first process to keep using its row version, the SELECT and UPDATE statements must be run in the same transaction; when the transaction is committed or rolled back, the snapshot established for it is released. Because a SELECT run at the Snapshot Isolation level acquires no locks, the transaction holds no locks after the read, so it avoids the problems normally encountered with HOLDLOCK or the SERIALIZABLE isolation level. Because no locks were held on the data row, the other process was allowed to update the row after it was retrieved, generating a new version of the row. The automatic row versioning provided by Snapshot Isolation then prevented the first process from overwriting the update performed by the second process, preventing a lost update.
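Both points can be illustrated with a toy Python model of row versioning; it is a conceptual sketch, not how SQL Server implements snapshot isolation. Every committed write is tagged with a hypothetical commit timestamp, a snapshot transaction sees only versions committed before it began, and an update raises UpdateConflict (standing in for Msg 3960) when a newer version of the row was committed after the snapshot. The model ignores locking altogether and fixes the snapshot when the transaction object is created, both simplifications. The second case shows why the SELECT and UPDATE must share a transaction: a new transaction gets a new snapshot, so the intervening change is not detected.

```python
class UpdateConflict(Exception):
    """Stands in for SQL Server's Msg 3960 in this toy model."""


class VersionStore:
    """Keeps every committed version of each row, tagged with a commit timestamp."""

    def __init__(self):
        self.clock = 0
        self.versions = {}  # row id -> list of (commit_ts, value), oldest first

    def commit_version(self, row_id, value):
        """Record a committed write (for example, another session's autocommit UPDATE)."""
        self.clock += 1
        self.versions.setdefault(row_id, []).append((self.clock, value))

    def latest_ts(self, row_id):
        return self.versions[row_id][-1][0]


class SnapshotTransaction:
    """Sees only versions committed before it was created; never takes locks."""

    def __init__(self, store):
        self.store = store
        self.snapshot_ts = store.clock
        self.writes = {}

    def select(self, row_id):
        if row_id in self.writes:
            return self.writes[row_id]
        visible = [v for ts, v in self.store.versions[row_id] if ts <= self.snapshot_ts]
        return visible[-1]

    def update(self, row_id, value):
        # Update-conflict check: a newer version was committed after our snapshot.
        if self.store.latest_ts(row_id) > self.snapshot_ts:
            raise UpdateConflict(f"row {row_id} was modified by another transaction")
        self.writes[row_id] = value

    def commit(self):
        for row_id, value in self.writes.items():
            self.store.commit_version(row_id, value)
        self.writes.clear()


store = VersionStore()
store.commit_version(1, "foo")            # the initial INSERT

# Case 1: SELECT and UPDATE in the same snapshot transaction.
s1 = SnapshotTransaction(store)
first_read = s1.select(1)                 # 'foo'
store.commit_version(1, "bar")            # session 2 updates the row and commits
try:
    s1.update(1, "fubar")
    conflict = False
except UpdateConflict:
    conflict = True   # SQL Server would also roll back the whole transaction here
print(first_read, conflict)               # foo True

# Case 2: the SELECT and the UPDATE run in separate transactions.
read_txn = SnapshotTransaction(store)
seen = read_txn.select(1)                 # 'bar'; this transaction then ends
store.commit_version(1, "baz")            # session 2 commits another change
write_txn = SnapshotTransaction(store)    # the new snapshot already includes 'baz'
write_txn.update(1, seen + " (edited)")   # no conflict is detected
write_txn.commit()
print(store.versions[1][-1][1])           # bar (edited); 'baz' was silently lost
```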

Caution

Locking contention is avoided in the preceding example only because the transaction performed nothing but a SELECT before attempting the UPDATE. A SELECT run at the Snapshot Isolation level reads the appropriate committed version of the row and does not acquire or hold locks on the actual data row. However, if the process were to modify any data within the transaction, the update or exclusive locks it acquired would be held until the end of the transaction, which could lead to locking contention, especially if user interaction is allowed within the transaction after those locks are acquired.

Because of the overhead incurred by snapshot isolation (row versions must be maintained in tempdb) and the cost of rolling back transactions that hit update conflicts, consider using Snapshot Isolation mode to provide optimistic locking only in systems with little concurrent updating of the same resources, where it is unlikely that your transactions will have to be rolled back because of an update conflict.
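When update conflicts do occur, the usual response, as the text of Msg 3960 suggests, is to rerun the whole transaction. The Python sketch below shows only that control flow. UpdateConflict is a hypothetical exception; in a real client you would translate your database driver’s error for message 3960 into it, and the callable passed to the hypothetical run_with_retry is assumed to perform one complete transaction, reads included.

```python
class UpdateConflict(Exception):
    """Hypothetical stand-in for a driver error carrying SQL Server message 3960."""


def run_with_retry(transaction, max_attempts=3):
    """Run `transaction` (a callable performing one complete transaction),
    starting over from the beginning whenever it reports an update conflict."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except UpdateConflict:
            if attempt == max_attempts:
                raise  # give up and let the caller report the conflict
            # Otherwise loop: the next call reruns the whole transaction, reads included.


# Hypothetical transaction that hits an update conflict on its first two attempts.
attempts = []


def edit_row():
    attempts.append(len(attempts) + 1)
    if len(attempts) < 3:
        raise UpdateConflict("Msg 3960: update conflict")
    return "committed"


result = run_with_retry(edit_row, max_attempts=5)
print(result, len(attempts))  # committed 3
```

Rerunning the entire callable, rather than only the failed UPDATE, is the important design choice: the snapshot belongs to the transaction, so only a new transaction can see the other session’s committed change. Capping the number of attempts keeps a heavily contended row from looping forever, and each retry adds to the rollback cost described above.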
