1. OVERVIEW
Tommy Cooper, the late great comic
magician, did a trick in which he put two handkerchiefs, one white and
one blue, into a bag. He said a magic word, pulled them out again, and
then stated that the white one had turned blue, and the blue one had
turned white. It’s an excellent trick, though perhaps misunderstood,
because the audience gets the impression that no change has occurred at
all, and that he is simply pretending that the colors have swapped.
All joking aside, when you put something into a
database, you have a certain level of expectation. You want to be
assured that any data that has been entered can be retrieved in the
same state, notwithstanding another process coming along and explicitly
changing or deleting it. You don’t want any magic to wreak havoc while
you’re looking the other way. In short, you want your transaction to be
protected.
You have probably become so accustomed to the way a database
works that you now simply expect certain things, just as you expect a
letter to appear when you press a key on your computer keyboard,
oblivious to the complex programming by software developers that makes
it possible. Developers who write programs in very low-level languages
still need to consider those types of things, but all other
developers can take a lot for granted.
Nonetheless, the concepts used to
protect your data should be understood. After all, you need to allow
many processes to access your databases at once, and therefore need to
appreciate the difference between having some “magic” occur that has
unexpected results, and controlling the behavior that occurs when
multiple processes want to act on the same pieces of data. Nothing
should give a database user the impression of magic, and the power of
concurrency — coordinating multiple processes — should be appreciated
and leveraged.
2. TRANSACTIONS
Just to ensure that we’re all on the
same page, let’s quickly review what we’re talking about when we
discuss transactions. The most common analogy used to understand
database transactions is the bank transaction. Beginning with the
deposit, suppose you take $50 to the counter, resulting in a credit
transaction in that amount to your account. When your
account statement arrives, you expect the transaction record to
reflect that you deposited $50, not $48 or $52, depending on any fees
or charges that might apply. This expectation actually stems from four
aspects of transactions that have been identified by experts and that
should be protected: atomicity, consistency, isolation, and durability,
which form the neat acronym ACID. The following sections first examine
these in the context of the bank transaction, and then you will revisit
them in the context of your database.
A Is for Atomic
Atomic means indivisible — in this
case, a collection of events being treated as a single unit. When you
take your money to the bank and deposit it, you expect the transaction
to be completed successfully. That is, you don’t expect the teller to
accept your money and then go to lunch, forgetting to credit your
account. That kind of behavior would obviously ruin a bank; and when we
revisit atomicity in the context of the database, you’ll see that it
would also ruin a database.
C Is for Consistent
Consistent means that everything is in
agreement — in this case, the amount deposited is the amount credited.
If you access a list of your recent transactions, the $50 that you
deposited on Monday must be recorded as $50 on Monday, not $48 on
Monday, not $52 on Tuesday, or any other combination of incorrect data.
In other words, it is imperative that your records match the bank’s
records. Although you may feel personally slighted or ignored at the
bank, or the teller may not remember you between visits, you need to
feel confident that the bank can successfully process your transactions
such that they are completed in a consistent manner.
I Is for Isolated
Banks understand discretion. If you are
going through your dealings with a teller, you don’t expect someone to
be listening to the conversation and potentially making decisions based
on what’s going on. Isolation is the protection provided around the
visibility of what’s going on during each stage of the transaction, and
extends out to whether your transaction can be affected by anything
else that might be going on at the same time. Importantly, there are
different levels of isolation that can be chosen.
For example, if your spouse is in another branch
making a separate transaction, you might be okay with that branch
seeing some information about your transaction part way through it, but
you almost certainly wouldn’t want to see a bank statement issued that
only gave half the story.
D Is for Durable
Durability reflects the fact
that your bank transaction cannot be accidentally deleted or otherwise
compromised. After you deposit your money and receive a receipt, you
are assured that your money is safe and available to you. Even in the
event of system failure, the record of the fact that you deposited
money should persist, no matter what happens next.
3. DATABASE TRANSACTIONS
Having looked at the ACID principles in
the context of a bank transaction in the preceding section, this
section examines how these four principles relate to your database
environment, which you need to protect with just as much care as the
bank affords to your monetary transactions.
Atomicity
When you make a change in the database
that involves multiple operations, such as modifying two separate
tables, if you have identified these operations as a single
transaction, then you expect an all-or-nothing result — that is, the
change is completely atomic. Recall from the bank analogy that
depositing $50 must result in an additional $50 in your account. If the
bank’s server freezes or the teller’s terminal stops working, then you
expect your personal data to remain unchanged. In a database, locks
help to achieve this, by ensuring that a transaction has exclusive
access to anything that is being changed, so that it is either
committed or rolled back completely. Anything short of that would break
this very basic property of transactions.
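The all-or-nothing behavior can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: it uses the
SQLite engine from Python's standard library as a stand-in for SQL
Server, and the accounts and ledger tables, the deposit function, and
the CHECK constraint that forces the second step to fail are all
hypothetical names invented for the example.

```python
import sqlite3


def make_db():
    """Create a small in-memory database (SQLite stand-in, hypothetical schema)."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # transaction boundaries below are exactly the ones written in the code.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE ledger ("
        " id INTEGER PRIMARY KEY,"
        " account_id INTEGER NOT NULL,"
        " amount INTEGER NOT NULL CHECK (amount > 0))"
    )
    conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
    return conn


def deposit(conn, account_id, amount):
    """Update two tables as one transaction; return True if it committed."""
    conn.execute("BEGIN")
    try:
        # Step 1: credit the account.
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, account_id),
        )
        # Step 2: record the deposit. A non-positive amount violates the
        # CHECK constraint, which simulates a failure after step 1 has run.
        conn.execute(
            "INSERT INTO ledger (account_id, amount) VALUES (?, ?)",
            (account_id, amount),
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        # Undo step 1 as well, so neither table shows a partial change.
        conn.execute("ROLLBACK")
        return False


def balance(conn, account_id):
    row = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    db = make_db()
    print(deposit(db, 1, 50), balance(db, 1))   # both steps succeed
    print(deposit(db, 1, -50), balance(db, 1))  # step 2 fails, step 1 is undone
```

The second call fails only at the ledger insert, after the balance has
already been updated inside the transaction, yet the balance is
unchanged afterward because the whole unit is rolled back.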
Consistency
Databases enforce logic in many
different ways. When a change is attempted, it can’t be allowed to
occur until the system is satisfied that no rules are going to be
broken. For example, suppose you remove a value from a table but there
are foreign keys referring to that column. The system must verify that
these kinds of associations are handled before it can agree to that
change; but in order to perform those checks, and potentially roll the
change back if something has gone wrong, locks are needed. For another
example, it should be impossible to delete a row while something else
is being inserted in another table that relies on it.
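The foreign key case can be shown with a minimal sketch, again using
Python's built-in SQLite engine as a stand-in rather than SQL Server.
The customers and orders tables and the try_delete_customer helper are
hypothetical, and the PRAGMA line is a SQLite-specific requirement, not
something SQL Server needs.

```python
import sqlite3


def make_db():
    """Build a parent/child pair of tables (SQLite stand-in, hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this setting is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id))"
    )
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
    conn.execute("INSERT INTO orders (id, customer_id) VALUES (10, 1)")
    conn.commit()
    return conn


def try_delete_customer(conn, customer_id):
    """Attempt the delete; return False if the engine rejects it."""
    try:
        # "with conn" commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute("DELETE FROM customers WHERE id = ?", (customer_id,))
        return True
    except sqlite3.IntegrityError:
        # The engine refused: an order still refers to this customer.
        return False


if __name__ == "__main__":
    db = make_db()
    print(try_delete_customer(db, 1))  # rejected while order 10 refers to it
    with db:
        db.execute("DELETE FROM orders WHERE customer_id = 1")
    print(try_delete_customer(db, 1))  # allowed once nothing refers to it
```

The rule lives in the schema, so the engine refuses the change no
matter which application or process attempts it.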
Isolation
When the database engine inserts values
into a table, nothing else should be able to change those values at the
same time. Similarly, if the database engine needs to roll back to a
previous state, nothing else should have affected that state or left it
indeterminate. In other words, each action must happen in isolation
from all others.
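One visible effect of isolation is that a second connection does not
see a change until it has been committed. The sketch below illustrates
this with two connections to a temporary SQLite file from Python's
standard library. It is a stand-in only: the schema and function names
are hypothetical, and SQL Server's behavior depends on the isolation
level in use (under its default locking READ COMMITTED level, for
instance, the reader would wait for the writer rather than read the
older value).

```python
import os
import sqlite3
import tempfile


def read_balance(conn):
    """Read the single account balance and release the read lock promptly."""
    cur = conn.execute("SELECT balance FROM accounts WHERE id = 1")
    row = cur.fetchone()
    cur.close()
    return row[0]


def demo_isolation():
    """Return what the writer and a separate reader see before and after commit."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "bank.db")
        # Autocommit mode: transactions start and end only where the code says so.
        writer = sqlite3.connect(path, isolation_level=None)
        reader = sqlite3.connect(path, isolation_level=None)
        try:
            writer.execute(
                "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
            )
            writer.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")

            writer.execute("BEGIN")
            writer.execute("UPDATE accounts SET balance = balance + 50 WHERE id = 1")
            seen_by_writer = read_balance(writer)  # the writer sees its own change
            seen_during = read_balance(reader)     # the reader still sees committed data
            writer.execute("COMMIT")
            seen_after = read_balance(reader)      # the change is now visible to everyone
        finally:
            writer.close()
            reader.close()
        return seen_by_writer, seen_during, seen_after


if __name__ == "__main__":
    print(demo_isolation())
```

The reader never observes the half-finished state: it sees the balance
either as it was before the transaction or as it is after the commit.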
Durability
Even if a failure occurs a split-second
after your transaction has taken place, you need to be sure that the
transaction has been persisted in the database. This is achieved
through one of the most significant aspects of SQL Server — the
behavior of the transaction log.
Most experienced database administrators have had
to salvage MDF files, where the databases’ data is stored, from a
failed server, only to find that the MDF files alone do not provide
enough information to recover the databases completely. Ideally, this
situation prompts the DBA to learn why, after which they understand
that MDF files without the accompanying LDF files (the transaction log)
do not reflect the whole story.
That’s because the transaction log is not like
many of the other logs on a Windows server, such as the Windows Event
Log. Those logs record information about what’s going on, but only in
order to provide a report of what has happened — typically for
troubleshooting purposes. The SQL Server transaction log is much more
than this.
When a transaction takes place, it is recorded in
the transaction log. Everything that the transaction is doing is
recorded there, while the changes to the actual data are occurring in
memory. Once the transaction is complete and a commit command is sent,
the changes are hardened, which is done in the transaction log. The data
files are updated later. For the time
being, the change exists in memory (where processes can access the
updated data) and in the transaction log. Changes to the data files
happen shortly afterward, when a separate CHECKPOINT
operation takes place. Until then, the MDF files do not contain the
current version of the database — for that, the MDF and LDF files are
both needed.
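The sequence just described (log first, data files later) can be
modeled with a toy example. This is a deliberately simplified sketch in
Python and not how SQL Server is implemented: the ToyDatabase class, its
file names, and the JSON record format are all invented for
illustration, the log is never truncated, and there is no handling of
partially written log records.

```python
import json
import os
import tempfile


class ToyDatabase:
    """Toy write-ahead logging model (hypothetical, for illustration only)."""

    def __init__(self, directory):
        self.data_path = os.path.join(directory, "toy.data")  # plays the role of the MDF
        self.log_path = os.path.join(directory, "toy.log")    # plays the role of the LDF
        self.pages = {}  # the in-memory copy of the data
        self._recover()

    def _recover(self):
        # Start from the last checkpointed data file, if there is one...
        if os.path.exists(self.data_path):
            with open(self.data_path) as f:
                self.pages = json.load(f)
        # ...then replay every committed change recorded in the log.
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    record = json.loads(line)
                    self.pages[record["key"]] = record["value"]

    def commit(self, changes):
        # Harden the changes: append them to the log and force them to disk
        # before the commit is considered complete.
        with open(self.log_path, "a") as f:
            for key, value in changes.items():
                f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # The data itself changes only in memory at this point.
        self.pages.update(changes)

    def checkpoint(self):
        # Later, write the in-memory data out to the data file.
        with open(self.data_path, "w") as f:
            json.dump(self.pages, f)
            f.flush()
            os.fsync(f.fileno())


def demo_durability():
    """Commit after a checkpoint, 'crash', and compare data file vs. recovery."""
    with tempfile.TemporaryDirectory() as directory:
        db = ToyDatabase(directory)
        db.commit({"balance": 100})
        db.checkpoint()
        db.commit({"balance": 150})  # committed, but not yet checkpointed
        del db                       # simulate a crash: memory is lost
        with open(os.path.join(directory, "toy.data")) as f:
            data_file_only = json.load(f)
        recovered = ToyDatabase(directory).pages
        return data_file_only, recovered


if __name__ == "__main__":
    print(demo_durability())
```

In this model the data file alone still holds the older balance of 100,
while recovery with both files yields 150, which mirrors why the data
files without the log do not tell the whole story.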
Therefore, the durability of a transaction is
provided by the existence and preservation of the database’s
transaction log. Database administrators protect their transaction logs
above anything else, because in the event of a failure, the transaction
log is the only record of the latest database changes.
For a minimally logged operation, the
behavior is slightly different, and the transaction log contains only
sufficient information to be able to commit or roll back the transaction
fully, but the transaction log still performs a vital role in ensuring
that transactions are durable.