2. Indexing Basics
2.1 The B-Tree Index
The two main types of indexes in SQL
Server are clustered and nonclustered indexes. Each index type is
implemented via a balanced-tree (B-tree) data structure. A B-tree is a
structure that stores data in a sorted order and enables fast access to
the data it holds.
B-tree indexes exist on index pages and have a
root level, one or more intermediate levels, and a leaf level. When you
define an index, you specify one or more key columns. These columns are
actually sorted in the index, as defined in Figure 1.
The difference between clustered and nonclustered indexes is the way in
which the data is stored at the leaf level of the index.
Over time, indexes will likely become
fragmented, which can potentially impact performance.
2.2 Clustered Indexes
When a clustered index is created on a
table, the index itself becomes the table. This is somewhat confusing
at first. When you create a clustered index, under the covers the
database engine sorts the data in the underlying table based on the
index key(s) you define and stores the table in that order. The
clustered index doesn't become a separate structure for the underlying
table data like a nonclustered index does; the clustered index is
the table. All the data for the table is stored in the leaf pages of
the clustered index. Because you can sort a table only in one physical
sorted order, you can have only one clustered index per table.
Note
The keys defined for a clustered index
must always be unique. When you define a clustered index without using
the UNIQUE keyword, a “uniquifier” is added to the clustered key to
ensure the set of values is unique. This is an integer value that makes
your key larger. Because the clustered key is always stored in
nonclustered indexes, it makes these indexes larger as well. Keep this
in mind when defining your clustered index keys.
Logically, the clustered index pages maintain the
data in the clustered index sort order. However, physically those pages
are connected via a linked list — each page links to the next page and
previous page in the list, so it is not guaranteed that the data
remains in a sorted physical order. Also, it is not guaranteed that the
rows on a page are stored in sorted order. An offset array is used at
the end of the page to indicate where in the page rows begin and end in
a sorted fashion. In a perfect world the pages would be in the same
order as the list, but in reality they are often moved around due to
page splits and fragmentation .
A great example to illustrate a clustered index
is a telephone book. A telephone book is always in sorted order, based
on the last name of the individual followed by the first name. The
sorted order makes it easy to find the phone number of the person
you're looking for.
Assume you want to find my telephone number in
the book. To do this, you'd base your search primarily on my last name,
Chapman. You'd open the phone book approximately in the middle to gauge
how close you were to the “Chapman” section. You'd then pick the side
that contained C and split it again. You'd continue this operation
until you find the page that contains the last name Chapman, and it
would take you only a few operations to do this. When you find my last
name, you could then perform a search for my first name (Tim) within
the Chapman result set. When you find my full name, you'd then have
access to my address information without needing to perform any
additional searches. What you just performed to find my last name was a
type of binary search operation. You split the problem in halves until
you reached the solution.
This is similar to an index search operation.
When searching for a value, you begin at the root page (approximately
the “middle” of the values you're searching for) and continue down
intermediate paths, choosing the proper path for the value you're
searching for. Eventually you make it to the leaf level of the index,
where you find the value you're looking for. For clustered indexes,
after you make it to the leaf level, all the data you need is contained
there, so there is no need to perform any additional searching on the
data. All the data in the table is contained at the leaf level of the
index; this means that the clustered index is the table.
2.3 Nonclustered Indexes
SQL Server nonclustered indexes are
also implemented as a B-tree data structure. The difference between a
clustered index and a nonclustered index is that the leaf level pages
of the nonclustered index do not contain all the base table data like
the clustered index does. Instead, the leaf level of the nonclustered
index contains the index keys along with a pointer to the base table.
If the nonclustered index is not unique, all levels of the index
contain a pointer to the base table. If the base table is a clustered
index, the clustered keys are stored in the nonclustered index. If the
base table is a heap, the nonclustered index contains the
row-identifier for the base table record.
For example, the nonclustered index shown in Figure 2
uses the first name column as its key column so that's the data sorted
by the index. The nonclustered index points to the base table by
including the clustered index key column. In Figure 2, the clustered index key column is the identity column used in Figure 1.
This is an important fact to consider when
designing your nonclustered indexes. If your clustered key is large,
such as a uniqueidentifier, that large data type will be included in
the nonclustered index key for each row. So, a large clustered key can
directly affect the size of your nonclustered indexes.
Since SQL Server 2005, additional unsorted columns can be included
in the leaf level. The employee's title and department columns could be
added to the previous index, which is extremely useful in designing
covering indexes.
A SQL Server table may have up to 999
nonclustered indexes, but I've never seen a well-normalized table that
required more than a dozen well-designed indexes.
2.4 Composite Indexes
A composite index is a clustered
or nonclustered index that is defined on multiple columns. Consider the
clustered index telephone book example; the keys of the composite index
were based on the last name and first name of the individual, in that
order. The ordering of the columns in a composite index is important.
For a search to take advantage of a composite index, it must include
the index columns from left to right. If the composite index is lastname, firstname, a search for only the firstname
cannot seek the first name because it must first know the last name.
The first name is not found independently in sorted order but is
instead based on the lastname column. If you need to perform a search
for lastname, or lastname and firstname, you can efficiently use the index.