SQL Server 2012 : Indexing Basics (part 1) - The B-Tree Index, Clustered Indexes, Nonclustered Indexes

11/6/2013 8:24:40 PM

1. Zen and the Art of Indexing

Indexing and performance tuning are a mixture of art and science. The science portion includes knowing the details of how indexing works under the covers and when it makes sense to define indexes on the appropriate columns on a table. The art portion includes knowing when to stray away from what conventional wisdom tells you to do and designing a proper indexing strategy for what works best for your environment. There is no secret formula to define the correct indexes. An indexing strategy that works great for one system may perform poorly on another system.

Another aspect is that an indexing strategy is ever-evolving. Design and implement indexes for what your system needs at the present time. As your system evolves, so too must your indexes. Be careful defining indexes that your system does not currently use but may use later. This type of strategy often ends up in indexes not being used, but having to be constantly maintained.

What's New with Indexes?

New in SQL Server 2012 is the FORCESCAN query hint. Use this hint to ensure that a SCAN access method is used to retrieve rows from a table or an index.

One nugget of knowledge to take away is that a SCAN operation is not inherently evil. You may have heard colleagues talk about how evil scans are and how you should have only SEEK operations in query plans. Seek operations, although desireable for OLTP environments, are not without their perils. A seek operation coupled with a Bookmark Lookup operation can quickly become much more expensive than a SCAN operation. Always use query hints with caution.

2. Indexing Basics

2.1 The B-Tree Index

The two main types of indexes in SQL Server are clustered and nonclustered indexes. Each index type is implemented via a balanced-tree (B-tree) data structure. A B-tree is a structure that stores data in a sorted order and enables fast access to the data it holds.

B-tree indexes exist on index pages and have a root level, one or more intermediate levels, and a leaf level. When you define an index, you specify one or more key columns. These columns are actually sorted in the index, as defined in Figure 1. The difference between clustered and nonclustered indexes is the way in which the data is stored at the leaf level of the index.

Figure 1 This figure illustrates a simplified view of a clustered index with an identity column as the clustered index key. The first name is the data column.

Over time, indexes will likely become fragmented, which can potentially impact performance.

2.2 Clustered Indexes

When a clustered index is created on a table, the index itself becomes the table. This is somewhat confusing at first. When you create a clustered index, under the covers the database engine sorts the data in the underlying table based on the index key(s) you define and stores the table in that order. The clustered index doesn't become a separate structure for the underlying table data like a nonclustered index does; the clustered index is the table. All the data for the table is stored in the leaf pages of the clustered index. Because you can sort a table only in one physical sorted order, you can have only one clustered index per table.

Note

The keys defined for a clustered index must always be unique. When you define a clustered index without using the UNIQUE keyword, a “uniquifier” is added to the clustered key to ensure the set of values is unique. This is an integer value that makes your key larger. Because the clustered key is always stored in nonclustered indexes, it makes these indexes larger as well. Keep this in mind when defining your clustered index keys.

Logically, the clustered index pages maintain the data in the clustered index sort order. However, physically those pages are connected via a linked list — each page links to the next page and previous page in the list, so it is not guaranteed that the data remains in a sorted physical order. Also, it is not guaranteed that the rows on a page are stored in sorted order. An offset array is used at the end of the page to indicate where in the page rows begin and end in a sorted fashion. In a perfect world the pages would be in the same order as the list, but in reality they are often moved around due to page splits and fragmentation .

A great example to illustrate a clustered index is a telephone book. A telephone book is always in sorted order, based on the last name of the individual followed by the first name. The sorted order makes it easy to find the phone number of the person you're looking for.

Assume you want to find my telephone number in the book. To do this, you'd base your search primarily on my last name, Chapman. You'd open the phone book approximately in the middle to gauge how close you were to the “Chapman” section. You'd then pick the side that contained C and split it again. You'd continue this operation until you find the page that contains the last name Chapman, and it would take you only a few operations to do this. When you find my last name, you could then perform a search for my first name (Tim) within the Chapman result set. When you find my full name, you'd then have access to my address information without needing to perform any additional searches. What you just performed to find my last name was a type of binary search operation. You split the problem in halves until you reached the solution.

This is similar to an index search operation. When searching for a value, you begin at the root page (approximately the “middle” of the values you're searching for) and continue down intermediate paths, choosing the proper path for the value you're searching for. Eventually you make it to the leaf level of the index, where you find the value you're looking for. For clustered indexes, after you make it to the leaf level, all the data you need is contained there, so there is no need to perform any additional searching on the data. All the data in the table is contained at the leaf level of the index; this means that the clustered index is the table.

2.3 Nonclustered Indexes

SQL Server nonclustered indexes are also implemented as a B-tree data structure. The difference between a clustered index and a nonclustered index is that the leaf level pages of the nonclustered index do not contain all the base table data like the clustered index does. Instead, the leaf level of the nonclustered index contains the index keys along with a pointer to the base table. If the nonclustered index is not unique, all levels of the index contain a pointer to the base table. If the base table is a clustered index, the clustered keys are stored in the nonclustered index. If the base table is a heap, the nonclustered index contains the row-identifier for the base table record.

For example, the nonclustered index shown in Figure 2 uses the first name column as its key column so that's the data sorted by the index. The nonclustered index points to the base table by including the clustered index key column. In Figure 2, the clustered index key column is the identity column used in Figure 1.

Figure 2 This simplified illustration of a nonclustered index has a B-tree index with a first name as the key column. The nonclustered index includes pointers to the clustered index key column.

This is an important fact to consider when designing your nonclustered indexes. If your clustered key is large, such as a uniqueidentifier, that large data type will be included in the nonclustered index key for each row. So, a large clustered key can directly affect the size of your nonclustered indexes.

Since SQL Server 2005, additional unsorted columns can be included in the leaf level. The employee's title and department columns could be added to the previous index, which is extremely useful in designing covering indexes.

A SQL Server table may have up to 999 nonclustered indexes, but I've never seen a well-normalized table that required more than a dozen well-designed indexes.

2.4 Composite Indexes

A composite index is a clustered or nonclustered index that is defined on multiple columns. Consider the clustered index telephone book example; the keys of the composite index were based on the last name and first name of the individual, in that order. The ordering of the columns in a composite index is important. For a search to take advantage of a composite index, it must include the index columns from left to right. If the composite index is lastname, firstname, a search for only the firstname cannot seek the first name because it must first know the last name. The first name is not found independently in sorted order but is instead based on the lastname column. If you need to perform a search for lastname, or lastname and firstname, you can efficiently use the index.

Others

- Windows 7 : Using Internet Explorer 8 - Using Multimedia Browsing and Downloading (part 3)

- Windows 7 : Using Internet Explorer 8 - Using Multimedia Browsing and Downloading (part 2)

- Windows 7 : Using Internet Explorer 8 - Using Multimedia Browsing and Downloading (part 1)

- Windows Server 2012 : Highly available, easy-to-manage multi-server platform - Management efficiency (part 3) - PowerShell 3.0

- Windows Server 2012 : Highly available, easy-to-manage multi-server platform - Management efficiency (part 2) - Simplified Active Directory administration

- Windows Server 2012 : Highly available, easy-to-manage multi-server platform - Management efficiency (part 1) - The new Server Manager

- Windows Server 2012 : Highly available, easy-to-manage multi-server platform - Cost efficiency - Storage Spaces

- System Center Configuration Manager 2007 : Client Management - Client Discovery (part 2) - Network Discovery

- System Center Configuration Manager 2007 : Client Management - Client Discovery (part 1) - Active Directory User Discovery

- Windows Vista : Providing User Data Protection (part 3) - Putting data protection in place - Enabling roaming profiles, Enabling folder redirection