SQL Server 2012 : The Path of the Query (part 3) - Bookmark Lookup

11/6/2013 8:34:05 PM

Query Path 5 — Bookmark Lookup

This bookmark lookup query path is a two-edged sword. For queries returning a small number of records, it's an acceptable query path, but for the queries that return a significant amount of records , this query path can significantly hinder performance.

To demonstrate a bookmark lookup query path, the following query filters by ProductID while returning all the base table's columns:

SELECT *
 FROM Production.WorkOrder
 WHERE ProductID = 757;

To rephrase the query in pseudo-code, find the rows for Product 757 and give me all the columns for those rows.

There is an index on the ProductID column, so the query optimizer has two possible options:

Scan the entire clustered index to access all the columns, and then filter the results to find the right rows. Essentially, this would be the same as Query Path 4.

Or:

Perform an index seek on the IX_Workload_ProductID index to fetch the 11 rows. In the process it learns the WorkOrderID values for those 11 rows because the clustered index key columns are in the leaf level of the nonclustered index. Then it can index seek those 11 rows from the clustered index to fetch the other columns.

This jump, from the nonclustered index used to find the rows to the clustered index to complete the columns needed for the query, is called a bookmark lookup as shown in Figure 6.

Figure 6 The nonclustered index does not contain one or more columns that the query requests. To solve the query, SQL Server must perform a bookmark lookup (the dashed line) from the nonclustered index to the base table, which is a clustered index in this example. This illustration shows a single row. In reality it's often hundreds or thousands of rows scattered throughout the clustered index.

The real cost of the bookmark lookup is finding the rows that are typically scattered throughout the base table, which is a clustered index in this case. Locating the 11 rows in the nonclustered index was a single page hit, but those 11 rows might be on 11 different pages in the clustered index. With a larger number of selected rows, the problem intensifies. Selecting 1,000 rows with a bookmark lookup might mean reading three to four pages from the nonclustered index and then reading more than 1,000 pages from the clustered index and leaf level. Eventually, SQL Server decides that the bookmark lookup is more expensive than just scanning the clustered index.

The query execution plan for a bookmark lookup shows the two indexes as data sources for a nested loop join (as shown in Figure 7). For each row that comes from the seek of the nonclustered index, the nested loop join is requesting the matching rows from the clustered index by calling the key lookup.

Figure 7 The query execution plan shows the bookmark lookup as an Index seek being joined with a key lookup.

It's a common saying that Select * is a poor practice because it returns too many, and often unnecessary, columns in the result set — the extra data is considered wasteful if it is not needed. I agree that Select * is a poor practice, but often the reason isn't due to the extra network traffic; it's the bookmark lookup that is almost always generated by a Select *. However, the extra network traffic associated with bringing back more data than is needed can significantly hinder application throughput.

This query builds on the last bookmark lookup query and sheds a little more light on the bookmark lookup problem; the difference is that this query requests only one column that's not available from the nonclustered index.

SELECT WorkOrderID, StartDate
 FROM Production.WorkOrder
 WHERE ProductID = 757;

Consider the performance difference (again, refer to Table 1) between this query path and the select * bookmark lookup query path. Their performance is nearly identical.

It doesn't take many columns to force a bookmark lookup; a single column missing from the nonclustered index means SQL Server must also look to the clustered index to solve the query.

There are only two ways to avoid the bookmark lookup problem:

Filter by the clustered index key columns, so the query can be satisfied using the clustered index (Query Path 2 or 3).
Design a covering index (the next query path).

Query Path 6 — Covering Index

If a nonclustered index includes every column required by the query (and that means every column referenced by the query: select columns, join on condition columns, group by columns, where clause columns, and windowing columns), SQL Server's query optimizer can choose to execute the query using only that nonclustered index. When this occurs the index is said to cover the needs of the query, in other words, it's a covering index.

This is an important concept to grasp to understand how nonclustered indexes operate. A nonclustered index is a separate data structure than the base table. By default, nonclustered indexes have the same number of rows as the base table. In this aspect, you can think of a nonclustered index as a smaller sorted table — one that you can access quickly. When it makes sense, the query optimizer can choose to use ONLY this separate nonclustered index structure to satisfy a query. If this happens, the base table isn't even used in the query — only the nonclustered index.

A covering index is a concept that applies only to nonclustered indexes and only in the context of a query. There is no such thing as a covering index as a single entity; it is applicable only in the context of a query that uses the index.

Query Path 5's second query selected the StartDate column. Because StartDate isn't part of the IX_WorkOrder_ProductID index, SQL Server was forced to use a bookmark lookup. To solve the problem, the following code adds StartDate to the IX_WorkOrder_ProductID index so that the index can cover the query.

DROP INDEX Production.WorkOrder.IX_WorkOrder_ProductID

CREATE INDEX IX_WorkOrder_ProductID
 ON Production.WorkOrder (ProductID)
 INCLUDE (StartDate);

The include option (added in SQL Server 2005) adds the StartDate column to the leaf level of the IX_WorkOrder_ProductID index — but not to the list of keys of the index. This enables you to define additional columns that cover queries without hitting the index column or size limit. Included columns are stored at the leaf levels of the nonclustered index, but not at the intermediate levels. The query optimizer can now solve the queries with an index seek (as show in Figure 8):

Figure 8 With the StartDate column included in the index, the queries are solved with an index seek — a perfect covering index.

SELECT WorkOrderID, StartDate
 FROM Production.WorkOrder
 WHERE ProductID = 757; -- 9 rows

SELECT WorkOrderID, StartDate
 FROM Production.WorkOrder
 WHERE ProductID = 945; –- 1,105 rows

As mentioned earlier, when a nonclustered index is defined on a clustered base table, the clustered keys are stored in the nonclustered index to be used as a pointer back to the base table. Because of this, the nonclustered index can satisfy queries that include the clustered key columns. The following query filters by the nonclustered index key and returns the clustered index key value:

SELECT WorkOrderID
 FROM Production.WorkOrder
 WHERE ProductID = 757;

The Ix_WorkOrder_ProductID nonclustered index has the ProductID column as the key column, and the clustered index is defined on the WorkOrderID column which is available in the nonclustered index.

The next query is a rare example of a covering index. Compared to the previous query path, this query adds the StartDate column in the where clause. Conventional wisdom would say that this query requires an index scan because it filters by a nonkey column. (StartDate is an included column in the index and not a key column.)

SELECT WorkOrderID
 FROM Production.WorkOrder
 WHERE ProductID = 945
  AND StartDate = ‘2006-01-04';

In this case the index seek operator uses the index (keyed by ProductID) to seek the rows matching ProductID = 945.

Then the index seek operator continues to select the correct rows by filtering the rows by the included column (AND StartDate = ‘2006-01-04'). In the index seek properties (see Figure 9), the predicate is filtering by the StartDate column.

Figure 9 The index seek operator can have a seek predicate, which uses the B-tree; and a predicate, which functions as a nonindexed filter.

The performance difference between the bookmark lookup solution and the covering index is dramatic. When comparing the query optimizer cost and the logical reads (refer to Table 1), the query paths that use a covering index are approximately 12 times more efficient. (The duration appears less in the figure due to my limited hardware.)

Others

- SQL Server 2012 : The Path of the Query (part 2) - Range Seek Query, Filter by Nonkey Column

- SQL Server 2012 : The Path of the Query (part 1) - Fetch All, Clustered Index Seek

- SQL Server 2012 : Indexing Basics (part 2) - Index Selectivity, Query Operators

- SQL Server 2012 : Indexing Basics (part 1) - The B-Tree Index, Clustered Indexes, Nonclustered Indexes

- Windows 7 : Using Internet Explorer 8 - Using Multimedia Browsing and Downloading (part 3)

- Windows 7 : Using Internet Explorer 8 - Using Multimedia Browsing and Downloading (part 2)

- Windows 7 : Using Internet Explorer 8 - Using Multimedia Browsing and Downloading (part 1)

- Windows Server 2012 : Highly available, easy-to-manage multi-server platform - Management efficiency (part 3) - PowerShell 3.0

- Windows Server 2012 : Highly available, easy-to-manage multi-server platform - Management efficiency (part 2) - Simplified Active Directory administration

- Windows Server 2012 : Highly available, easy-to-manage multi-server platform - Management efficiency (part 1) - The new Server Manager