A good way to understand how to design efficient indexes is to observe and learn from the various paths queries can take to locate data through indexes.
The following section compares and contrasts ten different query paths. Not every query path is an efficient one.
The Production.WorkOrder table in the AdventureWorks2012 database is a good test table for observing the ten query paths. It has 72,591 rows, 10 columns, and a single-column clustered primary key. Here's the table definition:
CREATE TABLE [Production].[WorkOrder](
    [WorkOrderID] [int] IDENTITY(1,1) NOT NULL,
    [ProductID] [int] NOT NULL,
    [OrderQty] [int] NOT NULL,
    [StockedQty] AS (isnull([OrderQty]-[ScrappedQty],(0))),
    [ScrappedQty] [smallint] NOT NULL,
    [StartDate] [datetime] NOT NULL,
    [EndDate] [datetime] NULL,
    [DueDate] [datetime] NOT NULL,
    [ScrapReasonID] [smallint] NULL,
    [ModifiedDate] [datetime] NOT NULL,
    CONSTRAINT [PK_WorkOrder_WorkOrderID] PRIMARY KEY CLUSTERED
        ([WorkOrderID] ASC)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
          IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
          ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
The WorkOrder table has three indexes, each keyed on the single column identified in the index name:
- PK_WorkOrder_WorkOrderID (Clustered)
- IX_WorkOrder_ProductID (Nonunique, Nonclustered)
- IX_WorkOrder_ScrapReasonID (Nonunique, Nonclustered)
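To confirm this index list on your own copy of the database, the catalog views can be queried directly. The following is a minimal sketch, assuming the sample database is installed under its default name, AdventureWorks2012; the output should show the three indexes above with their key columns.

```sql
USE AdventureWorks2012;

-- List each index on Production.WorkOrder with its key column(s)
SELECT i.name      AS IndexName,
       i.type_desc AS IndexType,      -- CLUSTERED or NONCLUSTERED
       i.is_unique,
       c.name      AS KeyColumn
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id
 AND ic.index_id  = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id
 AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'Production.WorkOrder')
  AND ic.is_included_column = 0      -- key columns only, not INCLUDE columns
ORDER BY i.index_id, ic.key_ordinal;
```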
Performance data for each path, listed in Table 1, was captured by watching the SQL:StmtCompleted event (TSQL category) and the Showplan XML Statistics Profile event (Performance category) in Profiler, and by examining the query execution plan.
Table 1 Query Path Performance
The key performance indicators are the query execution plan optimizer cost (Cost) and the number of logical reads (Reads).
For the duration column, each query path was executed multiple times and the results were averaged. You should run the script on your own SQL Server instance, take your own performance measurements, and study the query execution plans.
The rows-per-ms column is the number of rows returned divided by the average duration in milliseconds. Before executing each query path, the following code clears the plan cache and the buffer pool:
DBCC FREEPROCCACHE;     -- remove all cached execution plans
DBCC DROPCLEANBUFFERS;  -- remove clean data pages from the buffer pool
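The measurements described above can be approximated from a query window without Profiler. The following is a minimal sketch, not the author's original script: it assumes a disposable test instance with AdventureWorks2012 restored under that name, substitutes SET STATISTICS IO and SET STATISTICS TIME for the Profiler events, and uses illustrative variable names (@start, @rows, @ms). CHECKPOINT is added so that DBCC DROPCLEANBUFFERS is not left with dirty pages it cannot evict. Both DBCC commands need elevated server permissions.

```sql
-- Sketch only: run on a test instance, never in production, because these
-- commands empty the plan cache and the buffer pool for the whole server.
USE AdventureWorks2012;

CHECKPOINT;               -- write dirty pages so DROPCLEANBUFFERS can evict them
DBCC FREEPROCCACHE;       -- remove cached execution plans
DBCC DROPCLEANBUFFERS;    -- remove clean pages from the buffer pool

SET STATISTICS IO ON;     -- logical reads per table appear in the Messages output
SET STATISTICS TIME ON;   -- CPU and elapsed time appear in the Messages output

DECLARE @start datetime2(7) = SYSDATETIME(),
        @rows  int,
        @ms    int;

-- Query path under test (Query Path 1 shown here)
SELECT *
FROM Production.WorkOrder;

-- Read @@ROWCOUNT in the very next statement, before anything resets it
SELECT @rows = @@ROWCOUNT,
       @ms   = DATEDIFF(millisecond, @start, SYSDATETIME());

-- Rows per millisecond; NULLIF avoids a divide-by-zero for sub-millisecond runs
SELECT @rows AS RowsReturned,
       @ms   AS DurationMs,
       CAST(@rows AS decimal(18, 2)) / NULLIF(@ms, 0) AS RowsPerMs;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

To follow the method in the text, swap in each query path's SELECT and repeat the run several times, averaging DurationMs before computing rows per millisecond.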
Query Path 1 — Fetch All
The first query path sets a baseline for performance by simply requesting all the data from the base table.
SELECT *
FROM Production.WorkOrder;
With no where clause and every column selected, the query must read every row from the clustered index. A clustered index scan (shown in Figure 1) reads every row sequentially.
This query takes the longest of all the query paths, so it might seem to be a slow query; however, measured in rows returned per millisecond, the clustered index scan returns more rows per millisecond than any other query path.
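One way to see why the scan's logical reads are what they are is to check how many leaf pages the clustered index occupies, since a full scan touches every leaf page. The sketch below assumes AdventureWorks2012 under its default name and uses sys.dm_db_index_physical_stats in DETAILED mode, which reports one row per B-tree level. The leaf-level (index_level 0) page_count should be in the same range as the logical reads reported for Query Path 1, though the two need not match exactly.

```sql
USE AdventureWorks2012;

-- DETAILED mode reads every page and reports each level of the B-tree;
-- index_level 0 is the leaf level, which a clustered index scan reads in full.
SELECT i.name AS IndexName,
       ps.index_level,
       ps.page_count,
       ps.record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(),                              -- current database
         OBJECT_ID(N'Production.WorkOrder'),
         1,                                    -- index_id 1 = clustered index
         NULL,                                 -- all partitions
         'DETAILED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
ORDER BY ps.index_level;
```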
Query Path 2 — Clustered Index Seek
The second query path adds a where clause to the first query and filters the result to a single row using a clustered key value:
SELECT *
FROM Production.WorkOrder
WHERE WorkOrderID = 1234;
The query optimizer has two clues that only one row meets the where clause criteria: the statistics, and the fact that WorkOrderID is the primary key and so must be unique. WorkOrderID is also the clustered index key, so the query optimizer knows there's an ideal index for locating a single row. The clustered index seek operation navigates the clustered index B-tree and quickly locates the desired row, as shown in Figure 2.
Conventional wisdom holds that this is the fastest possible query path, and it is snappy when returning a single row; however, measured in rows returned per millisecond, it's one of the slowest query paths.
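Why a single-row seek is quick yet slow per row returned comes down to the depth of the B-tree: the seek reads roughly one page per level, from the root down to the leaf, and that fixed cost is spread over only one row. A minimal sketch for checking this, assuming AdventureWorks2012 under its default name; compare the reported depth with the logical reads that STATISTICS IO shows for the seek.

```sql
USE AdventureWorks2012;

-- Number of B-tree levels the seek must navigate, root to leaf
SELECT INDEXPROPERTY(OBJECT_ID(N'Production.WorkOrder'),
                     N'PK_WorkOrder_WorkOrderID',
                     'IndexDepth') AS ClusteredIndexDepth;

SET STATISTICS IO ON;

-- Expect logical reads close to the index depth for a single-row seek
SELECT *
FROM Production.WorkOrder
WHERE WorkOrderID = 1234;

SET STATISTICS IO OFF;
```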
A common myth holds that seeks can return only single rows, and therefore that seeking multiple rows must be slow compared to scanning. As the next two query paths show, that's not true.
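As a small illustration of a seek that returns many rows, a range predicate on the clustered key lets the optimizer seek to the first qualifying row and then read forward along the leaf level. This is a minimal sketch, assuming AdventureWorks2012 under its default name; the bounds 10000 and 10999 are arbitrary illustration values, and the actual plan should show a Clustered Index Seek rather than a scan.

```sql
USE AdventureWorks2012;

SET STATISTICS IO ON;

-- Range seek on the clustered key: one B-tree descent to the start of the
-- range, then a sequential read of adjacent leaf pages.
-- The bounds are arbitrary values chosen for illustration.
SELECT *
FROM Production.WorkOrder
WHERE WorkOrderID BETWEEN 10000 AND 10999;

SET STATISTICS IO OFF;
```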