An index strategy deals with the overall application rather than fixing isolated problems to the detriment of the whole.
Identifying Key Queries
Analyzing a full query workload, one that covers a couple of days of
operations plus the nightly or weekend workloads, will likely reveal
that although there may be a few hundred distinct queries, the majority
of the CPU time is spent on the top handful of queries. I've tuned
systems where 95 percent of the CPU time was spent on only five queries.
Those top queries demand flat-out performance, whereas the other queries
can afford a bookmark lookup.
To identify those top queries, follow these steps:
1. Create a profiler trace to capture all queries or stored procedures:
Profiler Event: T-SQL SQL:StmtCompleted and RPC:Completed
Profiler Columns: TextData, ApplicationName, CPU, Reads, Writes, Duration, SPID, EndTime, DatabaseName, and RowCounts.
Do NOT filter the trace to capture only
long-running queries. (A common suggestion is to set the filter to
capture only queries with a duration > 1 sec.) Every query must be
captured.
2. Test the
trace definition using Profiler for a few moments; then stop the trace.
Be sure to filter out applications or databases not being analyzed.
3. In the
trace properties, add a stop time to the trace definition (so it can
capture a full day's and night's workload), and set up the trace to
write to a file.
4. Generate a trace script using File → Export → Script Trace Definition → for SQL Server 2005-SQL11.
5. Check the
script. You may need to edit the script to supply a filename and path
and double-check the start and stop times. Execute the trace script on
the production server for 24 hours.
6. Pull the
trace file into Profiler. This can be done through the Open → Trace
File dialog in SQL Profiler. Then save it to a table using File → Save
As → Trace Table.
7. Profiler exports the TextData column as nText data type, and that just won't do. The following code creates an nVarChar(max) column that is friendlier with string functions:
ALTER TABLE trace
ALTER COLUMN textdata NVARCHAR(MAX);
8. Run the
following aggregate query to summarize the query load. This query
assumes the trace data was saved to a table creatively named trace:
-- Group statements by their leading text: everything up to the first
-- space found at or after character 6.
select substring(textdata, 1, charindex(' ', textdata, 6)) as 'QueryStart',
  count(*) as 'count',
  sum(duration) as 'SumDuration',
  avg(duration) as 'AvgDuration',
  max(duration) as 'MaxDuration',
  -- fraction (0 to 1) of the total traced duration
  cast(sum(duration) as numeric(20,2))
    / (select sum(duration) from trace) as 'Percentage',
  sum(rowcounts) as 'SumRows'
from trace
group by substring(textdata, 1, charindex(' ', textdata, 6))
order by sum(duration) desc;
The top queries are obvious.
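The query above ranks by duration, whereas this section measures the top queries by CPU time. The following is a minimal sketch of the same summary ranked by CPU instead. It assumes the trace table is still named trace and that the CPU column was captured as listed in step 1. The commented-out first statement shows an alternative way to load a trace file into a table with sys.fn_trace_gettable; the file path is hypothetical.

```sql
-- Optional alternative to step 6: load the trace file directly.
-- The path below is a placeholder, not a real location.
-- SELECT * INTO trace
-- FROM sys.fn_trace_gettable(N'C:\Traces\workload.trc', DEFAULT);

-- Summarize the workload by CPU rather than by duration.
SELECT SUBSTRING(textdata, 1, CHARINDEX(' ', textdata, 6)) AS QueryStart,
       COUNT(*) AS ExecCount,
       SUM(cpu) AS SumCPU,
       AVG(cpu) AS AvgCPU,
       -- percent of all traced CPU; NULLIF avoids divide-by-zero
       CAST(SUM(cpu) AS numeric(20,2)) * 100
         / NULLIF((SELECT SUM(cpu) FROM trace), 0) AS PctCPU
FROM trace
WHERE textdata IS NOT NULL      -- skip events that carry no statement text
GROUP BY SUBSTRING(textdata, 1, CHARINDEX(' ', textdata, 6))
ORDER BY SumCPU DESC;
```

Comparing this ranking with the duration ranking can help separate queries that are CPU-bound from those that mostly wait on I/O or locks.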
Note
The Database Engine Tuning Advisor is a
SQL Server utility that can analyze a single query or a set of queries
and recommend indexes and partitions to improve performance.
Selecting the Clustered Index
A clustered index can affect performance in several ways:
- When an index seek operation finds a row using a clustered index,
the data is right there — no bookmark lookup is necessary. This makes
the column used to select the row, probably the primary key, an ideal
candidate for a clustered index.
- Clustered indexes gather rows with the same or similar values onto
the smallest possible number of data pages, thus reducing the number of
data pages required to retrieve a set of rows. Clustered indexes are
therefore excellent for columns that are often used to select a range
of rows, such as secondary table foreign keys like OrderDetail.OrderID.
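As an illustration of the second point, here is a minimal sketch that clusters a detail table on its foreign key. Only OrderDetail.OrderID comes from the text; the other columns, the constraint name, and the index name are hypothetical.

```sql
-- Hypothetical detail table; the primary key is declared NONCLUSTERED
-- so that the clustered index can go on the range-selected column.
CREATE TABLE dbo.OrderDetail (
    OrderDetailID int IDENTITY(1,1) NOT NULL,
    OrderID       int NOT NULL,
    ProductID     int NOT NULL,
    Quantity      int NOT NULL,
    CONSTRAINT PK_OrderDetail PRIMARY KEY NONCLUSTERED (OrderDetailID)
);

-- Cluster on OrderID so all rows for one order sit on adjacent pages.
CREATE CLUSTERED INDEX CIX_OrderDetail_OrderID
    ON dbo.OrderDetail (OrderID);

-- A typical range-style request: every detail row for a single order.
-- The seek lands on the data pages themselves, so no bookmark lookup occurs.
SELECT OrderDetailID, ProductID, Quantity
FROM dbo.OrderDetail
WHERE OrderID = 1001;
```

Whether this layout beats clustering on the primary key depends on how the table is actually queried, so treat it as one option to test rather than a rule.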
Creating Base Indexes
Even before tuning, the locations of a
few key indexes are easy to determine. These indexes are the first step
in building a solid index foundation. Following are a few things to
keep in mind when building these base indexes:
- As a rule of thumb, plan to create a clustered index for every
table. There are a few exceptions to this rule, but as a general plan
cluster your tables. In many cases, it makes sense to create the
clustered index on the primary key of the table.
- Plan to create nonclustered indexes for each column belonging to a
foreign key constraint. When data is entered that must adhere to a
foreign key constraint, SQL Server must verify that the new values
conform to the constraint. Nonclustered indexes prove critical to
accomplish this lookup.
- Review the queries that you know will be executed often. This is
where your relationship with the application developers is crucial.
Everyone wins if the database supporting the application is indexed
appropriately when the application rolls out.
Although this indexing plan is far from perfect, and it's definitely not
a final indexing plan, it provides an initial compromise between no
indexes and tuned indexes and can be a baseline performance measurement
to compare against future index tuning.
Additional tuning will likely involve creating composite indexes and removing unnecessary indexes.
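One way to check the foreign key guideline is to ask the catalog views which foreign key columns are not the leading key column of any index. The following is a rough sketch, not a complete audit: it looks at each foreign key column individually and ignores multi-column key order. The index created at the end uses hypothetical table and column names.

```sql
-- List foreign key columns that no index leads with.
SELECT OBJECT_SCHEMA_NAME(fkc.parent_object_id)             AS SchemaName,
       OBJECT_NAME(fkc.parent_object_id)                    AS TableName,
       COL_NAME(fkc.parent_object_id, fkc.parent_column_id) AS ColumnName,
       OBJECT_NAME(fkc.constraint_object_id)                AS ForeignKeyName
FROM sys.foreign_key_columns AS fkc
WHERE NOT EXISTS (
        SELECT 1
        FROM sys.index_columns AS ic
        WHERE ic.object_id   = fkc.parent_object_id
          AND ic.column_id   = fkc.parent_column_id
          AND ic.key_ordinal = 1          -- leading key column only
      )
ORDER BY SchemaName, TableName, ColumnName;

-- Example base index for one such column (hypothetical names).
CREATE NONCLUSTERED INDEX IX_OrderDetail_ProductID
    ON dbo.OrderDetail (ProductID);
```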
Best Practice
When planning indexes, there's a fine
line between serving the needs of select queries versus update queries.
Although an index may improve query performance, there's a performance
cost because when a row is inserted, updated, or deleted, the indexes
must be updated as well. Nonetheless, some indexing is necessary for
write operations. The update or delete operation must locate the row
prior to performing the write operation, and useful indexes facilitate
locating that row, thereby speeding up write operations.
Therefore, when planning indexes, include the fewest indexes needed to accomplish the job.
Tip
SQL Server exposes index usage statistics via dynamic management views and functions. Specifically, sys.dm_db_index_operational_stats and sys.dm_db_index_usage_stats
uncover information about how indexes are used. In addition, four
dynamic management objects reveal indexes that the Query Optimizer looked
for but didn't find: sys.dm_db_missing_index_groups, sys.dm_db_missing_index_group_stats, sys.dm_db_missing_index_columns (a function that takes an index handle), and sys.dm_db_missing_index_details.
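The following sketch shows one way these objects might be queried in the current database. The first query compares reads against writes for each index, and the second lists the missing-index suggestions. The ordering expressions are an illustrative choice, not an official formula.

```sql
-- 1. How each index in the current database has been used.
--    Indexes with many updates but few seeks, scans, or lookups
--    are candidates for review.
SELECT OBJECT_NAME(i.object_id)    AS TableName,
       i.name                      AS IndexName,
       ISNULL(us.user_seeks, 0)    AS UserSeeks,
       ISNULL(us.user_scans, 0)    AS UserScans,
       ISNULL(us.user_lookups, 0)  AS UserLookups,
       ISNULL(us.user_updates, 0)  AS UserUpdates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id   = i.object_id
      AND us.index_id    = i.index_id
      AND us.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY ISNULL(us.user_seeks, 0) + ISNULL(us.user_scans, 0)
       + ISNULL(us.user_lookups, 0);

-- 2. Indexes the Query Optimizer looked for but did not find.
SELECT d.statement           AS TableName,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_total_user_cost,
       s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
       ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
       ON s.group_handle = g.index_group_handle
WHERE d.database_id = DB_ID()
ORDER BY s.user_seeks * s.avg_total_user_cost * s.avg_user_impact DESC;
```

These counters are reset when the SQL Server instance restarts, so the results are only meaningful after the server has run through a representative workload.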