5. Influencing Optimization
There are two main ways you can influence the Query Optimizer — by using query hints or plan guides.
Query Hints
Query hints are an easy way to
influence the actions of query optimization. However, you need to very
carefully consider their use, as in most cases SQL Server is already
choosing the right plan. As a general rule, you should avoid using
query hints, as they provide many opportunities to cause more issues
than the one you are attempting to solve. In some cases, however, such
as with complex queries or when dealing with complex datasets that
defeat SQL Server’s cardinality estimates on specific queries, using
query hints may be necessary.
Before using any query hints, run a web search
for the latest information on issues with query hints. Try searching on
the keywords “SQL Server Query Hints” and look specifically for
anything by Craig Freedman, who has written several great blog entries
on some of the issues you can encounter when using query hints.
Problems with using hints can happen at any time: from when you start using the hint, which can cause unexpected side effects that cause the query to fail to compile, to more complex and difficult-to-find performance issues that occur later.
As data in the relevant tables changes, without
query hints the Query Optimizer automatically updates statistics and
adjusts query plans as needed; but if you have locked the Query
Optimizer into a specific set of optimizations using query hints, then
the plan cannot be changed, and you may end up with a considerably
worse plan, requiring further action (from you) to identify and resolve
the root cause of the new performance issue.
One final word of caution about using query hints: unlike locking hints (also referred to in BOL as table hints), which SQL Server attempts to satisfy, query hints are stronger. If SQL Server is unable to satisfy a query hint, it raises error 8622 and does not create any plan.
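As a rough sketch of how error 8622 can arise, the following query asks for hash joins on a join that has only an inequality predicate. The query itself is invented for illustration and is not from the original examples. Because a hash join needs at least one equality predicate, the statement is expected to fail with error 8622 rather than fall back to another join type.

```sql
use AdventureWorks2012
go
-- illustrative query only: the join condition is an inequality,
-- so a hash join cannot be built and the hint cannot be satisfied
select p1.businessentityid, p2.businessentityid
from person.person as p1 inner join person.person as p2
on p1.businessentityid < p2.businessentityid
where p1.businessentityid <= 10
option (hash join)
go
```

Removing the option clause should let the same statement compile, because the Query Optimizer is then free to choose a join type that supports the predicate.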
Query hints are specified using the OPTION
clause, which is always added at the end of the T-SQL statement —
unlike locking or join hints, which are added within the T-SQL
statement after the tables they are to affect.
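To make the placement difference concrete, here is a minimal sketch that puts all three kinds of hint in one statement. The particular hints (nolock, loop, and maxdop 1) are chosen arbitrarily to show position only and are not recommendations.

```sql
use AdventureWorks2012
go
select p.lastname, b.addressid
-- table (locking) hint: follows the table it affects
from person.person as p with (nolock)
-- join hint: sits between the join type and the join keyword
inner loop join person.businessentityaddress as b
on p.businessentityid = b.businessentityid
-- query hint: always in the option clause at the end of the statement
option (maxdop 1)
go
```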
The following sections describe a few of the more interesting query hints.
FAST number_rows
Use this query hint when you want to retrieve only the first n
rows out of a relatively large result set. A typical example of this is
a website that uses paging to display large sets of rows, whereby the
first page shows only the first web page worth of rows, and a page
might contain only 20, 30, or maybe 40 rows. If the query returns
thousands of rows, SQL Server might optimize it using hash joins. Hash joins work well with large datasets but have a higher setup cost than nested loop joins. Nested loop joins have a very low setup cost and can return the first set of rows more quickly, but take considerably longer to return all the rows. Using the FAST <number_rows> query hint causes the Query Optimizer to favor nested loop joins and other techniques, rather than hash joins, to get the first n rows faster.
Typically, if the remaining rows are retrieved after the first n rows have been returned, the query performs slower overall than it would if this hint were not used.
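A minimal sketch of the paging scenario follows, assuming a page size of 20 rows and reusing the three-table AdventureWorks2012 join shown elsewhere in this section. The column list and the order by clause are illustrative choices.

```sql
use AdventureWorks2012
go
-- ask the optimizer to favor getting the first 20 rows back quickly;
-- the complete result set is still returned if the client keeps reading
select p.lastname, p.firstname, a.city
from person.person as p inner join person.businessentityaddress as b
on p.businessentityid = b.businessentityid
inner join person.address as a on b.addressid = a.addressid
order by p.lastname, p.firstname
option (fast 20)
go
```

Comparing the statistics profile output with and without the hint is one way to check whether the join strategy actually changed on your system.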
{ LOOP | MERGE | HASH } JOIN
The JOIN query hint applies to all joins within the query. It is similar to the join hint that can be specified for an individual join between a pair of tables within a larger, more complex query, but the join hint applies only to the pair of tables in the join with which it is associated.
To see how this works, here is an example query
using the AdventureWorks2012 database that joins three tables. The
first example shows the basic query with no join hints.
use AdventureWorks2012
go
set statistics profile on
go
select p.title, p.firstname, p.middlename, p.lastname
, a.addressline1, a.addressline2, a.city, a.postalcode
from person.person as p inner join person.businessentityaddress as b
on p.businessentityid= b.businessentityid
inner join person.address as a on b.addressid= a.addressid
go
set statistics profile off
go
This returns two result sets. The first is the output from the query, and returns 18,798 rows; the second is the additional output produced by the set statistics profile option. One interesting piece of information in the statistics profile output is the TotalSubtreeCost column. To see the cost for the entire query, look at the top row. On my test machine, this query is costed at 4.649578. The following shows just the PhysicalOp column from the statistics profile output, which displays the operator used for each step of the plan:
PhysicalOp
NULL
Merge Join
Clustered Index Scan
Sort
Merge Join
Clustered Index Scan
Index Scan
The next example shows the same query but illustrates the use of a join hint. In this example the join hint applies only to the join between person.person and person.businessentityaddress:
use AdventureWorks2012
go
set statistics profile on
go
select p.title, p.firstname, p.middlename, p.lastname
, a.addressline1, a.addressline2, a.city, a.postalcode
from person.person as p inner loop join person.businessentityaddress as b
on p.businessentityid= b.businessentityid
inner join person.address as a on b.addressid= a.addressid
go
set statistics profile off
go
The TotalSubtreeCost for this option is 8.155532, which is quite a bit higher than the plan that SQL Server chose, and indicates that our meddling with the optimization process has had a negative impact on performance.
The PhysicalOp column of the statistics profile output is shown next. This indicates that the entire order of the query has been dramatically changed; the merge join between person.person and person.businessentityaddress has been replaced with a loop join as requested, but this forced the Query Optimizer to use a hash match join for the other join. You can also see that the Optimizer chose to use a parallel plan, and even this has not reduced the cost:
PhysicalOp
NULL
Parallelism
Hash Match
Parallelism
Nested Loops
Clustered Index Scan
Clustered Index Seek
Parallelism
Index Scan
The final example shows the use of a
JOIN query hint. Using this forces both joins within the query to use
the join type specified:
use AdventureWorks2012
go
set statistics profile on
go
select p.title, p.firstname, p.middlename, p.lastname
, a.addressline1, a.addressline2, a.city, a.postalcode
from person.person as p inner join person.businessentityaddress as b
on p.businessentityid = b.businessentityid
inner join person.address as a on b.addressid = a.addressid
option (hash join )
go
set statistics profile off
go
The TotalSubtreeCost for this plan is 5.097726. This is better than the previous option but still worse than the plan chosen by SQL Server.
The PhysicalOp column of the following statistics profile output indicates that both joins are now hash joins:
PhysicalOp
NULL
Parallelism
Hash Match
Parallelism
Hash Match
Parallelism
Index Scan
Parallelism
Clustered Index Scan
Parallelism
Index Scan
Using a query hint can cause both
compile-time and runtime issues. The compile-time issues are likely to
happen when SQL Server is unable to create a plan due to the query
hint. Runtime issues are likely to occur when the data has changed
enough that the Query Optimizer needs to create a new plan using a
different join strategy but it is locked into using the joins defined
in the query hint.
MAXDOP n
The MAXDOP query hint is only applicable on systems and SQL Server editions for which parallel plans are possible. On single-core systems, multiprocessor systems where CPU affinity has been set to a single processor core, or systems that don’t support parallel plans (for example, if you are running the Express edition of SQL Server, which can only utilize a single processor core), this query hint has no effect.
On systems where parallel plans are possible, and for a query where a parallel plan is being generated, using MAXDOP n limits the plan for that query to a degree of parallelism of at most n.
On very large SMPs or NUMA systems, where the SQL Server configuration setting for Max Degree of Parallelism is set to a number less than the total available CPUs, this option can be useful if you want to override the systemwide Max Degree of Parallelism setting for a specific query.
A good example of this might be a 16-core SMP server with an application database that needs to service a large number of concurrent users, all running potentially parallel plans. To minimize the impact of any one query, the SQL Server configuration setting Max Degree of Parallelism is set to 4, but some activities have a higher “priority” and you want to allow them to use all CPUs. An example of this might be an operational activity such as an index rebuild, when you don’t want to use an online operation and you want the index to be created as quickly as possible. In this case, the specific statements for index creation or rebuilding can specify MAXDOP 16 (for index operations this is given as the MAXDOP index option in the WITH clause rather than in an OPTION clause), which allows SQL Server to create a plan that uses all 16 cores.
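The scenario above might look roughly like the following sketch. The value 16 comes from the example server; the aggregate query and the choice of person.address for the rebuild are illustrative assumptions. Whether a parallel plan is actually used still depends on the edition, the hardware, and the cost of the statement.

```sql
use AdventureWorks2012
go
-- check the server-wide setting that the hint overrides
select name, value_in_use
from sys.configurations
where name = 'max degree of parallelism'
go
-- query-level override: MAXDOP as a query hint in the option clause
select productid, sum(orderqty) as totalqty
from sales.salesorderdetail
group by productid
option (maxdop 16)
go
-- index rebuild: MAXDOP is supplied as an index option in the with clause
alter index all on person.address rebuild with (maxdop = 16)
go
```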
OPTIMIZE FOR
Because of the extensive use of plan parameterization, and the way that the Query Optimizer sniffs the parameter values supplied when a parameterized plan is compiled and then reuses that plan for later executions, SQL Server doesn’t always do the best job of choosing the right plan for a specific set of parameters.
The OPTIMIZE FOR
hint enables you to tell the Query Optimizer what values you expect to
see most commonly at runtime. Provided that the values you specify are
the most common case, this can result in better performance for the
majority of the queries, or at least those that match the case for
which you optimized.
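As a minimal sketch, assume that most lookups against person.address are for the city London, while the value supplied at runtime varies. Both city values and the variable name @city are illustrative assumptions.

```sql
use AdventureWorks2012
go
declare @city nvarchar(30) = N'Seattle'
-- the plan is built as if @city were N'London',
-- whatever value is actually supplied at runtime
select a.addressid, a.addressline1, a.city, a.postalcode
from person.address as a
where a.city = @city
option (optimize for (@city = N'London'))
go
```

The hint affects only how the plan is chosen; the query still returns the rows that match the runtime value.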
RECOMPILE
The RECOMPILE
query hint is a more granular way to force recompilation in a stored
procedure to be at the statement level rather than using the WITH RECOMPILE option, which forces the whole stored procedure to be recompiled.
When the Query Optimizer sees the RECOMPILE
query hint, it forces a new query plan to be created regardless of what
plans may already be cached. The new plan is created with the
parameters within the current execution context.
This is a very useful option if you know that a particular part of a stored procedure has very different input parameters that can affect the resulting query plan dramatically. Using this option incurs the cost of a compilation on every execution, but if that’s a small percentage of the resulting query’s execution time, it’s a worthwhile cost to pay to ensure that every execution of the query gets the best plan for its parameters.
For cases in which the additional compilation
cost is high relative to the cost of the worst execution, using this
query hint would be detrimental to performance.
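A statement-level recompile inside a stored procedure could be sketched as follows. The procedure name dbo.usp_addresses_by_city is hypothetical, and the query is an illustrative stand-in for a statement whose best plan depends heavily on the parameter value.

```sql
use AdventureWorks2012
go
-- hypothetical procedure, created only to illustrate the hint
create procedure dbo.usp_addresses_by_city
    @city nvarchar(30)
as
begin
    -- only this statement is recompiled on each execution,
    -- using the current value of @city
    select a.addressid, a.addressline1, a.city, a.postalcode
    from person.address as a
    where a.city = @city
    option (recompile)
end
go
exec dbo.usp_addresses_by_city @city = N'London'
go
-- clean up the illustrative procedure
drop procedure dbo.usp_addresses_by_city
go
```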
USE PLAN N'xml_plan'
The USE PLAN
query hint tells the Query Optimizer that you want a new plan, and that
the new plan should match the shape of the plan in the supplied XML
plan.
This is very similar to the use of plan guides
(covered in the next section), but whereas plan guides don’t require a
change to the query, the USE PLAN query hint does require a change to the T-SQL being submitted to the server.
Sometimes this query hint is used to solve deadlock issues or other data-related problems. In nearly all cases the correct course of action is to address the underlying issue, but that often involves architectural changes, or code changes that require extensive development and test work to get into production. In these cases the USE PLAN query hint can provide a quick workaround for the DBA to keep the system running while the root cause of a problem is found and fixed.
Note that the preceding course of action assumes you have a “good” XML plan from the problem query that doesn’t show the problem behavior. If you happened to capture XML plans from all the queries running on your system when it was working well, then you are good to go. That’s not typically something that anyone does, however, as you usually leave systems alone when they are working OK, and capturing XML plans for every query running today just in case you may want to use the USE PLAN query hint at some point in the future is not a very useful practice.
What you may be able to do, however, is configure
a test system with data such that the plan your target query generates
is of the desired shape, capture the XML for the plan, and use that XML
plan to “fix” the plan’s shape on your production server.
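One way to capture the XML for a plan on such a test system is sketched below, using the three-table join from earlier in this section as a stand-in for the target query. The set showplan_xml option returns the estimated plan as XML instead of executing the statement.

```sql
use AdventureWorks2012
go
-- return the estimated plan as XML rather than running the query
set showplan_xml on
go
select p.title, p.firstname, p.middlename, p.lastname
, a.addressline1, a.addressline2, a.city, a.postalcode
from person.person as p inner join person.businessentityaddress as b
on p.businessentityid = b.businessentityid
inner join person.address as a on b.addressid = a.addressid
go
set showplan_xml off
go
-- the captured XML is then pasted into the hint on the production query,
-- in a unicode string with any single quotes doubled, roughly:
--   <same select statement> option (use plan N'<ShowPlanXML ...>...</ShowPlanXML>')
```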
Plan Guides
Plan guides, which were added in SQL
Server 2005, enable the DBA to affect the optimization of a query
without altering the query itself. Typically, plan guides are used by
DBAs seeking to tune query execution on third-party application
databases, where the T-SQL code being executed is proprietary and
cannot be changed. Typical examples of applications for which plan
guides are most likely to be needed would be large ERP applications
such as SAP, PeopleSoft, and so on.
Although plan guides were first added in SQL
Server 2005, significant enhancements, primarily regarding ease of use,
were made to them in SQL Server 2008.
There are three different types of plan guide:
- Object plan guide — Can be applied to a stored procedure, trigger, or user-defined function
- SQL plan guide — Applied to a specific SQL statement
- Template plan guide — Provides a way to override database settings for parameterization of specific SQL queries
To make use of plan guides, the
first step is to create or capture a “good” plan; the second step is to
apply that plan to the object or T-SQL statement for which you want to
change the Query Optimizer’s behavior.
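As a hedged sketch of the second step, the following creates a SQL plan guide with sp_create_plan_guide. For brevity it attaches a simple query hint rather than a captured XML plan; a USE PLAN hint containing the XML could be supplied in @hints in the same way. The guide name pg_person_address_hash and the statement are hypothetical, and the text in @stmt has to match the statement exactly as the application submits it.

```sql
use AdventureWorks2012
go
-- hypothetical plan guide: applies a hash join hint to one specific statement
exec sp_create_plan_guide
    @name = N'pg_person_address_hash',
    @stmt = N'select p.lastname, b.addressid from person.person as p inner join person.businessentityaddress as b on p.businessentityid = b.businessentityid',
    @type = N'SQL',
    @module_or_batch = null,
    @params = null,
    @hints = N'OPTION (HASH JOIN)'
go
-- list the plan guides defined in the current database
select name, scope_type_desc, is_disabled, hints
from sys.plan_guides
go
-- remove the illustrative plan guide
exec sp_control_plan_guide N'DROP', N'pg_person_address_hash'
go
```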