3. Understanding Statistics
Statistics provide critical information
needed by SQL Server when performing query optimization. SQL Server
statistics contain details about the data, and what the data looks like
in each table within the database.
The query optimization process uses statistics to
determine how many rows a query might need to access for a given query
plan. It uses this information to develop its cost estimate for each
step in the plan. If statistics are missing or invalid, the Query
Optimizer can arrive at an incorrect cost for a step, and thus choose
what ends up being a bad plan.
You can examine the statistics for any table in
the database by using SQL Server Management Studio, expanding the
Object Explorer to show the table you are interested in. For example, Figure 1 shows the person.Address
table in the AdventureWorks2012 database. Expand the table node, under
which you will see a Statistics node. Expand this, and you will see a
statistic listed for each index that has been created, and in many
cases you will see additional statistics listed, often with cryptic
names starting with _WA. These are statistics that SQL Server has
created automatically for you, based upon queries that have been run
against the database. SQL Server creates these statistics when the AUTO_CREATE_STATISTICS option is set to ON.
To see the actual statistic values, you can
select an individual statistic, right-click it, and select the
Properties option from the menu options. This will show you the
Properties dialog for the statistic you selected. The first page,
General, displays the columns in the statistic and when it was last
updated. The Details page contains the real guts of the statistic, and
shows the data distribution. For the PK_Address-AddressID statistic on the person.Address table in AdventureWorks2012, you should see something similar to Figure 2.
This figure shows just part of the multi-column
output, which is the same output that you get when running the
following DBCC command:
DBCC SHOW_STATISTICS ("Person.Address", PK_Address_AddressID);
The following SQL Server configuration options control how statistics are created.
Auto_create_statistics
When this is on (default), SQL Server
automatically creates statistics when it thinks they would result in a
better plan. That usually means when it is optimizing a query that
references a column without statistics.
Auto_update_statistics
When this is on (default), SQL Server
automatically updates statistics when a sufficient amount of the data
in the relevant columns has changed. By default, this is done
synchronously, which means that a query has to wait for the statistics
to be updated before the optimization process can be completed.
Auto_update_statistics_asynchronously
When this option is on, SQL
Server updates statistics asynchronously. This means that when it’s
trying to optimize a query and the statistics are outdated, it will
continue optimizing the current query using the old stats, and queue
the stats to be updated asynchronously. As a result, the current query
doesn’t benefit from the new stats, but it does not have to wait while
stats are being updated before getting a plan and running. Any future
queries can then benefit from the new stats.