Next, the results of the query shown in Listing 17 indicate which database files are seeing the most I/O stalls.
LISTING 17: I/O stall information by database file
-- Calculates average stalls per read, per write,
-- and per total input/output for each database file.
-- Adding 1.0 to each denominator avoids a divide-by-zero error
-- for files that have had no reads or writes yet.
SELECT DB_NAME(fs.database_id) AS [Database Name], mf.physical_name,
       io_stall_read_ms, num_of_reads,
       CAST(io_stall_read_ms/(1.0 + num_of_reads) AS NUMERIC(10,1)) AS [avg_read_stall_ms],
       io_stall_write_ms, num_of_writes,
       CAST(io_stall_write_ms/(1.0 + num_of_writes) AS NUMERIC(10,1)) AS [avg_write_stall_ms],
       io_stall_read_ms + io_stall_write_ms AS [io_stalls],
       num_of_reads + num_of_writes AS [total_io],
       CAST((io_stall_read_ms + io_stall_write_ms)/(1.0 + num_of_reads + num_of_writes)
            AS NUMERIC(10,1)) AS [avg_io_stall_ms]
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS fs
INNER JOIN sys.master_files AS mf WITH (NOLOCK)
ON fs.database_id = mf.database_id
AND fs.[file_id] = mf.[file_id]
ORDER BY avg_io_stall_ms DESC OPTION (RECOMPILE);
-- Helps determine which database files on
-- the entire instance have the most I/O bottlenecks
This query lists each database file (data and
log) on the instance, ordered by the average I/O stall time in
milliseconds. This is one way of determining which database files are
spending the most time waiting on I/O. It also gives you a better idea
of the read/write activity for each database file, which helps you
characterize your workload by database file. If you see a lot of
database files on the same drive that are at the top of the list for
this query, that could be an indication that you are seeing disk I/O
bottlenecks on that drive. You would want to investigate this issue
further, using Windows Performance Monitor counters such as Avg. Disk
sec/Write and Avg. Disk sec/Read for that logical disk. After you have
gathered more metrics and evidence, talk to your system administrator
or storage administrator about this issue. Depending on what type of
storage you are using, it might be possible to improve
the I/O performance situation by adding more spindles, changing the
RAID controller cache policy, or changing the RAID level. You also
might consider moving some of your database files to other drives if
possible.
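To check whether several slow files share a drive, the same DMV can be rolled up by drive letter. The following is a minimal sketch, not part of the original health check: it assumes every file path begins with a drive letter (for example D:\...), so files on mount points or UNC paths would be grouped incorrectly, and the [Drive] column alias is only illustrative.

```sql
-- Sketch: average I/O stalls rolled up by drive letter.
-- Assumption: physical_name starts with a drive letter such as D:
-- (mount points and UNC paths need different grouping logic).
SELECT LEFT(mf.physical_name, 2) AS [Drive],
       SUM(fs.num_of_reads) AS [total_reads],
       SUM(fs.num_of_writes) AS [total_writes],
       CAST(SUM(fs.io_stall_read_ms)/(1.0 + SUM(fs.num_of_reads))
            AS NUMERIC(10,1)) AS [avg_read_stall_ms],
       CAST(SUM(fs.io_stall_write_ms)/(1.0 + SUM(fs.num_of_writes))
            AS NUMERIC(10,1)) AS [avg_write_stall_ms],
       CAST(SUM(fs.io_stall_read_ms + fs.io_stall_write_ms)
            /(1.0 + SUM(fs.num_of_reads + fs.num_of_writes))
            AS NUMERIC(10,1)) AS [avg_io_stall_ms]
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS fs
INNER JOIN sys.master_files AS mf WITH (NOLOCK)
ON fs.database_id = mf.database_id
AND fs.[file_id] = mf.[file_id]
GROUP BY LEFT(mf.physical_name, 2)
ORDER BY [avg_io_stall_ms] DESC OPTION (RECOMPILE);
```

A drive whose average stall is much higher than the others is a reasonable place to start the Performance Monitor investigation.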
Now, using the query shown in Listing 18, you are going to see which user databases on the instance are using the most memory.
LISTING 18: Total buffer usage by database
-- Get total buffer usage by database for current instance
SELECT DB_NAME(database_id) AS [Database Name],
COUNT(*) * 8/1024.0 AS [Cached Size (MB)]
FROM sys.dm_os_buffer_descriptors WITH (NOLOCK)
WHERE database_id > 4 -- exclude system databases
AND database_id <> 32767 -- exclude ResourceDB
GROUP BY DB_NAME(database_id)
ORDER BY [Cached Size (MB)] DESC OPTION (RECOMPILE);
-- Tells you how much memory (in the buffer pool)
-- is being used by each database on the instance
This query lists the total buffer usage for
each user database running on the current instance. If you
are seeing signs of internal memory pressure, you will want to
know which databases are using the most space in the
buffer pool. One way to reduce memory usage in a particular database is
to ensure that you don’t have a lot of missing indexes on large tables
that are causing a large number of index or table scans. Another way,
if you have SQL Server Enterprise Edition, is to start using SQL Server
data compression on some of your larger indexes (if they are good
candidates for data compression). The ideal candidate for data
compression is a large static table that is highly compressible because
of the data types and actual data in the table. A bad candidate for
data compression is a small, highly volatile table that does not
compress well. A compressed index will stay compressed in the buffer
pool, unless any data is updated. This means that you may be able to
save many gigabytes of space in your buffer pool under ideal
circumstances.
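Before compressing an index, you can ask SQL Server to estimate how much space compression would save. The sketch below uses the sp_estimate_data_compression_savings system stored procedure; the table name dbo.SalesHistory is hypothetical, and PAGE compression is chosen only as an example.

```sql
-- Sketch: estimate PAGE compression savings for one table.
-- dbo.SalesHistory is a hypothetical table name; substitute your own.
EXEC sys.sp_estimate_data_compression_savings
     @schema_name = N'dbo',
     @object_name = N'SalesHistory',
     @index_id = NULL,            -- NULL = all indexes on the table
     @partition_number = NULL,    -- NULL = all partitions
     @data_compression = N'PAGE'; -- or N'ROW' / N'NONE'
```

The procedure samples the table into tempdb, so the result is an estimate, and running it against very large tables adds some load of its own.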
Next, you will take a look at which user
databases on the instance are using the most processor time by using
the query shown in Listing 19.
LISTING 19: CPU usage by database
-- Get CPU utilization by database
WITH DB_CPU_Stats
AS
(SELECT DatabaseID, DB_Name(DatabaseID) AS [DatabaseName],
SUM(total_worker_time) / 1000 AS [CPU_Time_Ms] -- total_worker_time is reported in microseconds
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY (SELECT CONVERT(int, value) AS [DatabaseID]
FROM sys.dm_exec_plan_attributes(qs.plan_handle)
WHERE attribute = N'dbid') AS F_DB
GROUP BY DatabaseID)
SELECT ROW_NUMBER() OVER(ORDER BY [CPU_Time_Ms] DESC) AS [row_num],
DatabaseName, [CPU_Time_Ms],
CAST([CPU_Time_Ms] * 1.0 / SUM([CPU_Time_Ms])
OVER() * 100.0 AS DECIMAL(5, 2)) AS [CPUPercent]
FROM DB_CPU_Stats
WHERE DatabaseID > 4 -- exclude system databases
AND DatabaseID <> 32767 -- exclude ResourceDB
ORDER BY row_num OPTION (RECOMPILE);
-- Helps determine which database is
-- using the most CPU resources on the instance
This query shows you the cumulative CPU time by
database for the entire instance. It can help you characterize your
workload, but treat the results with caution. The numbers come from
sys.dm_exec_query_stats, which only covers query plans that are
currently in the plan cache. If you have recently cleared the plan
cache for a particular database, using the DBCC FLUSHPROCINDB (database_id)
command, that database will be underrepresented in the results.
Still, this query can be useful for getting a
rough idea of which database(s) are using the most CPU on your instance.
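One way to judge how much weight to give the CPU numbers is to look at how old the cached plans are for each database. The following is a rough sketch that reuses the same CROSS APPLY pattern as Listing 19; the column aliases are illustrative.

```sql
-- Sketch: age of the cached plans behind the per-database CPU numbers.
-- A database whose oldest cached plan is very recent has probably had
-- its plan cache flushed, so its CPU total understates its real usage.
SELECT DB_NAME(F_DB.DatabaseID) AS [DatabaseName],
       COUNT(*) AS [cached_statements],
       MIN(qs.creation_time) AS [oldest_plan_compiled],
       MAX(qs.last_execution_time) AS [last_execution]
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY (SELECT CONVERT(int, value) AS [DatabaseID]
             FROM sys.dm_exec_plan_attributes(qs.plan_handle)
             WHERE attribute = N'dbid') AS F_DB
GROUP BY F_DB.DatabaseID
ORDER BY [oldest_plan_compiled] DESC OPTION (RECOMPILE);
```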
The next query, shown in Listing 20,
is extremely useful. It rolls up the top cumulative wait statistics
since SQL Server was last restarted or since the wait statistics were
cleared by using the DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR) command.
LISTING 20: Top cumulative wait types for the instance
-- Isolate top waits for server instance since last restart or statistics clear
WITH Waits AS
(SELECT wait_type, wait_time_ms / 1000. AS wait_time_s,
100. * wait_time_ms / SUM(wait_time_ms) OVER() AS pct,
ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS rn
FROM sys.dm_os_wait_stats WITH (NOLOCK)
WHERE wait_type NOT IN (N'CLR_SEMAPHORE',N'LAZYWRITER_SLEEP',N'RESOURCE_QUEUE',
N'SLEEP_TASK',N'SLEEP_SYSTEMTASK',N'SQLTRACE_BUFFER_FLUSH',N'WAITFOR',
N'LOGMGR_QUEUE',N'CHECKPOINT_QUEUE', N'REQUEST_FOR_DEADLOCK_SEARCH',
N'XE_TIMER_EVENT',N'BROKER_TO_FLUSH',N'BROKER_TASK_STOP',N'CLR_MANUAL_EVENT',
N'CLR_AUTO_EVENT',N'DISPATCHER_QUEUE_SEMAPHORE', N'FT_IFTS_SCHEDULER_IDLE_WAIT',
N'XE_DISPATCHER_WAIT', N'XE_DISPATCHER_JOIN', N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
N'ONDEMAND_TASK_QUEUE', N'BROKER_EVENTHANDLER', N'SLEEP_BPOOL_FLUSH',
N'DIRTY_PAGE_POLL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
N'SP_SERVER_DIAGNOSTICS_SLEEP'))
SELECT W1.wait_type,
CAST(W1.wait_time_s AS DECIMAL(12, 2)) AS wait_time_s,
CAST(W1.pct AS DECIMAL(12, 2)) AS pct,
CAST(SUM(W2.pct) AS DECIMAL(12, 2)) AS running_pct
FROM Waits AS W1
INNER JOIN Waits AS W2
ON W2.rn <= W1.rn
GROUP BY W1.rn, W1.wait_type, W1.wait_time_s, W1.pct
HAVING SUM(W2.pct) - W1.pct < 99 OPTION (RECOMPILE); -- percentage threshold
-- Clear Wait Stats
-- DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);
This query will help you zero in on what your SQL
Server instance is spending the most time waiting for. Especially if
your SQL Server instance is under stress or having performance
problems, this can be very valuable information. Knowing that your top
cumulative wait types are all I/O related can point you in the right
direction for doing further evidence gathering and investigation of
your I/O subsystem. However, be aware of several important caveats when
using and interpreting the results of this query.
First, this is only a rollup of wait types since
the last time your SQL Server instance was restarted, or the last time
your wait statistics were cleared. If your SQL Server instance has been
running for several months and something important was recently
changed, the cumulative wait stats will not show the current actual top
wait types, but will instead be weighted toward the overall top wait
types over the entire time the instance has been running. This will
give you a false picture of the current situation.
Second, there are literally hundreds of different
wait types (with more being added in each new version of SQL Server). There is a lot of bad information on the Internet about what
many wait types mean, and how you should consider addressing them. Bob
Ward, who works for Microsoft Support, is a very reliable source for
SQL Server wait type information. He has a SQL Server Wait Type
Repository available online at http://blogs.msdn.com/b/psssql/archive/2009/11/03/the-sql-server-wait-type-repository.aspx that documents many SQL Server wait types, including what action you might want to take to alleviate that wait type.
Finally, many common wait types are benign, meaning you can safely ignore them in most situations. The most common benign wait types are filtered out in the NOT IN
clause of the health check query to make the results more relevant.
Even so, I constantly get questions from DBAs who are obsessing over a
particular wait type that shows up in this query. My answer is
basically that if your database instance is running well, with no other
signs of stress, you probably don’t need to worry too much about your
top wait type, particularly if it is an uncommon wait type. SQL Server
is always waiting on something; but if the server is running well, with
no other warning signs, you should relax a little.
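Because the wait statistics are cumulative, one way around the first caveat, without clearing the statistics, is to compare two snapshots taken a few minutes apart. The following is a minimal sketch: the temporary table name #WaitsBefore and the five-minute interval are arbitrary choices for illustration.

```sql
-- Sketch: waits accumulated during a sample interval.
-- #WaitsBefore and the five-minute delay are illustrative assumptions.
IF OBJECT_ID(N'tempdb..#WaitsBefore') IS NOT NULL
    DROP TABLE #WaitsBefore;

-- First snapshot of the cumulative counters
SELECT wait_type, wait_time_ms, signal_wait_time_ms, waiting_tasks_count
INTO #WaitsBefore
FROM sys.dm_os_wait_stats;

WAITFOR DELAY '00:05:00'; -- sample interval

-- Second snapshot minus the first gives the waits for the interval
SELECT w2.wait_type,
       w2.wait_time_ms - w1.wait_time_ms AS [wait_time_ms_delta],
       w2.signal_wait_time_ms - w1.signal_wait_time_ms AS [signal_wait_ms_delta],
       w2.waiting_tasks_count - w1.waiting_tasks_count AS [waiting_tasks_delta]
FROM sys.dm_os_wait_stats AS w2
INNER JOIN #WaitsBefore AS w1
ON w1.wait_type = w2.wait_type
WHERE w2.wait_time_ms - w1.wait_time_ms > 0
ORDER BY [wait_time_ms_delta] DESC;

DROP TABLE #WaitsBefore;
```

This sketch does not filter out benign wait types, and the WAITFOR delay itself shows up as a WAITFOR wait, so in practice you would apply the same NOT IN list used in Listing 20.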
Next, using the query shown in Listing 21, you are going to look at the cumulative signal (CPU) waits for the instance.
LISTING 21: Signal waits for the instance
-- Signal Waits for instance
SELECT CAST(100.0 * SUM(signal_wait_time_ms) / SUM (wait_time_ms) AS NUMERIC(20,2))
AS [%signal (cpu) waits],
CAST(100.0 * SUM(wait_time_ms - signal_wait_time_ms) / SUM (wait_time_ms) AS
NUMERIC(20,2)) AS [%resource waits]
FROM sys.dm_os_wait_stats WITH (NOLOCK) OPTION (RECOMPILE);
-- Signal waits above 15-20% are usually a sign of CPU pressure
Signal waits are CPU-related waits: signal wait time is the time a
task spends waiting to get back onto a CPU after the resource it was
waiting for has become available. If
you are seeing other signs of CPU pressure on your SQL Server instance,
this query can help confirm or rule out
sustained cumulative CPU pressure. Usually, seeing signal waits above
15–20% is a sign of CPU pressure.
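Since the signal wait percentage is cumulative, it can be useful to compare it with a point-in-time view of the same runnable queue. The sketch below is an assumption about how you might corroborate the result, not part of the listing above: it averages the current and runnable task counts across the visible online schedulers in sys.dm_os_schedulers.

```sql
-- Sketch: point-in-time view of tasks waiting for CPU.
-- A runnable task count that stays well above zero across repeated
-- runs suggests tasks are queuing for processor time.
SELECT AVG(1.0 * current_tasks_count) AS [avg_task_count],
       AVG(1.0 * runnable_tasks_count) AS [avg_runnable_task_count]
FROM sys.dm_os_schedulers WITH (NOLOCK)
WHERE status = N'VISIBLE ONLINE' OPTION (RECOMPILE);
```

Run it several times during a busy period; a single reading says little on its own.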