Query Plan Operators
Join Operators
Join operators enable SQL Server to find matching rows between two tables. Prior to SQL Server 7.0, there was only a single join type, the nested loop join, but since then additional join types have been added, and SQL Server now provides the three join types described in Table 1. These join types handle rows from two inputs; for a self-join, the inputs may be different sets of rows from the same table.
TABLE 1: SQL Server Join Types
JOIN TYPE | BENEFIT
Nested loop | Good for small tables where there is an index on the inner table on the join key
Merge join | Good for medium-size tables where there are ordered indexes, or where the output needs to be ordered
Hash join | Good for medium to large tables. Works well with parallel plans, and scales well.
Nested Loop
The nested loop join is the original SQL Server join type. A nested loop reads the rows in one table (the outer table) and, for each row in that table, searches the other table (the inner table) for matches, either by scanning it or, when a suitable index exists, by seeking on the join key. If the rows in the outer and inner tables match, the row is included in the results.
The cost of this join is roughly proportional to the number of rows in the outer table multiplied by the cost of searching the inner table once. It performs well when there are relatively few rows in one of the tables, which would be chosen as the outer table, and the other table, used as the inner table, is indexed on the join key. If both tables have a relatively large number of rows, this join starts to take a very long time.
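The naive form of this algorithm can be sketched in a few lines of Python. This is an illustration of the looping pattern only, not of how SQL Server implements the operator; the function name, the sample rows, and the column names are all hypothetical.

```python
def nested_loop_join(outer, inner, outer_key, inner_key):
    """Naive nested loop join: for every outer row, check every inner row."""
    results = []
    for o in outer:                  # one pass over the outer input
        for i in inner:              # one full pass over the inner input per outer row
            if o[outer_key] == i[inner_key]:
                results.append({**o, **i})
    return results


# Hypothetical sample data, for illustration only.
customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
orders = [
    {"order_id": 10, "cust": 1},
    {"order_id": 11, "cust": 2},
    {"order_id": 12, "cust": 1},
]

joined = nested_loop_join(customers, orders, "cust_id", "cust")
```

The inner loop runs once per outer row, which is why the amount of work grows with the product of the two input sizes when no index is available on the inner input.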
Merge
The merge join needs its inputs to be sorted on the join column, so ideally the tables should be indexed on that column; otherwise the rows have to be sorted first. The operator then iterates through rows from both tables at the same time, working down the rows, looking for matches. Because the inputs are ordered, the join can proceed quickly and can end as soon as either input is exhausted.
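As a rough sketch of the idea, the following Python function merges two lists that are assumed to be already sorted on the join key. It is a simplified illustration with hypothetical names and data, not SQL Server's implementation; it handles duplicate keys by pairing each left row with the run of right rows that share the key.

```python
def merge_join(left, right, key):
    """Join two inputs that are already sorted on `key`."""
    results = []
    i = j = 0
    while i < len(left) and j < len(right):   # stop when either input runs out
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1                            # left is behind: advance left
        elif lk > rk:
            j += 1                            # right is behind: advance right
        else:
            # Find the run of right rows that share this key value.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    results.append({**left[i], **r})
                i += 1
            j = j_end
    return results


# Hypothetical inputs, both sorted on "k".
left_rows = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}, {"k": 2, "a": "z"}, {"k": 4, "a": "w"}]
right_rows = [{"k": 2, "b": "p"}, {"k": 3, "b": "q"}, {"k": 4, "b": "r"}, {"k": 4, "b": "s"}]

merged = merge_join(left_rows, right_rows, "k")
```

Each input is read once from top to bottom, so the work grows with the sum of the input sizes rather than their product.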
Hash
The hash join operates in two phases. During the first phase, known as the build phase,
the smaller of the two tables is scanned and the rows are placed into a
hash table that is ideally stored in memory; but for very large tables,
it can be written to disk. When every row in the build input table is
hashed, the second phase starts. During the second phase, known as the probe phase,
rows from the larger of the two tables are compared to the contents of
the hash table, using the same hashing algorithm that was used to
create the build table hash. Any matching rows are passed to the output.
The hash join has variations on this processing
that can deal with very large tables, so the hash join is the join of
choice for very large input tables, especially when running on
multiprocessor systems where parallel plans are allowed.
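The two phases can be sketched in Python as follows. This is a minimal in-memory illustration that uses a Python dictionary as the hash table; the function name, sample rows, and column names are hypothetical, and nothing here spills to disk.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """In-memory hash join: build on the smaller input, probe with the larger."""
    # Build phase: hash every row of the build input on its join key.
    table = defaultdict(list)
    for b in build_rows:
        table[b[build_key]].append(b)

    # Probe phase: hash each probe row the same way and emit any matches.
    results = []
    for p in probe_rows:
        for b in table.get(p[probe_key], []):
            results.append({**b, **p})
    return results


# Hypothetical sample data: a small build input and a larger probe input.
departments = [{"dept_id": 1, "dept": "Sales"}, {"dept_id": 2, "dept": "Ops"}]
employees = [
    {"emp": "a", "dept_ref": 1},
    {"emp": "b", "dept_ref": 3},
    {"emp": "c", "dept_ref": 1},
    {"emp": "d", "dept_ref": 2},
]

hashed = hash_join(departments, employees, "dept_id", "dept_ref")
```

Building on the smaller input keeps the hash table small, which is what makes it more likely to fit in memory.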
HASH WARNINGS
Hash warnings are SQL Server Profiler events that are generated when hash recursion or hash bailout occurs. Hash recursion happens when the build input for the hash operation doesn't fit entirely in memory, so the input is split into partitions that are processed separately. Hash bailout occurs when hash recursion reaches its maximum level of recursion, and the operation has to switch to an alternate plan to process the remaining data.
Any time you see hash warnings, treat them as a potential indicator of performance problems and investigate.
Possible solutions to hash warnings include the following:
- Increase memory on the server.
- Make sure statistics exist on the join columns.
- Make sure statistics are current.
- Force a different type of join.
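To make hash recursion and hash bailout more concrete, here is a toy Python sketch. It is not SQL Server's algorithm: the memory budget is counted in rows, the partition function only works for non-negative integer keys, and the `max_level` and `fanout` values are arbitrary assumptions chosen for the example. Where the real engine would switch to an alternate plan, this sketch only raises a hypothetical `HashBailout` exception.

```python
class HashBailout(Exception):
    """Signals that partitioning hit the maximum recursion level."""


def toy_partition(key, level, fanout):
    # Toy level-specific hash for non-negative integer keys (an assumption
    # of this sketch): each level looks at a different base-`fanout` digit.
    return (key // fanout ** level) % fanout


def recursive_hash_join(build, probe, key, memory_rows, level=0,
                        max_level=3, fanout=4):
    if len(build) <= memory_rows:
        # The build input fits in the budget: ordinary build and probe.
        table = {}
        for b in build:
            table.setdefault(b[key], []).append(b)
        return [(b, p) for p in probe for b in table.get(p[key], [])]
    if level >= max_level:
        raise HashBailout(f"still {len(build)} build rows at level {level}")
    # Hash recursion: split both inputs the same way, then join pairwise.
    build_parts = [[] for _ in range(fanout)]
    probe_parts = [[] for _ in range(fanout)]
    for b in build:
        build_parts[toy_partition(b[key], level, fanout)].append(b)
    for p in probe:
        probe_parts[toy_partition(p[key], level, fanout)].append(p)
    results = []
    for bp, pp in zip(build_parts, probe_parts):
        results += recursive_hash_join(bp, pp, key, memory_rows,
                                       level + 1, max_level, fanout)
    return results


build = [{"k": n} for n in range(20)]
probe = [{"k": n} for n in range(0, 40, 2)]
matches = recursive_hash_join(build, probe, "k", memory_rows=4)

# Heavy skew: every build row has the same key, so no amount of
# partitioning can make the partition smaller.
skewed = [{"k": 7} for _ in range(10)]
try:
    recursive_hash_join(skewed, probe, "k", memory_rows=4)
    bailed_out = False
except HashBailout:
    bailed_out = True
```

The skewed input shows why current statistics matter: if the optimizer underestimates how many rows share a join value, the build input can be far larger than the memory that was reserved for it.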
Spool Operators
The various spool operators are used to
create a temporary copy of rows from the input stream and deliver them
to the output stream. Spools typically sit between two other operators:
The one on the right is the child, and provides the input stream. The
operator on the left is the parent, and consumes the output stream.
The following list provides a brief description
of each of the physical spool operators. These are the operators that
actually execute. You may also see references to logical operators,
which represent an earlier stage in the optimization process; these are
subsequently converted to physical operators before executing the plan.
The logical spool operators are Eager Spool and Lazy Spool.
- Index spool — This operator reads
rows from the child table, places them in tempdb, and creates a
nonclustered index on them before continuing. This enables the parent
to take advantage of seeking against the nonclustered index on the data
in tempdb when the underlying table has no applicable indexes.
- Row count spool — This operator
reads rows from the child table and counts the rows. The rows are also
returned to the parent, but without any data. This enables the parent
to determine whether rows exist in order to satisfy an EXISTS or NOT EXISTS requirement.
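- Table spool — This operator reads the rows from the child and writes them into a work table in tempdb so that they can be reused. With an eager spool, all rows from the child are read and placed in tempdb before the parent can start processing rows; with a lazy spool, rows are read and stored only as the parent requests them.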
- Window spool — This operator expands each row into the set of rows that represent the window associated with it. It's both a physical and logical operator.
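The index spool is the easiest of these to picture in code. The following Python sketch is only an analogy: a dictionary stands in for the indexed copy in tempdb, and the class name, the `rows_read` counter, and the sample rows are hypothetical.

```python
class IndexSpool:
    """Toy index spool: copy the child's rows once, index them, serve seeks."""

    def __init__(self, child_rows, key):
        self._index = {}
        self.rows_read = 0
        for row in child_rows:       # the child input is consumed exactly once
            self.rows_read += 1
            self._index.setdefault(row[key], []).append(row)

    def seek(self, value):
        """Return the spooled rows whose key equals `value`."""
        return self._index.get(value, [])


# Hypothetical child input with no useful index of its own.
child = iter([
    {"id": 1, "color": "red"},
    {"id": 2, "color": "blue"},
    {"id": 3, "color": "red"},
])

spool = IndexSpool(child, "color")
red_rows = spool.seek("red")
blue_rows = spool.seek("blue")
none_rows = spool.seek("green")
```

The parent can call `seek` as many times as it needs without the child being read again, which is the benefit the spool trades against the cost of building the copy.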
Scan and Seek Operators
These operators enable SQL Server to
retrieve rows from tables and indexes when a larger number of rows is
required. This behavior contrasts with the individual row access
operators key lookup and RID lookup, which are discussed in the next section.
- Scan operator — The scan operator reads all the rows in the table or index, looking for matching rows. As a rule of thumb, when the number of matching rows is more than about 20 percent of the table, scan can start to outperform seek due to the additional cost of traversing the index to reach each row for the seek.
There are scan operator variants for a clustered index scan, a nonclustered index scan, and a table scan.
- Seek operator — The seek operator
uses the index to find matching rows; this can be either a single
value, a small set of values, or a range of values. When the query
needs only a relatively small set of rows, seek is significantly faster
than scan to find matching rows. However, when the number of rows
returned exceeds 20 percent of the table, the cost of seek will
approach that of scan; and when nearly the whole table is required,
scan will perform better than seek.
There are seek operator variants for a clustered index seek and a nonclustered index seek.
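The difference between the two access patterns can be sketched with a sorted Python list standing in for an index. This is an analogy only: the data is hypothetical, and a binary search over a list is a simplification of traversing a B-tree.

```python
import bisect

# Hypothetical index: (key, payload) pairs kept sorted by key.
index_rows = [(k, f"row-{k}") for k in range(0, 100, 5)]
index_keys = [k for k, _ in index_rows]


def scan(rows, low, high):
    """Touch every row and keep the ones whose key is in [low, high]."""
    return [row for row in rows if low <= row[0] <= high]


def seek(rows, keys, low, high):
    """Binary-search to the first qualifying key, then read forward."""
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return rows[start:end]


via_scan = scan(index_rows, 20, 35)
via_seek = seek(index_rows, index_keys, 20, 35)
```

Both functions return the same rows; the scan examines all 20 entries to find them, whereas the seek jumps to the start of the range and reads only the qualifying entries.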
Lookup Operators
Lookup operators perform the task of finding a single row of data. The following is a list of common operators:
- Bookmark lookup — Bookmark lookup is seen only in SQL Server 2000 and earlier. It's the way that SQL Server looks up a row using a clustered index. In SQL Server 2012 this is done using either Clustered Index Seek, RID lookup, or Key Lookup.
- Key lookup — Key lookup is how a single row is returned when the table has a clustered index. Unlike a lookup against a heap, the lookup is done using the clustering key. The key lookup operator was added in SQL Server 2005 SP2. Prior to this, and currently when viewing the plan in text or XML format, the operator is shown as a clustered index seek with the keyword LOOKUP.
- RID lookup — RID lookup is how a single row is looked up in a heap. RID stands for row identifier, the internal unique identifier that is used to locate the row.
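The two current lookup types differ mainly in what the nonclustered index stores as its pointer to the row. The sketch below models that with Python dictionaries; the table contents and index names are hypothetical, and the RID is shown as a (file, page, slot) tuple.

```python
# Table with a clustered index: rows are addressed by the clustering key.
clustered_table = {
    101: {"cust_id": 101, "email": "ann@example.com", "city": "Leeds"},
    102: {"cust_id": 102, "email": "bob@example.com", "city": "York"},
}
# A nonclustered index on email stores the clustering key as its row locator.
nc_index_clustered = {"ann@example.com": 101, "bob@example.com": 102}

# Heap: rows are addressed by RID, modeled here as (file, page, slot).
heap_table = {
    (1, 200, 0): {"email": "ann@example.com", "city": "Leeds"},
    (1, 200, 1): {"email": "bob@example.com", "city": "York"},
}
# A nonclustered index on email stores the RID as its row locator.
nc_index_heap = {"ann@example.com": (1, 200, 0), "bob@example.com": (1, 200, 1)}


def key_lookup(email):
    """Seek the nonclustered index, then fetch the row by clustering key."""
    clustering_key = nc_index_clustered[email]
    return clustered_table[clustering_key]


def rid_lookup(email):
    """Seek the nonclustered index, then fetch the row by RID."""
    rid = nc_index_heap[email]
    return heap_table[rid]


city_via_key = key_lookup("bob@example.com")["city"]
city_via_rid = rid_lookup("bob@example.com")["city"]
```

In both cases the lookup is an extra step per row after the index seek, which is why a plan that performs many lookups can end up costing more than a scan.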