SQL Server 2012 : Specialty Indexes - Filtered Indexes, Indexed Views, The Columnstore Index

11/6/2013 8:39:18 PM

Beyond the standard clustered and nonclustered indexes, SQL Server offers two types of indexes referred to as specialty indexes. Filtered indexes, new in SQL Server 2008, include less data; indexed views, available since SQL Server 2000, build out custom sets of data. Both are considered high-end performance-tuning indexes.

1. Filtered Indexes

A nonclustered index contains a record for every record in the base table on which it is defined. Historically, this has always been a 1-to-1 relationship. SQL Server 2008 introduced the filtered index. With a filtered index, you set a predicate on the CREATE INDEX statement so that the index contains only the rows that meet the criteria you set. This option is available only for nonclustered indexes: a clustered index is the table, so filtering it would not make sense. Because a filtered index can have far fewer records than a traditional nonclustered index, it tends to be much smaller. In addition, because the index has fewer records, its statistics tend to be more accurate, which can lead to better execution plans.

An example of employing a filtered index in AdventureWorks2012 is the ScrapReasonID column in the Production.WorkOrder table. Fortunately for AdventureWorks, only 612 parts (0.8 percent) were scrapped over the life of the database. The existing IX_WorkOrder_ScrapReasonID index includes every row. The ScrapReasonID foreign key in the Production.WorkOrder table allows nulls for work orders that were not scrapped, so the index holds all the null values, each with a pointer to its work order row. The current index uses 109 pages.
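The page count quoted above can be checked against the catalog and dynamic management views. The following is a minimal sketch, assuming a connection to the stock AdventureWorks2012 database; the exact page counts you see may differ from the figures in the text depending on the build of the sample database.

```sql
USE AdventureWorks2012;

-- List every index on Production.WorkOrder with its size in pages and rows.
SELECT i.name AS IndexName,
       i.has_filter,            -- 1 when the index was created with a WHERE clause
       ps.used_page_count,
       ps.row_count
 FROM sys.indexes AS i
  JOIN sys.dm_db_partition_stats AS ps
   ON ps.object_id = i.object_id
   AND ps.index_id = i.index_id
 WHERE i.object_id = OBJECT_ID(N'Production.WorkOrder')
 ORDER BY i.index_id;
```

Running the same query again after an index has been re-created with a filter gives a direct before-and-after comparison of its size.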

The following script re-creates the index with a WHERE clause that excludes all the null values:

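-- Drop the existing unfiltered index
DROP INDEX IX_WorkOrder_ScrapReasonID
 ON Production.WorkOrder;

-- Re-create it, indexing only the rows that have a scrap reason
CREATE INDEX IX_WorkOrder_ScrapReasonID
 ON Production.WorkOrder(ScrapReasonID)
 WHERE ScrapReasonID IS NOT NULL;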

The new index uses only two pages. Interestingly, there is no noticeable difference between the filtered and the nonfiltered index when selecting all the work orders with a scrap reason that's not null, because the index doesn't have enough intermediate levels to make a significant difference. For a much larger table, the difference would be worth testing, and the filtered index would most likely provide a benefit.
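One way to test the difference on your own tables is to compare logical reads with and without the filter in place. This is only a sketch, assuming the session is connected to AdventureWorks2012 and that you read the I/O figures from the Messages output:

```sql
SET STATISTICS IO ON;   -- report logical reads per table for each statement

-- The predicate matches the filter of the index, so the optimizer
-- is able to consider the filtered index for this query.
SELECT WorkOrderID, ScrapReasonID
 FROM Production.WorkOrder
 WHERE ScrapReasonID IS NOT NULL;

SET STATISTICS IO OFF;
```

On a table this small the two variants are expected to be close, as the text notes; the comparison becomes more interesting as the table grows.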

Best Practice

When designing a covering index (see Query Path #6) to solve a specific query, probably one of the top handful by CPU duration according to the indexing strategy, consider a filtered covering index if the query works with a relatively small subset of the data and the overall table is large.
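As an illustration of that advice, the sketch below builds a filtered covering index on Production.WorkOrder. The index name IX_WorkOrder_Scrapped_Covering and the choice of included columns are hypothetical, picked only to match the sample query that follows.

```sql
-- Hypothetical filtered covering index: keyed on ScrapReasonID, carrying
-- the other columns the query needs, and limited to scrapped work orders.
CREATE NONCLUSTERED INDEX IX_WorkOrder_Scrapped_Covering
 ON Production.WorkOrder (ScrapReasonID)
 INCLUDE (ProductID, ScrappedQty)
 WHERE ScrapReasonID IS NOT NULL;

-- A query this index can cover: every referenced column is in the index,
-- and the WHERE clause matches the index filter.
SELECT ProductID, ScrapReasonID, ScrappedQty
 FROM Production.WorkOrder
 WHERE ScrapReasonID IS NOT NULL;
```

The INCLUDE columns are stored only at the leaf level, so the index stays small while still answering the query without touching the base table.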

Another situation that might benefit from filtered indexes is building a unique index on a column in which multiple rows hold null values. A normal unique index allows only a single row to have a null value in the key columns. However, a unique index whose WHERE clause excludes nulls permits an unlimited number of null values.
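The unique-with-nulls case can be sketched with a small throwaway table. The table dbo.DemoBadge and its columns are hypothetical and exist only for this illustration.

```sql
-- Hypothetical demo table: BadgeNumber is optional but must be unique when present.
CREATE TABLE dbo.DemoBadge (
  DemoBadgeID  int IDENTITY(1,1) NOT NULL PRIMARY KEY,
  EmployeeName nvarchar(50) NOT NULL,
  BadgeNumber  int NULL
);

-- Uniqueness is enforced only for the rows that pass the filter.
CREATE UNIQUE NONCLUSTERED INDEX UX_DemoBadge_BadgeNumber
 ON dbo.DemoBadge (BadgeNumber)
 WHERE BadgeNumber IS NOT NULL;

-- Two rows with a null badge number are accepted alongside one real value.
INSERT INTO dbo.DemoBadge (EmployeeName, BadgeNumber)
 VALUES (N'First', NULL), (N'Second', NULL), (N'Third', 100);

-- A second row with BadgeNumber = 100 would be rejected with a duplicate key error:
-- INSERT INTO dbo.DemoBadge (EmployeeName, BadgeNumber) VALUES (N'Fourth', 100);

DROP TABLE dbo.DemoBadge;   -- clean up the demo
```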

In a sense, SQL Server has had filtered indexes since SQL Server 2000 with indexed views. There's no reason why an indexed view couldn't have included a WHERE clause and included data from a filtered set of rows. But filtered indexes are certainly easier to create than indexed views, and they function as normal nonclustered indexes — which is an excellent segue into the next topic, indexed views.

2. Indexed Views

When a denormalized and pre-aggregated data solution needs to be in real time, an excellent alternative to querying the base tables is to use indexed views. Indexed views are “materialized”: when the base table is updated, the index for the view is also updated. This stores pre-aggregated or denormalized data without requiring special programming methods.

Instead of building tables to duplicate data and denormalize a join, you can create a view that joins the two original tables and includes the two source primary keys and all the columns required to select the data. Building a clustered index on the view physically materializes every column in the select column list of the view.

Numerous restrictions exist on indexed views, including the following:

The ANSI_NULLS and QUOTED_IDENTIFIER options must be enabled when the base tables are created, when the view is created, and when any connection attempts to modify any data in the base tables.
The index must be a unique clustered index; therefore, the view must produce a unique set of rows without using DISTINCT.
The tables in the view must be tables (not nested views) in the local database and must be referenced by their two-part names (schema.table).
The view must be created with the WITH SCHEMABINDING option.
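Before creating an indexed view, it can help to confirm that the session and the base tables meet the first restriction. The following is a rough check rather than a complete validation, assuming a connection to AdventureWorks2012:

```sql
-- Session-level settings for the current connection (1 = ON, 0 = OFF)
SELECT SESSIONPROPERTY('ANSI_NULLS')        AS AnsiNulls,
       SESSIONPROPERTY('QUOTED_IDENTIFIER') AS QuotedIdentifier;

-- ANSI_NULLS setting that was in effect when each base table was created
SELECT SCHEMA_NAME(schema_id) AS SchemaName,
       name                   AS TableName,
       uses_ansi_nulls
 FROM sys.tables
 WHERE object_id IN (OBJECT_ID(N'Production.WorkOrder'),
                     OBJECT_ID(N'Production.Product'),
                     OBJECT_ID(N'Production.ScrapReason'));
```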

As an example of an indexed view used to denormalize a large query, the following view selects data from the Product, WorkOrder, and ScrapReason tables to produce a view of scrapped products:

USE AdventureWorks2012;
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;

GO

CREATE VIEW dbo.vScrap
WITH SCHEMABINDING
AS
 SELECT W.WorkOrderID, P.Name AS Product, 
  P.ProductNumber, 
   S.Name AS ScrappedReason, W.ScrappedQty
  FROM Production.WorkOrder W
   JOIN Production.Product P
    ON P.ProductID = W.ProductID 
   JOIN Production.ScrapReason S
    ON W.ScrapReasonID = S.ScrapReasonID;

With the view in place, the index can now be created on the view, resulting in an indexed view:

CREATE UNIQUE CLUSTERED INDEX ivScrap
 ON vScrap  (WorkOrderID, Product, ProductNumber, 
   ScrappedReason, ScrappedQty) ;

Indexed views can also be listed and created in Management Studio under the Views → Indexes node.

To drop the index from an indexed view, the DROP INDEX statement must refer to the view instead of to a table:

DROP INDEX ivScrap ON dbo.vScrap;

Dropping the view automatically drops any index created on the view.

Indexed Views and Queries

When SQL Server's Query Optimizer develops the execution plan for a query, it includes the indexed view's clustered index as one of the indexes it can use for the query, even if the query doesn't explicitly reference the view. This happens only with the Enterprise Edition.

This means that the indexed view's clustered index can serve to speed up queries. When the Query Optimizer selects the indexed view's clustered index, the query execution plan indicates the index used. Both of the following queries use the indexed view:

SELECT WorkOrderID, P.Name AS Product, 
 P.ProductNumber, 
  S.Name AS ScrappedReason, ScrappedQty
 FROM Production.WorkOrder W
  JOIN Production.Product P
   ON P.ProductID = W.ProductID 
  JOIN Production.ScrapReason S
   ON W.ScrapReasonID = S.ScrapReasonID

SELECT * FROM vScrap

Although indexed views are essentially the same as they were in SQL Server 2000, the Query Optimizer can now use indexed views with more types of queries.
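On editions other than Enterprise, where the automatic matching described above does not happen, a query can still be pointed at the view's clustered index by referencing the view with the NOEXPAND table hint. The sketch below assumes the vScrap view was created in the dbo schema and that its clustered index already exists; the hint raises an error if the view has no index.

```sql
-- NOEXPAND tells the optimizer to treat the indexed view like a table
-- and read its clustered index instead of expanding the view definition.
SELECT WorkOrderID, Product, ProductNumber, ScrappedReason, ScrappedQty
 FROM dbo.vScrap WITH (NOEXPAND);
```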

Updating Indexed Views

As with any denormalized copy of the data, the difficulty is keeping the data current. Indexed views have the same issue. As data in the underlying base tables is updated, the indexed view must be kept in sync. This process is completely transparent to the user and is more of a performance consideration than a programmatic issue.
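The automatic synchronization can be observed with a small experiment wrapped in a transaction that is rolled back, so the sample data is left unchanged. This is a sketch under several assumptions: the dbo.vScrap indexed view from earlier exists, ScrapReasonID 1 is present in Production.ScrapReason, and the replacement name used here is not already taken.

```sql
BEGIN TRANSACTION;

 -- Change a value in a base table only; the view is not touched directly.
 UPDATE Production.ScrapReason
  SET Name = N'Demo scrap reason'
  WHERE ScrapReasonID = 1;

 -- Read the materialized view index directly (NOEXPAND); rows that used
 -- ScrapReasonID 1 should now show the new name.
 SELECT TOP (5) WorkOrderID, ScrappedReason
  FROM dbo.vScrap WITH (NOEXPAND)
  WHERE ScrappedReason = N'Demo scrap reason';

ROLLBACK TRANSACTION;   -- undo the change to the sample data
```

The extra work of maintaining the view happens inside the UPDATE statement itself, which is why the cost shows up as slower writes rather than as additional code.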

3. The Columnstore Index

New to SQL Server 2012 is the columnstore index, which is structured and behaves differently from traditional B-tree indexes. With B-tree indexes, column values are stored in rows in a page. With columnstore indexes, the values of each column are stored together in segments, which allows much higher data compression and dramatically increases the speed of scanning huge amounts of data.

The major benefit of columnstore indexes is for data warehouse-style queries in which a large fact table is joined with smaller dimension tables and data is scanned to produce the wanted result. Adding a columnstore index to a table makes the table read-only until the index is dropped or disabled, a restriction that suits data warehouse tables because they are often updated infrequently and at defined intervals.

Consider the following script to create a columnstore index on the dbo.FactFinance table in the AdventureWorks data warehouse sample database. The syntax should be familiar; it is practically the same as creating a nonclustered index. The only difference is that in a traditional nonclustered index, you include only the columns that you use in the index, and you need to be careful about the order in which you define those columns. In a columnstore index, however, the recommendation is to include every column in the table, and the order in which you define the columns in the index does not matter.

CREATE NONCLUSTERED COLUMNSTORE INDEX idx_CS_Finance
ON dbo.FactFinance
(
FinanceKey, DateKey, OrganizationKey, DepartmentGroupKey, 
ScenarioKey, AccountKey, Amount, [Date]
);
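Because the table is read-only while the columnstore index is active, a periodic load in SQL Server 2012 typically has to work around the index. One common pattern, sketched here with a placeholder comment standing in for the actual load, is to disable the index, modify the data, and then rebuild:

```sql
-- 1. Disable the columnstore index so the table accepts changes again.
ALTER INDEX idx_CS_Finance ON dbo.FactFinance DISABLE;

-- 2. Load or modify data here (hypothetical step: the INSERT, UPDATE,
--    or bulk load for this interval's data goes in this position).

-- 3. Rebuild the index; the table becomes read-only again once it completes.
ALTER INDEX idx_CS_Finance ON dbo.FactFinance REBUILD;
```

The rebuild reprocesses the whole table, so this approach fits best when loads happen at defined intervals rather than continuously.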
