SQL Server 2008 R2 : Database Filegroups and Performance, SQL Server and SAN Technology

8/24/2013 9:40:23 AM

1. Database Filegroups and Performance

Filegroups allow you to decide where on disk a particular object should be placed. You can do this by defining a filegroup within a database, extending the database onto a different drive or set of drives, and then placing a database object on the new filegroup.

Every database, by default, has a primary filegroup that contains the primary data file. There can be only one primary filegroup. This primary filegroup contains all the pages assigned to system tables. It also contains any additional database files created without specifying a filegroup. Initially, the primary filegroup is also the default filegroup. There can be only one default filegroup, and indexes and tables that are created without specifying a filegroup are placed in the default filegroup. After you create additional filegroups for a database, you can make one of them the default filegroup in place of the primary filegroup.
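As a minimal sketch of changing the default filegroup, the following statements assume a database named Grocer that already has a user-defined filegroup containing at least one file. The filegroup name FG_DATA is hypothetical and used only for illustration.

```sql
-- Assumption: FG_DATA is a hypothetical user-defined filegroup that already
-- exists in the Grocer database and contains at least one data file.
ALTER DATABASE Grocer
    MODIFY FILEGROUP FG_DATA DEFAULT;
GO

USE Grocer;
GO

-- is_default = 1 marks the filegroup that now receives tables and indexes
-- created without an explicit ON <filegroup> clause.
SELECT name, is_default
FROM sys.filegroups;
```

Objects that already exist stay where they are; only objects created afterward without a filegroup clause land in the new default.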

In addition to the primary filegroup, you can add one or more filegroups to the database; these are called user-defined filegroups. Each of those filegroups can contain one or more files. The main purpose of using filegroups is to provide more control over the placement of files and data on the server. When you create a table or an index, you can map it to a specific filegroup, thus controlling the placement of data. A typical SQL Server database installation generally uses a single RAID array to spread I/O across disks and creates all files in the primary filegroup; more advanced installations or installations with very large databases spread across multiple array sets can benefit from the finer level of control of file and data placement afforded by additional filegroups.

For example, for a simple database such as AdventureWorks2008, you can create just one primary file that contains all data and objects and a log file that contains the transaction log information. For a larger and more complex database, such as a securities trading system, where large data volumes and strict performance criteria are the norm, you might create the database with one primary file and four secondary files. You can then set up filegroups so you can place the data and objects within the database across all five files. If you have a table that itself needs to be spread across multiple disk arrays for performance reasons, you can place multiple files in a filegroup, each of which resides on a different disk, and create the table on that filegroup. For example, you can create three files (Data1.ndf, Data2.ndf, and Data3.ndf) on three disk arrays and then assign them to the filegroup called spread_group. Your table can then be created specifically on the spread_group filegroup. Queries for data from the table are then spread across the three disk arrays, thereby improving I/O performance.
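The spread_group example described above might look like the following sketch. The database name Trading, the drive letters, the file sizes, and the TradeHistory table are all assumptions made for illustration; only the file names and the filegroup name come from the text.

```sql
-- Assumption: a database named Trading exists, and E:, F:, and G: are three
-- separate disk arrays. All sizes and the table definition are illustrative.
ALTER DATABASE Trading
    ADD FILEGROUP spread_group;
GO

-- One file per disk array, all assigned to the same filegroup.
ALTER DATABASE Trading
ADD FILE
    (NAME = Data1, FILENAME = 'E:\Data1.ndf', SIZE = 1024MB),
    (NAME = Data2, FILENAME = 'F:\Data2.ndf', SIZE = 1024MB),
    (NAME = Data3, FILENAME = 'G:\Data3.ndf', SIZE = 1024MB)
TO FILEGROUP spread_group;
GO

USE Trading;
GO

-- The table is created on the filegroup, not on an individual file.
CREATE TABLE dbo.TradeHistory
(
    TradeID   int      NOT NULL,
    TradeDT   datetime NOT NULL,
    Quantity  int      NOT NULL
) ON spread_group;
```

SQL Server fills the files in a filegroup proportionally to their free space, so giving the three files the same size keeps the table's pages spread roughly evenly across the arrays.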

Filegroups are most often used in high-performance environments to isolate key tables or indexes on their own set of disks, which are in turn typically part of a high-performance RAID array. Assuming that you start with a database that has just a PRIMARY filegroup (the default), the following example shows how you would add an index filegroup on a new drive and move some nonclustered indexes to it:

-- Add the filegroup
alter database Grocer
    add filegroup FG_INDEX;

-- Create a new database file and add it to the FG_INDEX filegroup
alter database Grocer
add file (
    NAME = Grocer_Index,
    FILENAME = 'g:\Grocer_Index.ndf',
    SIZE = 2048MB,
    MAXSIZE = 8192MB,
    FILEGROWTH = 10%
) to filegroup FG_INDEX;

-- Create a nonclustered index on the new filegroup
create nonclustered index xOrderDetail_ScanDT
    on OrderDetail(ScanDT)
    on FG_INDEX;

Moving the indexes to a separate RAID array minimizes I/O contention by spreading out the I/O generated by updates to the data that affect data rows and require changes to index rows as well.
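To move a nonclustered index that already exists, rather than creating a new one, one option is to rebuild it in place with DROP_EXISTING and a new filegroup clause. The sketch below assumes a hypothetical index named xOrderDetail_ProductID on OrderDetail(ProductID) that currently lives on PRIMARY; the final query then reports where each index on the table is stored.

```sql
USE Grocer;
GO

-- Assumption: xOrderDetail_ProductID is a hypothetical existing nonclustered
-- index on OrderDetail(ProductID). DROP_EXISTING rebuilds it under the same
-- name, and the trailing ON clause places the rebuilt index on FG_INDEX.
CREATE NONCLUSTERED INDEX xOrderDetail_ProductID
    ON OrderDetail(ProductID)
    WITH (DROP_EXISTING = ON)
    ON FG_INDEX;
GO

-- List each index on the table together with the filegroup that stores it.
SELECT i.name      AS index_name,
       i.type_desc AS index_type,
       ds.name     AS filegroup_name
FROM sys.indexes AS i
JOIN sys.data_spaces AS ds
    ON ds.data_space_id = i.data_space_id
WHERE i.object_id = OBJECT_ID('OrderDetail');
```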

Note

Because the leaf level of a clustered index is the data page, if you create a clustered index on a filegroup, the entire table moves from the existing filegroup to the new filegroup. If you want to put indexes on a separate filegroup, you should reserve that space for nonclustered indexes only.
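The behavior described in this note can also be used deliberately to relocate a table. The following is a sketch only: it assumes OrderDetail has a plain clustered index named CI_OrderDetail on a column called OrderDetailID (not one backing a PRIMARY KEY or UNIQUE constraint) and that a filegroup named FG_DATA exists. All three names are hypothetical.

```sql
USE Grocer;
GO

-- Assumption: CI_OrderDetail, OrderDetailID, and FG_DATA are hypothetical.
-- Rebuilding the clustered index on another filegroup moves the table's
-- data pages with it, because the leaf level of the index is the data.
CREATE CLUSTERED INDEX CI_OrderDetail
    ON OrderDetail(OrderDetailID)
    WITH (DROP_EXISTING = ON)
    ON FG_DATA;
```

The same statement pointed at FG_INDEX would pull the whole table onto the index filegroup, which is exactly the outcome the note warns against.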

Having your indexes on a separate filegroup gives you the following advantages:

Index scans and index page reads come from a separate disk, so they need not compete with other database processes for disk time.
Inserts, updates, and deletes on the table are spread across two separate disk arrays. The clustered index, including all the table data, is on a separate array from the nonclustered indexes.
You can target your budget dollars more precisely because the faster disks improve system performance more if they are given to the index filegroup rather than the database as a whole.

The next section gives specific recommendations on how to architect a hardware solution based on using separate filegroups for data and indexes.

2. SQL Server and SAN Technology

With the increased use of storage area networks (SANs) in SQL Server environments, it is important to understand the design and performance implications of implementing SQL Server databases on SANs. SANs are becoming increasingly common in SQL Server environments for a number of reasons:

Increasing database sizes
The increasing prevalence of clustered environments
The performance advantages and storage efficiencies and flexibilities of SANs
The increasing needs of recoverability and disaster recovery
Simplified disk administration

In large enterprises, a SAN can be used to connect multiple servers to a centralized pool of disk storage. Compared to managing hundreds of servers, each with its own separate disk arrays, SANs help simplify disk administration by treating all the company’s storage as a single resource. Disk allocation, maintenance, and routine backups are easier to manage, schedule, and control. In some SANs, the disks themselves can copy data to other disks for backup without any processing overhead at the host computers.

What Is a SAN?

A SAN contains multiple high-performance hard drives coupled with high-performance caching controllers. The hard drives are often configured into various RAID configurations. These drive configurations are virtualized so that the consumer does not know which hard drives a SQL Server or other device connected to the SAN will access. Essentially, the SAN presents blocks of storage to servers as logical units, each identified by a logical unit number (LUN); a LUN can consist of a single hard drive, multiple hard drives, or portions of hard drives. Connection to a SAN is typically through Fibre Channel, a high-speed network that usually runs over optical fiber.

SANs can provide advantages over locally attached storage. Most SANs provide features that allow you to clone, snapshot, or rapidly move data (replicate) from one location to another, much faster than file copies or data transfers over your network. This increases the usefulness of SANs for disaster recovery. SANs also provide a shared disk resource for building server clusters, even allowing a cluster or server to boot off a SAN.

Another reason for the increased use of SANs is that they offer increased utilization of storage. With locally attached storage, large amounts of disk space can end up being wasted. With a SAN, you can expand or contract the amount of disk space allocated to a server or cluster as needed.

Due to their cost and complexity, however, SANs are not for everybody. They only really make sense in large enterprises. They are not a good choice for small environments with relatively small databases, for companies with limited budgets (SANs are expensive), or for companies that require disaster recovery on only one or a few SQL Servers.

SAN Considerations for SQL Server

Before you rush out and purchase a SAN or two for your SQL Server environments, there are some considerations to keep in mind when using SANs with SQL Server.

Cache Performance

One of the reasons SANs can offer superior performance to locally attached storage is that they are typically configured with a significant amount of cache space. This is normally a good thing. However, because the SAN provides storage services to multiple servers, the available cache space is shared as well. If there is significant activity against the SAN, there can be extensive cache turnover. This means that the large cache space may not always be available to SQL Server, so some of the performance gains provided by the large cache are not realized.

Note

Cache turnover in a SAN can lead to widely varying physical I/O response times. When SQL Server performs I/O against the SAN, it’s considered a physical I/O whether or not the data resides in the SAN cache. When the physical I/O performance for SQL Server is measured, the performance can be orders of magnitude faster when the data is residing in the SAN cache than when the data has to be physically read from the disks in the SAN. It is important that you perform benchmarking with your SAN vendor to ensure that your SAN cache will be adequate to provide optimal database performance.
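One way to observe physical I/O response times from the SQL Server side is the sys.dm_io_virtual_file_stats dynamic management function. The query below is a sketch of that idea; the column aliases are arbitrary, and the figures are cumulative since the instance last started, so they show averages rather than the moment-to-moment variation described in this note.

```sql
-- Average physical read and write latency per database file, as seen by
-- SQL Server. Values accumulate from instance startup; to see variation
-- over time, capture this output at intervals and compare the deltas.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       CASE WHEN vfs.num_of_reads = 0 THEN NULL
            ELSE 1.0 * vfs.io_stall_read_ms / vfs.num_of_reads
       END AS avg_read_ms,
       vfs.num_of_writes,
       CASE WHEN vfs.num_of_writes = 0 THEN NULL
            ELSE 1.0 * vfs.io_stall_write_ms / vfs.num_of_writes
       END AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id = vfs.file_id
ORDER BY avg_read_ms DESC;
```

SQL Server cannot tell whether a given read was served from the SAN cache or from disk, so a wide spread between samples taken at quiet and busy times is only a hint that cache turnover may be involved.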

Avoid Disk Drive Contention

SAN storage is divided into LUNs. Servers attached to the SAN recognize one or more of these units as a disk partition or drive. However, these LUNs may share the same disk drives. For example, consider six 100GB drives in the SAN. Theoretically, these could be divided into two LUNs of 300GB each. Although each LUN may be allocated to a different SQL Server, any drives shared between the two LUNs could experience up to twice the I/O they would see if they were dedicated to a single server. To avoid this situation, most SANs support zoning, which allows the SAN administrator to dedicate entire disks in the SAN to your LUN to isolate the I/O on the drives in the LUN to your SQL Server.

In addition, you should try to ensure that your database log files are on a LUN consisting of dedicated drives separate from the LUN (or LUNs) used for your SQL Server data files. Log files typically are written sequentially, unlike data files, where access tends to consist more of random reads and writes. Sharing a LUN between data files and log files generally does not provide optimal I/O performance. Unfortunately, your SAN administrator may not permit you to dedicate a separate disk or set of disks to your log files. An alternative may be to place your log files on a local RAID 1 or RAID 10 array. However, you might want to benchmark to determine which solution provides better performance because the caching capabilities of the SAN may offset the potential drive contention in the SAN.
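As a starting point for checking data and log file placement, the following sketch lists where each file of a database currently resides. It uses the Grocer database name from the earlier example purely as an illustration.

```sql
-- List the data (ROWS) and log (LOG) files for one database along with
-- their physical paths. The database name Grocer is illustrative.
SELECT DB_NAME(database_id) AS database_name,
       type_desc,
       name AS logical_name,
       physical_name
FROM sys.master_files
WHERE database_id = DB_ID('Grocer')
ORDER BY type_desc, physical_name;
```

Different drive letters or mount points show only that the files are on different LUNs as presented to Windows. Whether those LUNs share physical drives inside the SAN is something only the SAN administrator can confirm.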

Additional SAN Performance Considerations

Some SAN administrators may attempt to convince you to use RAID 5 for all data and log files. Before following their advice, you should benchmark the system using a representative load to ensure that RAID 5 will offer the best performance for your log files, tempdb, and any write-intensive filegroups.
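Before benchmarking, it can help to know which files are actually write-intensive, since RAID 5 incurs parity overhead on writes. The query below is one possible sketch that ranks files by the share of their I/O volume that is written; the column aliases are arbitrary, and the counters are cumulative since the instance last started.

```sql
-- Rank database files by the percentage of their total I/O bytes that were
-- writes. Files near the top (often logs and tempdb) are the ones most
-- worth including in a RAID-level benchmark.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.type_desc,
       mf.physical_name,
       vfs.num_of_bytes_read,
       vfs.num_of_bytes_written,
       CASE WHEN vfs.num_of_bytes_read + vfs.num_of_bytes_written = 0 THEN NULL
            ELSE 100.0 * vfs.num_of_bytes_written
                 / (vfs.num_of_bytes_read + vfs.num_of_bytes_written)
       END AS pct_bytes_written
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id = vfs.file_id
ORDER BY pct_bytes_written DESC;
```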

You should also ensure that the hardware your SQL Server system uses to connect to the SAN provides optimal performance. Make sure that you have the correct and most up-to-date drivers for your SAN components. If you can, consider using multiple high-speed host bus adapters (HBAs) to connect your servers to your SAN to avoid the I/O contention that can occur with a single HBA. If you do use multiple HBAs, try to ensure they are on different buses to prevent bus saturation and that the HBAs are plugged into the PCI slots offering the highest speed.

SANs are complex, and delivering optimal performance for a SQL Server solution using a SAN is challenging. Benchmark your SQL Server to determine if bottlenecks exist with your SAN. Be willing to work with your SAN administrator or vendor to fine-tune your SAN configuration and carefully consider and benchmark any recommendations they may make to ensure optimal performance.
