1. Database Filegroups and Performance
Filegroups allow you to decide where on disk a
particular object should be placed. You can do this by defining a
filegroup within a database, extending the database onto a different
drive or set of drives, and then placing a database object on the new
filegroup.
Every database, by default, has a primary filegroup
that contains the primary data file. There can be only one primary
filegroup. This primary filegroup contains all the pages assigned to
system tables. It also contains any additional database files created
without specifying a filegroup. Initially, the primary filegroup is also
the default filegroup. There can be only one default filegroup, and
indexes and tables that are created without specifying a filegroup are
placed in the default filegroup. You can change the default filegroup
to another filegroup after it has been created for a database.
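As a minimal sketch of changing the default filegroup, the statements below add a user-defined filegroup, give it a file, and then mark it as the default. The database name Grocer, the filegroup name FG_DATA, the logical file name, and the file path are illustrative assumptions, not values required by SQL Server.

```sql
-- Hypothetical names: Grocer (database), FG_DATA (filegroup), Grocer_Data2 (file).
ALTER DATABASE Grocer
ADD FILEGROUP FG_DATA;
GO

-- A filegroup needs at least one file before it can become the default.
ALTER DATABASE Grocer
ADD FILE (
    NAME = Grocer_Data2,
    FILENAME = 'e:\Grocer_Data2.ndf',   -- assumed path; adjust to your drives
    SIZE = 1024MB
) TO FILEGROUP FG_DATA;
GO

-- Tables and indexes created without an ON clause now go to FG_DATA.
ALTER DATABASE Grocer
MODIFY FILEGROUP FG_DATA DEFAULT;
GO

-- Confirm which filegroup is currently the default (run in the Grocer database).
SELECT name, is_default
FROM sys.filegroups;
```

System tables remain in the primary filegroup after this change; only new user objects created without an explicit filegroup are affected.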
In addition to the primary filegroup, you
can add one or more filegroups to the database; these are
called user-defined filegroups. Each of those filegroups can contain one
or more files. The main purpose of using filegroups is to provide more
control over the placement of files and data on the server. When you
create a table or an index, you can map it to a specific filegroup,
thus controlling the placement of data. A typical SQL Server database
installation generally uses a single RAID array to spread I/O across
disks and creates all files in the primary filegroup; more advanced
installations or installations with very large databases spread across
multiple array sets can benefit from the finer level of control of file
and data placement afforded by additional filegroups.
For example, for a simple database such as AdventureWorks2008,
you can create just one primary file that contains all data and objects
and a log file that contains the transaction log information. For a
larger and more complex database, such as a securities trading system,
where large data volumes and strict performance criteria are the norm,
you might create the database with one primary file and four secondary
files. You can then set up filegroups so you can place the data and
objects within the database across all five files. If you have a table
that itself needs to be spread across multiple disk arrays for
performance reasons, you can place multiple files in a filegroup, each
of which resides on a different disk, and create the table on that
filegroup. For example, you can create three files (Data1.ndf, Data2.ndf, and Data3.ndf) on three disk arrays and then assign them to the filegroup called spread_group. Your table can then be created specifically on the spread_group filegroup. Queries for data from the table are then spread across the three disk arrays, thereby improving I/O performance.
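The spread_group example can be sketched roughly as follows. The file names Data1.ndf, Data2.ndf, and Data3.ndf and the filegroup name come from the text; the database name Grocer, the drive letters, the file sizes, and the TradeHistory table are assumptions made only for illustration.

```sql
-- Create the filegroup described in the text.
ALTER DATABASE Grocer
ADD FILEGROUP spread_group;
GO

-- Add three files, each on a different disk array (drive letters are assumed).
ALTER DATABASE Grocer
ADD FILE
    (NAME = Data1, FILENAME = 'e:\Data1.ndf', SIZE = 1024MB),
    (NAME = Data2, FILENAME = 'f:\Data2.ndf', SIZE = 1024MB),
    (NAME = Data3, FILENAME = 'g:\Data3.ndf', SIZE = 1024MB)
TO FILEGROUP spread_group;
GO

-- Hypothetical table placed explicitly on the new filegroup.
CREATE TABLE dbo.TradeHistory (
    TradeID   int      NOT NULL,
    TradeDT   datetime NOT NULL,
    Quantity  int      NOT NULL
) ON spread_group;
GO
```

SQL Server fills the files in a filegroup in proportion to their free space, so giving the three files the same size, as in this sketch, is one way to keep the data spread evenly across them.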
Filegroups are most often used in high-performance
environments to isolate key tables or indexes on their own set of
disks, which are in turn typically part of a high-performance RAID
array. Assuming that you start with a database that has just a PRIMARY
filegroup (the default), the following example shows how you would add
an index filegroup on a new drive and move some nonclustered indexes to
it:
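-- Add the filegroup
alter database Grocer
add filegroup FG_INDEX
go
-- Create a new database file and add it to the FG_INDEX filegroup
alter database Grocer
add file(
    NAME = Grocer_Index,
    FILENAME = 'g:\Grocer_Index.ndf',
    SIZE = 2048MB,
    MAXSIZE = 8192MB,
    FILEGROWTH = 10%
) to filegroup FG_INDEX
go
-- Create the nonclustered index on the FG_INDEX filegroup.
-- If the index already exists on another filegroup, adding
-- "with (drop_existing = on)" before the final "on" clause
-- rebuilds it on FG_INDEX instead of failing.
create nonclustered index xOrderDetail_ScanDT
on OrderDetail(ScanDT)
on FG_INDEX
go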
Moving the indexes to a separate RAID array reduces I/O contention.
An update that changes data rows often requires changes to index rows
as well, and with the indexes on their own array, those two sets of
writes go to different disks.
Note
Because the leaf level of a clustered index is the
data page, if you create a clustered index on a filegroup, the entire
table moves from the existing filegroup to the new filegroup. If you
want to put indexes on a separate filegroup, you should reserve that
space for nonclustered indexes only.
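To check where the table data and each index actually reside, a catalog query along the following lines can help. It is a sketch that assumes it is run in the context of the user database; the column aliases are arbitrary.

```sql
-- List every user table's heap or indexes with the filegroup each one lives on.
-- A CLUSTERED or HEAP row shows where the table data itself is stored.
SELECT
    o.name      AS table_name,
    i.name      AS index_name,      -- NULL for a heap
    i.type_desc AS index_type,      -- HEAP, CLUSTERED, NONCLUSTERED, ...
    ds.name     AS filegroup_name   -- a partition scheme name for partitioned objects
FROM sys.indexes AS i
JOIN sys.objects AS o
    ON o.object_id = i.object_id
JOIN sys.data_spaces AS ds
    ON ds.data_space_id = i.data_space_id
WHERE o.type = 'U'                  -- user tables only
ORDER BY o.name, i.index_id;
```

If a CLUSTERED row appears on the filegroup you intended for indexes only, the whole table has moved there, which is the situation this note warns about.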
Having your indexes on a separate filegroup gives you the following advantages:
Index scans and index page reads come from a separate disk, so they need not compete with other database processes for disk time.
Inserts, updates, and deletes on the table are spread across two separate disk arrays. The clustered index, including all the table data, is on a separate array from the nonclustered indexes.
You can target your budget dollars more precisely because the faster disks improve system performance more if they are given to the index filegroup rather than the database as a whole.
The next section gives specific
recommendations on how to architect a hardware solution based on using
separate filegroups for data and indexes.
2. SQL Server and SAN Technology
Due to the increased use of storage area networks
(SANs) in SQL Server environments, it is important to understand the
design and performance implications of implementing SQL Server
databases on SANs. SANs are becoming increasingly common in SQL
Server environments for a number of reasons:
Increasing database sizes
The increasing prevalence of clustered environments
The performance advantages, storage efficiency, and flexibility of SANs
The increasing need for recoverability and disaster recovery
Simplified disk administration
In large enterprises, a SAN can be used to connect
multiple servers to a centralized pool of disk storage. Compared to
managing hundreds of servers, each with its own separate disk arrays,
SANs help simplify disk administration by treating all the company’s
storage as a single resource. Disk allocation, maintenance, and routine
backups are easier to manage, schedule, and control. In some SANs, the
disks themselves can copy data to other disks for backup without any
processing overhead at the host computers.
What Is a SAN?
A SAN contains multiple high-performance hard drives
coupled with high-performance caching controllers. The hard drives are
often configured into various RAID configurations. These drive
configurations are virtualized so that the consumer does not know which
hard drives a SQL Server or other device connected to the SAN will
access. Essentially, the SAN presents storage to servers as logical
units, each identified by a logical unit number (LUN). A LUN can
consist of a single hard drive, multiple hard drives, or portions of
hard drives. Connection to a SAN is typically through Fibre Channel, a
high-speed network technology that usually runs over optical fiber.
SANs can provide advantages over locally attached
storage. Most SANs provide features that allow you to clone, snapshot,
or rapidly move data (replicate) from one location to another, much
faster than file copies or data transfers over your network. This
increases the usefulness of SANs for disaster recovery. SANs also
provide a shared disk resource for building server clusters, even
allowing a cluster or server to boot off a SAN.
Another reason for the increased use of SANs is that
they offer increased utilization of storage. With locally attached
storage, large amounts of disk space can end up being wasted. With a
SAN, you can expand or contract the amount of disk space allocated to a
server or cluster as needed.
Due to their cost and complexity, however, SANs are
not for everybody. They only really make sense in large enterprises.
They are not a good choice for small environments with relatively small
databases, for companies with limited budgets (SANs are expensive), or
for companies that require disaster recovery on only one or a few SQL
Servers.
SAN Considerations for SQL Server
Before you rush out and purchase a SAN or two for
your SQL Server environments, there are some considerations to keep in
mind when using SANs with SQL Server.
Cache Performance
One of the reasons SANs can offer superior performance to locally
attached storage is that they are typically configured with a
significant amount of
cache space. This is normally a good thing. However, because the SAN
provides storage services to multiple servers, the available cache
space is shared as well. If there is significant activity against the
SAN, there can be extensive cache turnover. This means that the large
cache space may not always be available to SQL Server, so some of the
performance gains provided by the large cache are not realized.
Note
Cache turnover in a SAN can lead to widely varying
physical I/O response times. When SQL Server performs I/O against the
SAN, it’s considered a physical I/O whether or not the data resides in
the SAN cache. When the physical I/O performance for SQL Server is
measured, the performance can be orders of magnitude faster when the
data is residing in the SAN cache than when the data has to be
physically read from the disks in the SAN. It is important that you
perform benchmarking with your SAN vendor to ensure that your SAN cache
will be adequate to provide optimal database performance.
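As one possible starting point for such benchmarking, the query below reads the cumulative I/O statistics that SQL Server keeps per database file and derives average read and write latencies. It is a sketch, not a complete benchmarking method: the figures are totals since the instance last started, so averages can hide exactly the variation described above.

```sql
-- Average I/O latency per database file since the instance started.
-- NULLIF avoids a divide-by-zero for files with no reads or writes yet.
SELECT
    DB_NAME(vfs.database_id)                             AS database_name,
    mf.physical_name,
    mf.type_desc,                                        -- ROWS (data) or LOG
    vfs.num_of_reads,
    vfs.num_of_writes,
    vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
    vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON  mf.database_id = vfs.database_id
    AND mf.file_id     = vfs.file_id
ORDER BY avg_read_ms DESC;
```

Capturing the output at two points in time and comparing the differences gives latency for that interval only, which makes it easier to see the swings between cache hits and physical disk reads during a test run.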
Avoid Disk Drive Contention
SAN storage is divided into LUNs. Servers attached
to the SAN recognize one or more of these units as a disk partition or
drive. However, these LUNs may share the same disk drives. For example,
consider six 100GB drives in the SAN. Theoretically, these could be
divided into two LUNs of 300GB each. Although each LUN may be allocated
to a different SQL Server, any drives shared between the two LUNs
service I/O from both servers and can see up to twice the I/O they
would if they were dedicated to a single server. To avoid this situation, most
SANs support zoning, which allows the
SAN administrator to dedicate entire disks in the SAN to your LUN to
isolate the I/O on the drives in the LUN to your SQL Server.
In addition, you should try to ensure that your
database log files are on a LUN consisting of dedicated drives separate
from the LUN (or LUNs) used for your SQL Server data files. Log files
typically are written sequentially, unlike data files where data access
tends to consist more of random reads and writes. Sharing a LUN between
data files and log files generally does not provide optimal I/O
performance. Unfortunately, your SAN administrator may not permit you
to dedicate a separate disk or set of disks to your log files. An
alternative may be to place your log files on a local RAID 1 or RAID 10
array. However, you might want to benchmark to determine which solution
provides better performance because the caching capabilities of the SAN
may offset the potential drive contention in the SAN.
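If you decide to try the log file on a different LUN or local array, relocating it might look like the following sketch. The database name Grocer, the logical file name Grocer_Log, and the target path are assumptions; the first query shows the real logical names and current paths to substitute.

```sql
USE master;
GO

-- Find the logical name and current location of each file in the database.
SELECT name, physical_name, type_desc
FROM sys.master_files
WHERE database_id = DB_ID('Grocer');
GO

-- Take the database offline so the file can be moved.
ALTER DATABASE Grocer SET OFFLINE;
GO

-- (Move the .ldf file to the new location at the operating system level here.)

-- Point SQL Server at the new location (assumed logical name and path).
ALTER DATABASE Grocer
MODIFY FILE (NAME = Grocer_Log, FILENAME = 'l:\Grocer_Log.ldf');
GO

-- Bring the database back online using the relocated log file.
ALTER DATABASE Grocer SET ONLINE;
GO
```

Running the same representative workload before and after the move gives a direct comparison between the SAN LUN and the local array.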
Additional SAN Performance Considerations
Some SAN administrators may attempt to convince you
to use RAID 5 for all data and log files. Before following their
advice, you should benchmark the system using a representative load to
determine whether RAID 5 delivers acceptable performance for your log files, tempdb, and any write-intensive filegroups, because RAID 5 incurs extra I/O on every write to maintain parity.
You should also ensure that the hardware your SQL
Server system uses to connect to the SAN provides optimal performance.
Make sure that you have the correct and most up-to-date drivers for
your SAN components. If you can, consider using multiple high-speed
host bus adapters (HBAs) to connect your servers to your SAN to avoid
the I/O contention that can occur with a single HBA. If you do use
multiple HBAs, try to ensure they are on different buses to prevent bus
saturation and that the HBAs are plugged into the PCI slots offering
the highest speed.
SANs are complex, and delivering optimal
performance for a SQL Server solution using a SAN is challenging.
Benchmark your SQL Server to determine if bottlenecks exist with your
SAN. Be willing to work with your SAN administrator or vendor to
fine-tune your SAN configuration and carefully consider and benchmark
any recommendations they may make to ensure optimal performance.