Before covering specific file configuration recommendations, let's define
some of the terms used when discussing database files:
Primary data file—The primary data file, and by default the only
data file, contains system tables and information on all files within a
database. By default, this file has an .mdf extension. If there are no
other files in the database, the primary file also contains user
objects such as tables and indexes.
Secondary data file—Secondary
files, which usually have an .ndf extension, are optional files that
can be added to a database for performance and/or administrative
benefits, both of which we'll cover shortly. A database can contain one
or more secondary files.
Filegroups—Every
database contains a primary filegroup, containing at least the primary
data file, and possibly all secondary data files unless other
filegroups are created and used. Filegroups are logical containers that
group together one or more data files and provide several benefits.
Transaction log file—Typically
using the .ldf extension, the transaction log file records details of
each database modification and is used for various purposes, including
transaction log shipping, replication, database mirroring, and recovery
of a database to a consistent state.
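To make the four terms concrete, here is a minimal sketch that renders a CREATE DATABASE statement containing one file of each kind. The database name, logical file names, directories, and the FG_User filegroup are hypothetical, chosen only for illustration; the helper function itself is not part of SQL Server.

```python
def create_database_sql(name, data_dir, log_dir, filegroup="FG_User"):
    """Render a CREATE DATABASE statement with one file of each kind.

    All names and paths are illustrative assumptions, not recommendations.
    """
    return (
        f"CREATE DATABASE [{name}]\n"
        # Primary data file (.mdf) in the primary filegroup
        f"ON PRIMARY\n"
        f"    (NAME = N'{name}_primary', FILENAME = N'{data_dir}\\{name}.mdf'),\n"
        # Secondary data file (.ndf) in a user-defined filegroup
        f"FILEGROUP [{filegroup}]\n"
        f"    (NAME = N'{name}_data1', FILENAME = N'{data_dir}\\{name}_1.ndf')\n"
        # Transaction log file (.ldf)
        f"LOG ON\n"
        f"    (NAME = N'{name}_log', FILENAME = N'{log_dir}\\{name}.ldf');"
    )


print(create_database_sql("Sales", r"D:\SQLData", r"L:\SQLLog"))
```

Sizes and growth settings are left out, so under this sketch they would fall back to the server defaults.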
With
these terms in mind, let's cover some of the major file configuration
recommendations, starting with separating a database's different
storage objects across separate physical disk volumes.
1. Volume separation
By default, a database is created with a single data file and a single
transaction log file. Unless specified during installation or modified
during database creation, both of these files will be created in the same
directory, with the default size and growth rates inherited from the model
database.
As shown in figure 1,
an important database file configuration task, particularly for
databases with direct-attached storage, is to provide separate physical
RAID-protected disk volumes for data, transaction log, tempdb, and
backup files.
Designing SAN-based virtualized storage is quite different from
designing direct-attached storage; that being said, the principles of
high performance and fault tolerance remain. In both cases, a good
understanding of SQL Server's various storage objects is crucial in
designing an appropriate storage system. Let's walk through these now,
beginning with the transaction log file.
Transaction log file
Unlike data files, which are accessed randomly, transaction logs are written
sequentially.
If a disk is dedicated to a single database's transaction log, the disk
heads can stay in position writing sequentially, thus increasing
transaction throughput. In contrast, a disk that stores a combination
of data and transaction logs won't achieve the same levels of
throughput given that the disk heads will be moving between the
conflicting requirements of random data access/updates and sequential
transaction log entries. For database applications with high
transaction rates, separation of data and transaction logs in this
manner is crucial.
Backup files
A common (and recommended) backup technique is to back up databases to disk
files and archive the disk backup files to tape at a later point in the day.
The best way to do this is to dedicate one or more disks to storing backups.
Dedicated backup disks provide several benefits:
Disk protection—Consider a case where the database files and the backup files are on the same disk. Should the disk fail, both the database and
the backups are lost, a disastrous situation! Storing backups on
separate disk(s) prevents this situation from occurring—either the
database or the backups will be available.
Increased throughput—Substantial
performance gains come from multiple disks working in unison. During
backup, the disks storing the database data files are dedicated to
reading the files, and the backup disks are dedicated to writing backup
file(s). In contrast, having both the data and backup files on the same
disk will substantially slow the backup process.
Cost-effective—The
backup disks may be lower-cost, higher-capacity SATA disks, with the
data disks being more expensive, RAID-protected SCSI or SAS disks.
Containing growth—The
last thing you want is a situation where a backup consumes all the
space on the data disk, effectively stopping the database from being
used. Having dedicated backup disks prevents this problem from
occurring.
Tempdb database
Depending on the database usage profile, the tempdb database may come in for
intense and sustained usage. Providing dedicated disks for tempdb reduces
its impact on other databases and improves performance for databases that
rely heavily on it.
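As a rough sketch of how tempdb might be relocated to dedicated disks, the helper below renders the two ALTER DATABASE ... MODIFY FILE statements involved. It assumes tempdb still uses its default logical file names, tempdev and templog, and the T:\TempDB directory is a hypothetical path. The new locations take effect only after the SQL Server service is restarted.

```python
def move_tempdb_sql(data_dir, log_dir):
    """Return ALTER DATABASE statements that repoint tempdb's two default files.

    Assumes the default logical names (tempdev, templog); check
    sys.master_files on the target instance before relying on them.
    """
    return [
        f"ALTER DATABASE tempdb MODIFY FILE "
        f"(NAME = tempdev, FILENAME = N'{data_dir}\\tempdb.mdf');",
        f"ALTER DATABASE tempdb MODIFY FILE "
        f"(NAME = templog, FILENAME = N'{log_dir}\\templog.ldf');",
    ]


# Hypothetical dedicated volume T: for both tempdb files
for statement in move_tempdb_sql(r"T:\TempDB", r"T:\TempDB"):
    print(statement)
```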
Windows system and program files
SQL Server data files shouldn't be located on the same disks as Windows
system and program files. The best way of ensuring this is to provide
dedicated disks for SQL Server data, log, backup, and tempdb files.
A
frequently cited reason for not creating dedicated disk volumes for the
objects we've covered so far is the lack of available drive letters,
particularly in clustered servers used for consolidating a large number
of databases and/or database instances. Mount points
address this problem by allowing a physically separate disk volume to
be grafted onto an existing volume, therefore enabling a single drive
letter to contain multiple physically separate volumes. Mount points
are fully supported in Windows Server 2003 and 2008.
For small databases with low usage, storing everything on a single disk may
work perfectly fine, but as usage and database size increase, file
separation is a crucial configuration step in ensuring the ongoing
performance and stability of database servers.
In
addition to increasing throughput, creating physically separate storage
volumes enables I/O bottlenecks to be spotted much more easily,
particularly with the introduction of the new Activity Monitor.
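The separation guidance in this section can be sketched as a small layout check that flags any drive holding more than one kind of storage object. The role names and file paths are hypothetical, and the drive letter is used as a simple proxy for a physical volume; that assumption does not hold when mount points place several volumes under one drive letter.

```python
import ntpath
from collections import defaultdict


def shared_volumes(layout):
    """Return {drive: sorted roles} for every drive that holds more than one role.

    `layout` maps a role (data, log, tempdb, backup) to a list of file paths.
    Drive letters stand in for volumes, which is an assumption of this sketch.
    """
    roles_by_drive = defaultdict(set)
    for role, paths in layout.items():
        for path in paths:
            drive, _ = ntpath.splitdrive(path)  # 'D:' from 'D:\\SQLData\\...'
            roles_by_drive[drive.upper()].add(role)
    return {d: sorted(r) for d, r in roles_by_drive.items() if len(r) > 1}


# Hypothetical layout: data and log files share the D: drive
layout = {
    "data": [r"D:\SQLData\Sales.mdf"],
    "log": [r"D:\SQLData\Sales.ldf"],
    "tempdb": [r"T:\TempDB\tempdb.mdf"],
    "backup": [r"B:\SQLBackup\Sales.bak"],
}
print(shared_volumes(layout))  # {'D:': ['data', 'log']}
```

An empty result means every role sits on its own drive letter; it says nothing about whether those letters map to physically separate, RAID-protected disks.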
As
with object separation across physically separate disk volumes, using
multiple data files isn't a default setting, yet deserves consideration
given its various advantages.