A good server design is
one that has no, or very few, single points of failure. Disks are among the
server components that fail most often, with heat and vibration as
contributing factors. A good data center design recognizes this
and takes measures to reduce failure rates, such as locating servers in
temperature-controlled areas with low vibration levels.
Despite environmental
precautions, disks still fail, and the server design needs to take this
fact into consideration. The most common method of
protecting against disk failure is a Redundant Array of Independent (or Inexpensive) Disks, or RAID.
In addition to
providing tolerance against disk failure, certain RAID levels increase
performance by striping data across multiple disks and therefore
distributing I/O load among the disks in the RAID volume. The stripe
size, specified when building the RAID volume, can make a significant
difference in I/O performance.
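To make the effect of stripe size concrete, here is a minimal sketch of how a striped volume might map a logical byte offset to a disk. It assumes simple round-robin striping with no parity; the function names (`locate_block`, `disks_touched`) and the sizes used are hypothetical, and real controllers may lay data out differently.

```python
from __future__ import annotations


def locate_block(offset_bytes: int, stripe_size_bytes: int, num_disks: int) -> tuple[int, int]:
    """Map a logical byte offset to (disk index, byte offset on that disk).

    Assumes plain round-robin striping (illustrative only).
    """
    stripe_number = offset_bytes // stripe_size_bytes  # which stripe unit overall
    disk_index = stripe_number % num_disks             # units rotate across the disks
    row = stripe_number // num_disks                   # full units already on that disk
    offset_on_disk = row * stripe_size_bytes + offset_bytes % stripe_size_bytes
    return disk_index, offset_on_disk


def disks_touched(offset_bytes: int, length_bytes: int,
                  stripe_size_bytes: int, num_disks: int) -> set[int]:
    """Return the set of disks that a single I/O request spans."""
    first = offset_bytes // stripe_size_bytes
    last = (offset_bytes + length_bytes - 1) // stripe_size_bytes
    return {unit % num_disks for unit in range(first, last + 1)}


KB = 1024
# The same 256 KB read against a four-disk volume built with two stripe sizes.
print(disks_touched(0, 256 * KB, 64 * KB, 4))
print(disks_touched(0, 256 * KB, 256 * KB, 4))
```

In this sketch, a 256 KB request is spread over all four disks when the stripe size is 64 KB, but lands on a single disk when the stripe size is 256 KB, which is why the stripe size should be chosen with the typical I/O size in mind.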
This section will look at
four RAID levels and their advantages and disadvantages from a cost,
fault protection, and performance perspective. Note that there are more
RAID levels than discussed here, but these are the common ones used for
SQL Server implementations. As with the previous section, this section
is geared toward RAID in DAS solutions. SAN-based V-RAID is quite
different, although there's usually some correlation between V-RAID and
traditional RAID, so the principles are still important.
1. RAID 0
Despite the name, RAID 0, as shown in figure 1,
actually provides no redundancy at all. It involves striping data
across all the disks in the RAID array, which improves performance, but
if any of the disks in the array fail, then the whole array fails. In that sense, RAID 0 increases the chance of failure, and the more disks in the stripe, the greater that chance. Consider RAID 0 as the zero-redundancy RAID.
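To put a rough number on the claim that RAID 0 raises the chance of failure, the following sketch computes the probability that at least one disk in the array fails. It assumes disk failures are independent, which is a simplification, and the 3 percent per-disk figure is purely illustrative rather than a measured failure rate.

```python
def raid0_failure_probability(p_disk: float, num_disks: int) -> float:
    """Probability that a RAID 0 array fails, given a per-disk failure probability.

    The array survives only if every disk survives, so (assuming independent
    failures) the array fails with probability 1 - (1 - p)^n.
    """
    if not 0.0 <= p_disk <= 1.0:
        raise ValueError("p_disk must be between 0 and 1")
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return 1.0 - (1.0 - p_disk) ** num_disks


# Hypothetical per-disk failure probability over some period.
P_DISK = 0.03
for n in (1, 2, 4, 8):
    print(f"{n} disk(s): {raid0_failure_probability(P_DISK, n):.1%}")
```

Under these assumptions, each disk added to the stripe raises the array's failure probability, because every disk is a single point of failure for the whole volume.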
Some have suggested
that RAID 0 may be acceptable for the tempdb database, given that
tempdb starts out empty every time SQL Server is restarted and therefore
redundancy of tempdb isn't really important. Although this is true,
it's also true that a failure in any of the tempdb disks will cause SQL
Server to fail, and you're then faced with rebuilding the disks before
SQL Server can be restarted. For most sites, this would lead to an
unacceptable outage.
While RAID 0
increases I/O performance through striping, its lack of
redundancy means I don't recommend it for any serious SQL
Server implementation.
2. RAID 1
RAID 1, as shown in figure 2,
is essentially disk mirroring. Each disk in a RAID 1 array has a mirror
partner, and if one of the disks in a mirrored pair fails, then the
other disk is still available and operations continue without any data
loss.
Useful for a variety of SQL
Server components, including backups and transaction logs, RAID 1
arrays provide good read performance, and write performance suffers
little or no overhead.
The downside to RAID 1
is the lower disk utilization. For every usable disk, two disks are
required, resulting in a 50 percent utilization level.
3. RAID 5
RAID 5, as shown in figure 3,
requires at least three disks. It addresses the low disk utilization
inherent with RAID 1 by using parity to provide redundancy rather than
storing a duplicate copy of the data on another disk. When a disk
failure occurs in a RAID 5 array, the data stored on that disk is
dynamically recovered using the parity information on the remaining
disks.
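The recovery described above can be illustrated with XOR parity, the usual basis for RAID 5. The sketch below handles a single stripe with made-up block contents; a real array rotates the parity block across the disks and works on much larger blocks, so treat this as a simplified model rather than a controller implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three data disks (illustrative contents).
data_blocks = [b"SQL!", b"RAID", b"DISK"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data_blocks)

# Simulate losing the second disk: rebuild its block from the survivors.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_blocks[1])
```

Because XOR is its own inverse, combining the surviving data blocks with the parity block yields the missing block. Every read of the failed disk's data needs this extra work, which is consistent with read performance degrading while the array runs with a failed disk.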
Disk utilization in RAID 5 is calculated as (# of drives - 1) / # of drives.
For three-disk volumes, the utilization is roughly 66 percent; for five-disk
volumes, 80 percent; and so forth. RAID 5's main advantage is higher
disk utilization than RAID 1, and therefore a lower overall storage
cost; however, the downsides are significant. Each write to a RAID 5
array involves multiple disk operations for parity calculation and
storage; therefore, write performance is much lower than that of other RAID
solutions. Further, in the event of a disk failure, read performance is
also degraded significantly.
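The utilization figures quoted for RAID 5, and the 50 percent figure for mirroring, can be reproduced with a small helper. This is a sketch that assumes equal-sized disks; the function name `disk_utilization` and the string labels for the RAID levels are hypothetical.

```python
def disk_utilization(raid_level: str, num_disks: int) -> float:
    """Fraction of raw capacity available for data, assuming equal-sized disks."""
    if raid_level == "0":
        return 1.0                            # striping only, no redundancy
    if raid_level in ("1", "10"):
        return 0.5                            # every disk has a mirror partner
    if raid_level == "5":
        if num_disks < 3:
            raise ValueError("RAID 5 requires at least three disks")
        return (num_disks - 1) / num_disks    # one disk's worth of parity
    raise ValueError(f"unsupported RAID level: {raid_level}")


for n in (3, 5, 8):
    print(f"RAID 5 with {n} disks: {disk_utilization('5', n):.1%}")
```

For three, five, and eight disks this works out to about 66.7, 80, and 87.5 percent, so the storage cost advantage of RAID 5 over mirroring grows as disks are added.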
Such overhead makes RAID 5
unsuitable for a lot of SQL Server implementations. Exceptions include
installations with either predominantly read-only profiles or those with
disk capacity or budgetary constraints that can handle the write
overhead.
4. RAID 10
RAID 10 combines the
best features of RAID 1 and 0, without any of the downsides of RAID 5.
Also known as RAID 1+0, RAID 10 is the highest-performance option among the fault-tolerant RAID levels covered here.
As shown in figure 4,
RAID 10 offers the high-performance striping of RAID 0 with the fault
tolerance of RAID 1's disk mirroring without any of the write overhead
of RAID 5.
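The fault tolerance of a stripe of mirrors depends on which disks fail, not just how many. The sketch below assumes a hypothetical layout in which disks are mirrored in adjacent pairs (0 and 1, 2 and 3, and so on) and data is striped across the pairs; the function name `raid10_survives` is illustrative.

```python
from __future__ import annotations


def raid10_survives(failed_disks: set[int], num_disks: int) -> bool:
    """Return True if a RAID 10 array is still usable after the given failures.

    Assumes disks are mirrored in adjacent pairs: (0, 1), (2, 3), ...
    The array is lost only when both disks of the same pair have failed.
    """
    if num_disks < 4 or num_disks % 2 != 0:
        raise ValueError("this sketch expects an even number of disks, at least four")
    for first in range(0, num_disks, 2):
        if first in failed_disks and first + 1 in failed_disks:
            return False  # a whole mirrored pair is gone, so its stripe data is lost
    return True


print(raid10_survives({0, 2}, 4))  # one disk lost from each pair
print(raid10_survives({0, 1}, 4))  # both disks of the same pair lost
```

In this model the array always survives a single disk failure and can survive more, provided no mirrored pair loses both of its disks.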
The downside of RAID
10 is the cost. Requiring at least four disks, RAID 10 arrays benefit
from lots of disks to stripe across, each of which requires a mirror
partner. In large deployments, the cost of RAID 10 may be prohibitive
for some organizations, with the money perhaps better spent on other
infrastructure components.
RAID 10 offers the
most advantages to SQL Server and, despite the cost, should be
seriously considered for environments requiring both high performance
and fault tolerance. Table 1 compares RAID 10 with RAID 0, 1, and 5.
Table 1. RAID level comparisons
Attribute | RAID 0 | RAID 1 | RAID 5 | RAID 10 |
---|---|---|---|---|
Disk failure tolerance | 0 | >=1 | 1 | >=1 |
Disk utilization % | 100% | 50% | 66%+ | 50% |
Read performance | High | High | High | High |
Write performance | High | Medium | Low | Medium |
SQL Server suitability | Bad | Good | Limited | Good |
Finally, it's worth
mentioning that RAID can be implemented at either the software level,
via the Windows operating system, or the hardware level, using dedicated
RAID controller cards. Software RAID shouldn't be used in server-class
implementations as doing so consumes operating system resources and
doesn't offer the same feature set as hardware implementations. Hardware
RAID requires no operating system resources and provides additional
benefits, such as more RAID-level options, battery-backed disk cache, and support for swapping out failed disks without bringing down the system.
Let's move on now and cover the different types of storage systems available today.