1. The Importance of Hardware
The underlying foundation of
SQL Server 2012 performance and scalability is the actual hardware and
storage subsystem on which your instance of SQL Server 2012 is running.
This is true whether you are running in a virtualized environment or in
a bare metal configuration. Regardless of what type of database
workload you may have to deal with, and irrespective of how well
designed and optimized your databases are, the characteristics and
performance of your database hardware and storage subsystem are
extremely important. Even the most well-designed and carefully tuned
database application can be crippled by poorly chosen or inadequate
hardware. This is not to say that hardware can solve all performance or
scalability problems. A frequently executed, expensive query on an
extremely large dataset can quickly overwhelm even the best hardware
and storage subsystem. Despite this, having modern, properly sized
hardware and a good storage subsystem gives you a much better chance of
being able to handle any type of workload that you may see on SQL
Server 2012, and makes your life as a DBA much easier!
Unfortunately, far too many database
administrators (DBAs) are blissfully ignorant about the important
details regarding their database hardware infrastructure. Given the
pace of recent and ongoing advances in new processors and chipsets,
along with changes in both magnetic and flash storage, trying to stay
current with hardware technology can be daunting. Many DBAs simply give
up, and let someone else make all the hardware and storage decisions.
No matter who makes these decisions, however, the DBA is usually blamed
for any performance or scalability issues that show up later. Even if
you don’t get to make the final decisions regarding hardware selection,
being knowledgeable and informed about server hardware puts you in a
much stronger position during the decision-making process. Being
educated about database hardware also helps you understand whether your
existing hardware and storage subsystem is woefully underpowered by
today’s standards, which is extremely valuable information for a DBA.
2. How Workload Affects Hardware and Storage Considerations
If you are ready to accept the
challenge of learning some of the mysteries of database server hardware
and storage, where should you begin? The first step is to have a good
understanding of your current or planned workload. You need to know
whether your database server will be running only the actual SQL Server
Database Engine, or also other SQL Server components such as SQL Server
Analysis Services (SSAS), SQL Server Integration Services (SSIS), or
SQL Server Reporting Services (SSRS). Ideally, you would want these
other SQL Server components running on separate dedicated servers, but
you might not have that luxury because of the extra hardware and
licensing costs. Even if you are only going to be running the Database
Engine on your database server, you need to understand what kind of
workload you will be handling.
Workload Types
Several different types of workload are
common with SQL Server, or any other relational database management
system (RDBMS), including online transaction processing (OLTP), data
warehousing (DW), relational reporting, and online analytical
processing (OLAP). Depending on your applications and what SQL Server
components are running on your database server, you might have a
relatively pure version of one of these workload types or a mixture of
several.
Other variables include the number of user
databases running on your database instance, and the volume and
intensity of your workload — that is, how many batch requests per
second, how many new rows are inserted or updated per second, and so
on. All these different variables affect your hardware selection
decisions, and how you decide to configure your hardware and storage
subsystem to get the best performance possible for that type of
workload.
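If you are not sure how busy your current workload actually is, you can measure it directly from SQL Server. The listing below is a minimal sketch that samples the cumulative Batch Requests/sec counter in sys.dm_os_performance_counters twice, ten seconds apart, and derives the average number of batch requests per second over that interval; the length of the delay is an arbitrary choice.

-- Sample the cumulative Batch Requests/sec counter twice, ten seconds apart,
-- and compute the average batch requests per second over that interval.
DECLARE @first BIGINT, @second BIGINT;

SELECT @first = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = N'Batch Requests/sec'
  AND object_name LIKE N'%SQL Statistics%';

WAITFOR DELAY '00:00:10';

SELECT @second = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = N'Batch Requests/sec'
  AND object_name LIKE N'%SQL Statistics%';

SELECT (@second - @first) / 10 AS [Batch Requests/sec];

Tracking numbers like this over time, along with how many rows are inserted or updated per second in your key tables, gives you a concrete baseline to size new hardware and storage against.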
OLTP Workloads
One extreme is a pure OLTP workload,
which is typically characterized by numerous short-duration queries and
transactions with a relatively high percentage of write activity.
Processors with higher base clock speeds and higher turbo speeds
(within the same processor family) tend to perform better on most OLTP
queries. A pure OLTP workload usually has a high degree of data
volatility, especially in some of the database’s key tables. Having a
pure OLTP workload will influence your hardware options and how you
configure your hardware and storage subsystem. These workloads generate
more input/output (I/O) operations per second (IOPS) than an equivalent
data warehouse (DW) system.
With a single OLTP database, you will see mostly
sequential write activity to your transaction log file, and more random
write activity to your data file(s). If you have more than one OLTP
database on your instance of SQL Server, and the transaction log files
for these databases are located on the same drive array, you will see
more random write activity because the drive array is forced to service
all the transaction log files for multiple OLTP databases. If you are
using technologies such as SQL Server transactional replication,
database mirroring, or AlwaysOn availability groups, you will also see
sequential read activity against your transaction log file(s).
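You can see how this read and write activity is actually distributed across your data and log files by querying sys.dm_io_virtual_file_stats, which accumulates file-level I/O statistics since the last SQL Server restart. The query below is a minimal sketch; it shows the volume and stall times per file, although it cannot tell you whether the access pattern was sequential or random.

-- Cumulative read and write activity, plus I/O stall times, per database file
-- since the last SQL Server restart.
SELECT DB_NAME(vfs.database_id) AS [Database],
       mf.name                  AS [Logical File Name],
       mf.type_desc             AS [File Type],      -- ROWS (data) or LOG
       vfs.num_of_reads,
       vfs.num_of_bytes_read,
       vfs.io_stall_read_ms,
       vfs.num_of_writes,
       vfs.num_of_bytes_written,
       vfs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
INNER JOIN sys.master_files AS mf
        ON vfs.database_id = mf.database_id
       AND vfs.file_id = mf.file_id
ORDER BY vfs.num_of_bytes_written DESC;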
Data Warehousing Workloads
Another completely different type of
workload is a pure DW workload, which has long-running, complex queries
that are often parallelized by the Query Optimizer; this places a
premium on having processors with higher physical core counts and
better memory controllers in order to execute these types of queries as
quickly as possible. Also very important for DW workloads is having a
large amount of memory to ensure you have adequate room for the buffer
pool.
A DW workload has more sequential reads from your
data files and very little write activity to your data files and log
file during normal operations. During data loads, you will see
predominantly sequential write activity to your transaction log file
and a combination of sequential and random write activity to your data
files. You want to consider sequential read and write performance as
you select and configure your I/O subsystem for a DW workload.
Relational Reporting Workloads
Many organizations maintain a second
copy of an OLTP database for reporting usage. This is ideally located
on a dedicated server that is separate from the primary OLTP database
server. This “reporting” database will have many additional
nonclustered indexes added to the existing OLTP tables and it may also
have additional reporting tables containing calculated summary data for
reporting purposes.
In some cases, this reporting database is
restored from a backup of the production OLTP database, perhaps once a
day. After the restore is finished, all the additional nonclustered
indexes are created and the reporting tables are loaded and indexed. In
terms of sequential read and write performance, this type of pattern
places a lot of stress on the I/O subsystem. Restoring a database from
a backup and creating many new indexes are sequential operations, so
having a lot of sequential I/O performance is very important. After the
reporting database is ready for use, the overall workload becomes very
similar to a DW workload. If you have this type of pattern, you should
consider using the new columnstore index feature in SQL Server 2012.
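As a rough illustration, the statement below creates a nonclustered columnstore index on a hypothetical fact table in the reporting database; the table and column names are illustrative only. Keep in mind that in SQL Server 2012 a table is read-only while a nonclustered columnstore index exists on it, which fits this restore-then-report pattern well, because the index can simply be created after each restore along with the other additional indexes.

-- Hypothetical example: the table and column names below are illustrative only.
-- In SQL Server 2012 the table is read-only while this index exists, so it is
-- created after the reporting database has been restored and loaded.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
ON dbo.FactSales (OrderDateKey, ProductKey, CustomerKey, SalesAmount, OrderQuantity);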
Another scenario for a relational reporting
database is to use transactional replication between the production
OLTP database, which acts as the publisher, and the “reporting” database,
which acts as the subscriber. Usually, many additional nonclustered
indexes are added to the subscriber to improve query performance for
reporting queries. Maintaining acceptable INSERT, UPDATE, and DELETE
performance in this database is more difficult because of these
additional indexes. This places more stress on your I/O subsystem, so
you will see sequential writes to the log file and random writes to the
data files. The reporting queries cause sequential reads from the data
files. Overall, this is a relatively challenging mixed workload type.
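With this kind of subscriber, it is worth periodically checking whether each additional nonclustered index is actually being used by the reporting queries, since every index adds write overhead to each replicated INSERT, UPDATE, and DELETE. The query below is a minimal sketch based on sys.dm_db_index_usage_stats, which accumulates its counts since the last SQL Server restart.

-- Read versus write activity for each nonclustered index in the current database.
-- Indexes with many writes but few reads are candidates for removal.
SELECT OBJECT_NAME(ius.object_id)                         AS [Table],
       i.name                                             AS [Index],
       ius.user_seeks + ius.user_scans + ius.user_lookups AS [Total Reads],
       ius.user_updates                                   AS [Total Writes]
FROM sys.dm_db_index_usage_stats AS ius
INNER JOIN sys.indexes AS i
        ON ius.object_id = i.object_id
       AND ius.index_id = i.index_id
WHERE ius.database_id = DB_ID()
  AND i.type_desc = N'NONCLUSTERED'
ORDER BY [Total Writes] DESC;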
OLAP Workloads
OLAP workloads have several different
components, including reading data from the source(s) to initially
build or update the cube, processing the cube when changes are made,
and then actually running various types of OLAP queries to retrieve the
data for users. Having processors with higher physical core counts and
better memory controllers, in order to execute these types of queries
as quickly as possible, is very valuable. Also very important
for OLAP workloads is having a large amount of memory so that you can
process large cubes quickly. OLAP workloads tend to have a lot of
random I/O, so flash-based storage for the cube files
can be very beneficial. Flash-based storage includes solid-state drives
(SSDs) and other devices such as Fusion-io cards that use solid-state
flash memory for permanent storage. These types of devices offer
extremely high random I/O performance, which is very useful for OLAP
workloads.
Server Model Selection
In order to choose an appropriate
server model for your database server, you must first decide whether
you want to use an Intel processor or an AMD processor, as this
absolutely dictates which server models you can consider from your
system vendor. Next, you need to decide whether you will be using a
one-socket, two-socket, or four-socket database server, or something
even larger, as that constrains your available processor options. You
also have to decide what vertical form factor you want for the server —
that is, whether it will be a 1U, 2U, 4U, or even larger server. These
designations (1U, 2U, etc.) refer to how tall the server is in rack
units, with a rack unit being roughly 1.75 inches tall. This affects
how many servers will fit in a rack, and how many internal drive bays
will fit inside a rack-mounted server.
These choices also affect the maximum amount of
physical memory (RAM) that you can have, the number of Peripheral
Component Interconnect Express (PCIe) expansion slots that are
available, and the number of internal drive bays that are available in
the server.
Here are some things to consider as you decide
whether to purchase a two-socket database server or a four-socket
database server. Traditionally, it was very common to use a four-socket
machine for most database server scenarios, while two-socket servers
were most often used for web servers or application servers. However,
given recent advances in processors, improvements in memory density,
and the increase in the number and bandwidth of PCIe expansion slots
over the past several years, you might want to seriously reconsider
that conventional wisdom.
Historically, two-socket database servers did not
have enough processor capacity, memory capacity, or I/O capacity to
handle most intense database workloads. Processors have become far more
powerful in the last few years, and memory density has increased
dramatically. It is also possible to attach much more I/O capacity
to a two-socket server than was possible a few years ago,
especially with the latest processors and chipsets that have PCIe 3.0
support.
Another reason to carefully consider this issue
is the cost of SQL Server 2012 Enterprise Edition processor core
licenses. If you can run your workload on a two-socket server instead
of a four-socket server, you could save up to 50% on your SQL Server
processor core license costs, which can be a very substantial savings!
With SQL Server 2012 Enterprise Edition, the cost of a few processor
core licenses would pay for a very capable two-socket database server
(exclusive of the I/O subsystem).
Server Model Evolution
To provide some history and context,
this section describes how the capabilities and performance of
commodity two- and four-socket servers have changed over the past seven
years. In 2005, you could buy a two-socket Dell PowerEdge 1850 with two
hyperthreaded Intel Xeon “Irwindale” 3.2GHz processors and 12GB of RAM
(with a total of four logical cores). This was fine for an application
or web server, but it really didn’t have the CPU horsepower (the
Geekbench score was about 2200) or memory capacity for a heavy-duty
database workload. This model server had relatively few expansion slots, with
either two PCI-X or two PCIe 1.0 slots being available.
By early 2006, you could buy a four-socket Dell
PowerEdge 6850 with four dual-core Intel Xeon 7040 “Paxville” 3.0GHz
processors and up to 64GB of RAM (with a total of 16 logical cores with
hyperthreading enabled). This was a much better choice for a database
server at the time because of the additional processor, memory, and I/O
capacity compared to a PowerEdge 1850. Even so, its Geekbench score was
only about 4400, which is pretty pathetic by today’s standards, even
compared to a new Core i3-2350M entry-level laptop. In 2005 and 2006,
it still made sense to buy a four-socket database server for most
database server workloads because two-socket servers simply were not
powerful enough in terms of CPU, memory, or I/O.
By late 2007, you could buy a two-socket Dell
PowerEdge 1950 with two quad-core Intel Xeon E5450 processors and 32GB
of RAM (with a total of eight logical cores), which provided a
relatively powerful platform for a small database server. The Intel
Xeon 5400 series did not have hyperthreading. A system like this would
have a Geekbench score of about 8000. With only two PCIe 1.0 × 8 slots,
it had limited external I/O capability, but the gap compared to
four-socket servers was beginning to narrow.
In late 2008, you could get a four-socket Dell
PowerEdge R900 with four six-core Intel Xeon X7460 processors and
256GB of RAM (with a total of 24 logical cores). This system had seven
PCIe 1.0 expansion slots, divided into four × 8 and three × 4 slots.
(The × 4 and × 8 refer to the number of lanes. The more lanes, the
higher the maximum bandwidth.) This was a very powerful but costly
platform for a database server, with a Geekbench score of around
16,500. This was the last generation of Intel Xeon processors to use a
symmetric multiprocessing (SMP) architecture, rather than a
non-uniform memory access (NUMA) architecture, so it did not scale very
well when additional processor sockets were added to servers. The Intel
Xeon 7400 series did not have hyperthreading. Many four-socket servers
of this vintage are still in use today, even though their performance
and scalability have long been eclipsed by modern two-socket servers.
By early 2009, you could get a two-socket Dell
PowerEdge R710 with two quad-core Intel Xeon X5570 processors and
144GB of RAM (with a total of 16 logical cores with hyperthreading
enabled). This system had four PCIe 2.0 expansion slots, divided into
two × 8 and two × 4 slots. This provided a very powerful database
server platform in a very compact package. Such a system would have a
Geekbench score of around 15,000. It used the 45nm Nehalem-EP family
processor, which had NUMA support. This was when the tide began to turn
in favor of two-socket servers instead of four-socket servers, as this
system had enough CPU, memory, and I/O capacity to compare favorably
with existing four-socket servers. If you were concerned that 144GB of
RAM was not enough memory in a single R710, you could buy two R710s,
giving you nearly double the CPU capacity and I/O capacity of a single R900.
This assumes that you could split your database workload between two
database servers, by moving databases or doing something such as
vertical or horizontal partitioning of an existing large database.
By early 2011, you could buy that same Dell
PowerEdge R710 with more powerful six-core 32nm Intel Xeon X5690
processors and up to 288GB of RAM (with a total of 24 logical cores
with hyperthreading enabled), and push the Geekbench score to about
24,000. That gave you quite a bit more CPU capacity and memory than
the PowerEdge R900 that you could buy in late 2008. An R710 with those
processors would give you the absolute best single-threaded OLTP
performance available until March 2012, when the Dell R720 with the
32nm Xeon E5-2690 became available.
In March of 2012, you could purchase a two-socket
Dell PowerEdge R720 with two eight-core 32nm Intel Xeon E5-2690
processors and up to 768GB of RAM (with 32GB DIMMs) and seven PCIe 3.0
expansion slots, split between six × 8 and one × 16 slots. This
provides a total of 32 logical cores (with hyperthreading enabled)
visible to Windows, and this system has a Geekbench score of about
41,000, a significant improvement over the previous-generation R710
server. It also has more memory capacity, better memory bandwidth, and
much more I/O capacity due to the higher number of improved PCIe 3.0
expansion slots. This two-socket system has a Geekbench score that is
roughly comparable to a 2011-vintage four-socket Dell PowerEdge R910
server that uses the 32nm Xeon E7-4870 processor. We now have a
two-socket server that compares extremely well with the latest model
four-socket servers in nearly every respect.
This overall trend has been continuing over the
past several years, with Intel introducing new processors in the
two-socket space about 12–18 months ahead of introducing a roughly
equivalent new processor in the four-socket space. This means that you
will get much better single-threaded OLTP performance from a two-socket
system than from a four-socket system of the same age (as long as your
I/O subsystem is up to par). The latest model two-socket servers with
the Sandy Bridge-EP Intel Xeon E5-2690 processor compare very favorably
to four-socket servers with the Sandy Bridge-EP Intel Xeon E5-4650, and
even more favorably to four-socket servers with the older Westmere-EX
Intel Xeon E7-4870 for all but the largest workloads.
Given the choice, two two-socket machines would be preferable to
one four-socket machine in almost all cases. The only major
exception would be a case in which you absolutely
needed far more memory in a single server than you can get in a
two-socket machine (a Dell PowerEdge R720 can now handle up to 768GB if
you are willing to pay for 32GB DIMMs) and you are unable to do any
reengineering to split up your workload.
From a SQL Server 2012 licensing
perspective, a fully loaded Dell R720 is much more affordable than a
fully loaded Dell R910, as we are talking about 16 physical cores for
the R720 vs. 40 physical cores for the R910. At the time of writing,
the full retail cost of 16 processor core licenses for SQL Server 2012
Enterprise Edition would be $109,984, whereas the retail cost for 40
processor core licenses would be $274,960. At $6,874 per core, licensing
two 16-core R720 servers (32 cores, or $219,968) still costs $54,992 less
than licensing the 40 cores in a single R910. This means that you could
buy two very well-equipped R720 servers and their required SQL Server
licenses for significantly less money than the cost of a single
well-equipped R910 and its required SQL Server licenses. If you can
split your workload between two servers, you would get much better
performance and scalability from two R720 servers than from a single
R910 server.