The purpose of a database is to store
the information required by an organization. Any means of collecting
and organizing data is a database. Prior to the Information Age,
information was primarily stored on cards, in file folders, or in
ledger books. Before the adding machine, offices employed dozens of
workers who spent all day adding columns of numbers and double-checking
the math of others. The job title of those who had that exciting career
was computer.
As the number crunching began to be handled by digital machines, human labor, rather than being eliminated, shifted to other tasks. Analysts, programmers, managers, and IT staff have replaced the human “computers” of days gone by.
Author's Note
Welcome to the second of three chapters that deal with database design. Although they're spread out in the table of contents, they weave a consistent theme: good design yields great performance.
- Chapter 2 provides an overview of data architecture.
- Partitioning the physical layer is covered in Chapter 49, “Partitioning.”
- Designing data warehouses for business intelligence is covered in Chapter 51, “Business Intelligence Database Design.”
There's more to this chapter than the standard “Intro to Normalization.” This chapter draws on lessons learned over many years.
This chapter condenses a book's worth of material into a concise summary of the main ideas. It opens with an introduction to database design terms and concepts. Then the same concepts are presented from three perspectives: first with the common patterns, then with a custom Layered Design concept, and lastly with the normal forms. Each of these ideas is easier to comprehend after you understand the other two, so if you have the time, read the chapter twice to get the most out of it.
1. Benefits of a Digital Database
The Information Age and the relational database brought several measurable benefits to organizations:
- Increased data consistency and better enforcement of business rules
- Improved sharing of data, especially across distances
- Improved ability to search for and retrieve information
- Improved generation of comprehensive reports
- Improved ability to analyze data trends
The general theme is that a computer database
originally didn't save time in the entry of data, but rather in the
retrieval of data and in the quality of the data retrieved. However,
with automated data collection in manufacturing, bar codes in
retailing, databases sharing more data, and consumers placing their own
orders on the Internet, the effort required to enter the data has also
decreased.
Note
This chapter presents the relational
database design principles and patterns used to develop operational, or
online transaction processing (OLTP), databases.
Some of the relational principles and patterns may apply to other types of databases, but databases not used for first-generation data (such as most BI databases, reporting databases, data warehouses, or reference data stores) do not necessarily benefit from normalization.
In this chapter, the term “database” exclusively refers to a relational, OLTP-style database.
2. Tables, Rows, Columns
A relational database collects related,
or common, data in a single list. For example, all the product
information may be listed in one table and all the customers in another
table.
A table appears similar to a spreadsheet and is
constructed of columns and rows. The appeal (and the curse) of the
spreadsheet is its informal development style, which makes it easy to
modify and add to as the design matures. Managers tend to store
critical information in spreadsheets, and many databases started as
informal spreadsheets.
In both a spreadsheet and a database table, each
row is an item in the list and each column is a specific piece of data
concerning that item, so each cell should contain a single piece of
data about a single item.
Whereas a spreadsheet tends to be free-flowing
and loose in its design, database tables should be consistent in terms
of the meaning of the data in a column. Because row and column
consistency is important to a database table, the design of the table
is critical.
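The following is a minimal sketch of the table, row, and column ideas above. It uses Python's built-in sqlite3 module, with SQLite standing in for SQL Server (an assumption made only so the example runs anywhere). The Product and Customer tables and their columns are hypothetical. Each table holds one kind of item, each row is one item, and each column holds one piece of data about it. A CHECK constraint shows how a table, unlike a free-flowing spreadsheet, can hold a column to a consistent meaning.

```python
import sqlite3

# In-memory SQLite database; SQLite stands in for SQL Server in this sketch.
conn = sqlite3.connect(":memory:")

# One table per kind of item: products in one list, customers in another.
# Table and column names are hypothetical, chosen only for illustration.
conn.execute("""
    CREATE TABLE Product (
        ProductCode TEXT    NOT NULL PRIMARY KEY,
        Name        TEXT    NOT NULL,
        UnitPrice   NUMERIC NOT NULL CHECK (UnitPrice >= 0)
    )
""")
conn.execute("""
    CREATE TABLE Customer (
        CustomerID INTEGER NOT NULL PRIMARY KEY,
        LastName   TEXT    NOT NULL,
        FirstName  TEXT    NOT NULL
    )
""")

# Each row is one item; each cell holds a single piece of data about that item.
conn.execute("INSERT INTO Product VALUES ('P100', 'Kite', 19.95)")
conn.execute("INSERT INTO Customer VALUES (1, 'Adams', 'Sam')")


def try_insert(sql: str) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Unlike a free-form spreadsheet cell, UnitPrice rejects a value that breaks
# the column's agreed meaning (a non-negative price).
accepted = try_insert("INSERT INTO Product VALUES ('P101', 'Glider', -5)")
print(accepted)  # False

rows = conn.execute("SELECT ProductCode, Name, UnitPrice FROM Product").fetchall()
print(rows)
```

The constraint here is only one illustrative rule. Which rules a real table enforces comes out of the design phases described next.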
Over the years, different development styles have referred to these concepts by different terms, as listed in Table 1.
Table 1 Comparing Database Terms
SQL Server developers generally refer to database
elements as tables, rows, and columns when discussing the SQL Data
Definition Language (DDL) layer or physical schema and sometimes use
the terms entity, tuple, and attribute when discussing the logical
design. The rest of this book uses the SQL terms (table, row, and
column), but this chapter is devoted to the theory behind the design,
so the relational algebra terms (entity, tuple, and attribute) are also
used.
3. Database Design Phases
Traditionally, data modeling has been split into two phases: the logical design and the physical design. However, after spending countless hours designing relational databases and listening to several lectures on database design, the authors are convinced that there are three phases to database design. To avoid confusion with the traditional terms, they are defined as follows:
- Conceptual model: The first phase digests the organizational requirements and identifies the entities, their attributes, and their relationships. During this phase, take every opportunity to collect any information that is, or may be, relevant to the project.
The conceptual model diagram is great for understanding, communicating, and verifying the organization's requirements. The diagramming method should be easily understood by all the stakeholders: the subject-matter experts, the development team, and management. A diagramming tool such as Visio can help present the conceptual model visually.
At this layer, the design is
implementation-independent: It could end up on Oracle, SQL Server, or
even Access. Some designers refer to this as the “logical model.”
- SQL DDL layer: This phase concentrates on performance, without losing the fidelity of the logical model, as it applies the design to a specific version of a database engine (SQL Server 2012, for example) and generates the DDL for the actual tables, keys, and attributes. Typically, the SQL DDL layer generalizes some entities and replaces some natural keys with surrogate, computer-generated keys.
During this phase, database developers often discover the need for additional tables (entities), along with their corresponding attributes and keys. As a result, the SQL DDL layer might look different from the conceptual model.
- Physical layer: The implementation phase considers how the
data will be physically stored on the disk subsystems using indexes,
partitioning, and materialized views. Changes made to this layer won't
affect how the data is accessed, only how it's stored on the disk.
The physical layer ranges from simple, for small databases (under 20 GB), to complex, with multiple files and filegroups, indexed views, and data routing partitions.
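The following is a minimal sketch of the last two phases. It again uses Python's sqlite3 module, with SQLite standing in for SQL Server, and a hypothetical Customer table. At the SQL DDL layer, a surrogate, computer-generated CustomerID becomes the primary key. The natural key (an email address, assumed here purely for illustration) is kept as a UNIQUE constraint. At the physical layer, adding an index changes how the data is stored and searched but leaves the query results unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQL DDL layer: a surrogate key (CustomerID, assigned by the engine) replaces
# the natural key as the primary key. The natural key (Email, a hypothetical
# choice) stays as a UNIQUE constraint so the logical model's rule survives.
conn.execute("""
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,   -- surrogate, auto-assigned by SQLite
        Email      TEXT NOT NULL UNIQUE,  -- natural key from the conceptual model
        LastName   TEXT NOT NULL,
        City       TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO Customer (Email, LastName, City) VALUES (?, ?, ?)",
    [
        ("sam@example.com", "Adams", "Denver"),
        ("pat@example.com", "Baker", "Boston"),
        ("lee@example.com", "Chen", "Denver"),
    ],
)

# The UNIQUE constraint still rejects a duplicate natural key.
try:
    conn.execute(
        "INSERT INTO Customer (Email, LastName, City) "
        "VALUES ('sam@example.com', 'Adams', 'Austin')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

query = (
    "SELECT CustomerID, LastName FROM Customer "
    "WHERE City = 'Denver' ORDER BY CustomerID"
)
before = conn.execute(query).fetchall()

# Physical layer: add an index. This changes how the data is stored and
# searched, not what the query returns.
conn.execute("CREATE INDEX IX_Customer_City ON Customer (City)")
after = conn.execute(query).fetchall()

print(duplicate_rejected)  # True
print(before)              # [(1, 'Adams'), (3, 'Chen')]
print(before == after)     # True
```

This example keeps the natural key as a UNIQUE constraint. That is one common way to let a surrogate key serve joins without losing the business rule the natural key expressed. Treat it as a design choice to weigh, not a rule the phases above require.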
This chapter focuses on designing the conceptual
model, with a brief look at normalization followed by a repertoire of
database patterns.
Caution
Implementing a database without working through the SQL DDL layer design phase is a certain path to a poorly performing database. Many database purists who didn't care to learn SQL Server have implemented conceptual designs directly, only to blame SQL Server for the horrible performance.