SQL Server 2012 : Database Basics (part 1) - Benefits of a Digital Database, Tables, Rows, Columns, Database Design Phases

11/8/2013 2:04:35 AM

The purpose of a database is to store the information required by an organization. Any means of collecting and organizing data is a database. Prior to the Information Age, information was primarily stored on cards, in file folders, or in ledger books. Before the adding machine, offices employed dozens of workers who spent all day adding columns of numbers and double-checking the math of others. The job title of those who had that exciting career was computer.

Author's Note

Welcome to the second of three chapters that deal with database design. Although they're spread out in the table of contents, they weave a consistent theme that good design yields great performance:

Chapter 2 provides an overview of data architecture.
Partitioning the physical layer is covered in Chapter 49, “Partitioning.”
Designing data warehouses for business intelligence is covered in Chapter 51, “Business Intelligence Database Design.”

There's more to this chapter than the standard “Intro to Normalization.” This chapter draws on the lessons that have been learned over many years.

This chapter covers a book's worth of material, but concisely summarizes the main ideas. The chapter opens with an introduction to database design terms and concepts. Then the same concept is presented from three perspectives: first, with the common patterns, then with a custom Layered Design concept, and lastly with the normal forms. Each of these ideas is easier to comprehend after you understand the other two, so if you have the time, read the chapter twice to get the most out of it.

As the number crunching began to be handled by digital machines, human labor, rather than being eliminated, shifted to other tasks. Analysts, programmers, managers, and IT staff have replaced the human “computers” of days gone by.

1. Benefits of a Digital Database

The Information Age and the relational database brought several measurable benefits to organizations:

Increased data consistency and better enforcement of business rules
Improved sharing of data, especially across distances
Improved ability to search for and retrieve information
Improved generation of comprehensive reports
Improved ability to analyze data trends

The general theme is that a computer database originally didn't save time in the entry of data, but rather in the retrieval of data and in the quality of the data retrieved. However, with automated data collection in manufacturing, bar codes in retailing, databases sharing more data, and consumers placing their own orders on the Internet, the effort required to enter the data has also decreased.

Note

This chapter presents the relational database design principles and patterns used to develop operational, or online transaction processing (OLTP), databases.

Some of the relational principles and patterns may apply to other types of databases, but databases not used for first-generation data (such as most BI, reporting databases, data warehouses, or reference data stores) do not necessarily benefit from normalization.

In this chapter, the term “database” exclusively refers to a relational, OLTP-style database.

2. Tables, Rows, Columns

A relational database collects related, or common, data in a single list. For example, all the product information may be listed in one table and all the customers in another table.

A table appears similar to a spreadsheet and is constructed of columns and rows. The appeal (and the curse) of the spreadsheet is its informal development style, which makes it easy to modify and add to as the design matures. Managers tend to store critical information in spreadsheets, and many databases started as informal spreadsheets.

In both a spreadsheet and a database table, each row is an item in the list and each column is a specific piece of data concerning that item, so each cell should contain a single piece of data about a single item.

Whereas a spreadsheet tends to be free-flowing and loose in its design, database tables should be consistent in terms of the meaning of the data in a column. Because row and column consistency is important to a database table, the design of the table is critical.
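The difference between a loose spreadsheet and a consistent table can be sketched in a few lines. The example below is only an illustration, not the book's own code: it uses Python's standard-library sqlite3 module as a stand-in for SQL Server, and the Product table, its columns, and the sample rows are hypothetical names invented for this sketch.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
# The Product table and its columns are hypothetical illustration names.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Product (
        ProductCode TEXT    NOT NULL PRIMARY KEY,
        ProductName TEXT    NOT NULL,
        UnitPrice   NUMERIC NOT NULL CHECK (UnitPrice >= 0)
    )
    """
)

# Each row is one item in the list; each value is one fact about that item.
conn.executemany(
    "INSERT INTO Product (ProductCode, ProductName, UnitPrice) VALUES (?, ?, ?)",
    [("K-100", "Basic Kite", 14.95), ("K-200", "Stunt Kite", 49.50)],
)

cursor = conn.execute(
    "SELECT ProductCode, ProductName, UnitPrice FROM Product ORDER BY ProductCode"
)
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()

# Unlike a spreadsheet cell, a column refuses data that breaks its definition.
try:
    conn.execute(
        "INSERT INTO Product (ProductCode, ProductName, UnitPrice) "
        "VALUES ('K-300', 'Box Kite', -5)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(column_names)
print(rows)
print("negative price rejected:", rejected)
```

Every column carries one meaning for every row, and the table definition, rather than the person typing, enforces that consistency.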

Over the years, different development styles have referred to these concepts by different terms, as listed in Table 1.

Table 1 Comparing Database Terms

SQL Server developers generally refer to database elements as tables, rows, and columns when discussing the SQL Data Definition Language (DDL) layer or physical schema and sometimes use the terms entity, tuple, and attribute when discussing the logical design. The rest of this book uses the SQL terms (table, row, and column), but this chapter is devoted to the theory behind the design, so the relational algebra terms (entity, tuple, and attribute) are also used.

3. Database Design Phases

Traditionally, data modeling has been split into two phases: the logical design and the physical design. However, after spending countless hours designing relational databases and listening to several lectures on database design, the authors are convinced that there are three phases to database design. To avoid confusion with the traditional terms, they are defined as follows:

Conceptual model: The first phase digests the organizational requirements and identifies the entities, their attributes, and their relationships. During this phase, take every opportunity to collect any information that has, or may have, relevance to the project.

The conceptual model diagram is great for understanding, communicating, and verifying the organization's requirements. The diagramming method should be easily understood by all the stakeholders: the subject-matter experts, the development team, and management. Visio or a similar diagramming tool can help give the conceptual model a visual form.

At this layer, the design is implementation-independent: It could end up on Oracle, SQL Server, or even Access. Some designers refer to this as the “logical model.”

SQL DDL Layer: This phase concentrates on performance without losing the fidelity of the logical model as it applies the design to a specific version of a database engine (SQL Server 2012, for example) and generates the DDL for the actual tables, keys, and attributes. Typically, the SQL DDL Layer generalizes some entities and replaces some natural keys with surrogate computer-generated keys.

During this phase, database developers typically realize the need for additional tables (entities) and their corresponding attributes and keys. As a result, the SQL DDL layer might look different from the conceptual model.
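Replacing a natural key with a surrogate key can be illustrated with a small sketch. The code below is an assumption-laden example rather than the authors' design: it again uses Python's sqlite3 module in place of SQL Server (where the surrogate would typically be an IDENTITY column), and the Customer and CustomerOrder tables and their columns are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; all table and
# column names are hypothetical and exist only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Customer (
        CustomerID   INTEGER PRIMARY KEY,   -- surrogate key, engine-generated
        EmailAddress TEXT NOT NULL UNIQUE,  -- natural key, still enforced
        CustomerName TEXT NOT NULL
    );
    CREATE TABLE CustomerOrder (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customer (CustomerID),
        OrderDate  TEXT NOT NULL
    );
    """
)

cursor = conn.execute(
    "INSERT INTO Customer (EmailAddress, CustomerName) VALUES (?, ?)",
    ("pat@example.com", "Pat"),
)
customer_id = cursor.lastrowid  # value generated by the engine, not by the user
conn.execute(
    "INSERT INTO CustomerOrder (CustomerID, OrderDate) VALUES (?, ?)",
    (customer_id, "2013-11-08"),
)

# The natural key can change without touching any row that references
# the customer, because child rows point at the surrogate key.
conn.execute(
    "UPDATE Customer SET EmailAddress = ? WHERE CustomerID = ?",
    ("pat@example.org", customer_id),
)
order_count = conn.execute(
    """
    SELECT COUNT(*)
    FROM CustomerOrder AS o
    JOIN Customer AS c ON c.CustomerID = o.CustomerID
    WHERE c.EmailAddress = ?
    """,
    ("pat@example.org",),
).fetchone()[0]

# The UNIQUE constraint keeps the natural key's business rule intact.
try:
    conn.execute(
        "INSERT INTO Customer (EmailAddress, CustomerName) VALUES (?, ?)",
        ("pat@example.org", "Duplicate Pat"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(customer_id, order_count, duplicate_rejected)
```

Keeping a uniqueness constraint on the natural key is one way to preserve the fidelity of the conceptual model while the surrogate key handles the relationships.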

Physical layer: The implementation phase considers how the data will be physically stored on the disk subsystems using indexes, partitioning, and indexed views (SQL Server's form of materialized views). Changes made to this layer won't affect how the data is accessed, only how it's stored on the disk.

The physical layer ranges from simple, for small databases (under 20 GB), to complex, with multiple files and filegroups, indexed views, and data routing partitions.
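The claim that physical-layer changes alter how data is stored and located, but not how it is accessed, can be checked with a small experiment. This is a minimal sketch under stated assumptions: SQLite (through Python's sqlite3 module) stands in for SQL Server, the Sale table and the IX_Sale_Region index are hypothetical names, and SQLite's EXPLAIN QUERY PLAN plays the role that an execution plan would play in SQL Server.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; the Sale table
# and the IX_Sale_Region index are hypothetical illustration names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sale ("
    "SaleID INTEGER PRIMARY KEY, Region TEXT NOT NULL, Amount INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO Sale (Region, Amount) VALUES (?, ?)",
    [("North", 10), ("South", 20), ("North", 30), ("East", 40)],
)

query = "SELECT SUM(Amount) FROM Sale WHERE Region = ?"


def run(sql, params):
    """Return the query's single result value and a text summary of its plan."""
    result = conn.execute(sql, params).fetchone()[0]
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    plan = " ".join(row[3] for row in plan_rows)  # the fourth column holds the detail text
    return result, plan


result_before, plan_before = run(query, ("North",))

# A physical-layer change: the query text stays exactly the same.
conn.execute("CREATE INDEX IX_Sale_Region ON Sale (Region)")

result_after, plan_after = run(query, ("North",))

print(result_before, "|", plan_before)
print(result_after, "|", plan_after)
```

The query and its answer are identical before and after the index is created; only the engine's strategy for finding the rows changes.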

This chapter focuses on designing the conceptual model, with a brief look at normalization followed by a repertoire of database patterns.

Caution

Implementing a database without working through the SQL DDL Layer design phase is a certain path to a poorly performing database. Many database purists who didn't care to learn SQL Server implement conceptual designs directly, only to blame SQL Server for the horrible performance.
