1. Basic Tenets of Designing for Performance
Designing for performance requires making
trade-offs. For example, to get the best write performance out of a
database, you must sacrifice read performance. Before you tackle
database design issues for an application, it is critical to understand
your goals. Do you want faster read performance? Faster write
performance? A more understandable design?
Following are some basic truths about physical database design for SQL Server 2008 and the performance implications of each:
It’s important to keep table row sizes as
small as possible. Doing so is not about saving disk space. Having
smaller rows means more rows fit on a single 8KB page, which means
fewer physical disk reads are required to read a given number of rows.
You should use indexes to speed up read access. However, the more indexes a table has, the longer it takes to insert, update, and delete rows from the table (see the index sketch following this list).
Using triggers to perform any
kind of work during an insert, an update, or a delete exacts a
performance toll and decreases concurrency by lengthening transaction
duration.
Implementing declarative
referential integrity (via primary and foreign keys) helps maintain
data integrity, but enforcing foreign key constraints requires extra
lookups on the primary key table to ensure existence.
Using ON DELETE CASCADE referential integrity constraints helps maintain data integrity but requires extra work on the server’s part.
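For instance, the index trade-off mentioned above can be sketched in T-SQL roughly as follows. The table and index names here are hypothetical, used only to illustrate the point:

-- A hypothetical table and a nonclustered index that speeds up reads by last name
CREATE TABLE dbo.customers (
    customer_id int         NOT NULL PRIMARY KEY,
    last_name   varchar(40) NOT NULL,
    first_name  varchar(40) NOT NULL
);

CREATE NONCLUSTERED INDEX IX_customers_last_name
    ON dbo.customers (last_name);

-- This query can seek on the index instead of scanning the whole table...
SELECT customer_id, last_name
FROM dbo.customers
WHERE last_name = 'Smith';

-- ...but every INSERT, UPDATE, and DELETE against dbo.customers must now
-- also maintain IX_customers_last_name, which lengthens write operations.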
Keeping tables as narrow as possible—that is,
ensuring that the row size is as small as possible—is one of the most
important things you can do to ensure that a database performs well. To
keep your tables narrow, you should choose column data types with size
in mind. You shouldn’t use the bigint data type if the int data type
will do. If you have zero-to-one relationships in tables, you should
consider vertically partitioning the tables.
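The following sketch illustrates both ideas with a hypothetical example; the table names, column sizes, and data types are assumptions chosen only to show the technique:

-- A narrow table: data types sized to the data they actually hold
CREATE TABLE dbo.products (
    product_id   int         NOT NULL PRIMARY KEY, -- int rather than bigint when the key range allows it
    product_name varchar(60) NOT NULL,             -- sized to the data rather than varchar(max)
    list_price   smallmoney  NOT NULL,
    date_added   date        NOT NULL              -- date (3 bytes) instead of datetime (8 bytes)
);

-- A zero-to-one attribute vertically partitioned into its own table so the
-- main table's rows stay small and more of them fit on each 8KB page
CREATE TABLE dbo.product_descriptions (
    product_id  int          NOT NULL PRIMARY KEY
        REFERENCES dbo.products (product_id),
    description varchar(max) NULL
);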
Cascading deletes (and updates) cause extra
lookups to be done whenever a delete runs against the parent table. In
many cases, the optimizer uses worktables to resolve delete and update
queries. Enforcing these constraints manually, from within stored
procedures, for example, can give better performance. This is not a blanket argument against referential integrity constraints. In most cases, the extra performance hit is worth the aggravation saved by not having to code everything by hand. However, you should be aware of the cost of this convenience.
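A manual cascading delete coded in a stored procedure might look something like the following minimal sketch. The customers and orders tables and their columns are hypothetical, and real code would also need error handling:

CREATE PROCEDURE dbo.delete_customer
    @customer_id int
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;
        -- Delete the child rows first so the foreign key is not violated,
        -- then delete the parent row.
        DELETE FROM dbo.orders    WHERE customer_id = @customer_id;
        DELETE FROM dbo.customers WHERE customer_id = @customer_id;
    COMMIT TRANSACTION;
END;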
2. Logical Database Design Issues
A good database design is fundamental to the success
of any application. Logical database design for relational databases
follows rules of normalization. As a result of normalization, you
create a data model that is usually, but not necessarily, translated
into a physical data model. A logical database design does not depend
on the relational database you intend to use. The same data model can
be applied to Oracle, Sybase, SQL Server, or any other relational
database. On the other hand, a physical data model makes extensive use
of the features of the underlying database engine to yield optimal
performance for the application. Physical models are much less portable
than logical models.
Tip
If portability is a big concern to you, consider
using a third-party data modeling tool, such as ERwin or ERStudio.
These tools have features that make it easier to migrate your logical
data models to physical data models on different database platforms. Of
course, using these tools just gets you started; to get the best
performance out of your design, you need to tweak the physical design
for the platform you have chosen.
Normalization Conditions
Any database designer must address two fundamental issues:
Designing the database in a simple, understandable way that is maintainable and makes sense to its developers and users
Designing the database such that data is fetched and saved with the fastest response time, resulting in high performance
Normalization is a
technique used on relational databases to organize data across many
tables so that related data is kept together based on certain
guidelines. Normalization results in controlled redundancy of data;
therefore, it provides a good balance between disk space usage and
performance. Normalization helps people understand the relationships
between data and enforces rules to ensure that the data is meaningful.
Tip
Normalization rules exist, among other reasons, to
make it easier for people to understand the relationships between data.
But under certain circumstances, a perfectly normalized database doesn’t perform well, and it may be difficult to understand.
There are good reasons to deviate from a perfectly normalized database.
Normalization Forms
Five normalization forms exist, represented by the symbols 1NF for first normal form, 2NF for second normal form, and so on. If you follow the first rule of normalization, your database can be described as “in first normal form.”
Each rule of normalization depends on the previous
rule for successful implementation, so to be in second normal form
(2NF), your database must also follow the rules for first normal form.
A typical relational database used in a business
environment falls somewhere between second and third normal forms. It
is rare to progress past the third normal form because fourth and fifth
normal forms are more academic than practical in real-world
environments.
Following is a brief description of the first three rules of normalization.
First Normal Form
The first rule of normalization requires removing
repeating data values and specifies that no two rows in a table can be
identical. This means that each table must have a logical primary key
that uniquely identifies a row in the table.
Consider a table that has four columns—PublisherName, Title1, Title2, and Title3—for storing up to three titles for each publisher. This table is not in first normal form due to the repeating Title columns. The main problem with this design is that it limits the number of titles associated with a publisher to three.
Removing the repeating columns so there is just a PublisherName column and a single Title
column puts the table in first normal form. A separate data row is
stored in the table for each title published by each publisher. The
combination of PublisherName and Title becomes the primary key that uniquely identifies each row and prevents duplicates.
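In T-SQL, the two designs might be sketched as follows. The column names come from the example above; the data type sizes are assumptions:

-- Not in first normal form: the repeating Title columns limit each
-- publisher to three titles.
CREATE TABLE dbo.publisher_titles_flat (
    PublisherName varchar(100) NOT NULL PRIMARY KEY,
    Title1        varchar(200) NULL,
    Title2        varchar(200) NULL,
    Title3        varchar(200) NULL
);

-- In first normal form: one row per publisher/title combination; the
-- composite primary key prevents duplicate rows.
CREATE TABLE dbo.publisher_titles (
    PublisherName varchar(100) NOT NULL,
    Title         varchar(200) NOT NULL,
    CONSTRAINT PK_publisher_titles PRIMARY KEY (PublisherName, Title)
);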
Second Normal Form
A table is considered to be in second normal form if
it conforms to the first normal form and all nonkey attributes of the
table are fully dependent on the entire primary key. If the primary key
consists of multiple columns, nonkey columns should depend on the
entire key and not just on a part of the key. A table with a single
column as the primary key is automatically in second normal form if it
satisfies first normal form as well.
Assume that you need to add the publisher address to the database. Adding it to the table with the PublisherName and Title columns would violate second normal form. The primary key consists of both PublisherName and Title, but the PublisherAddress attribute is an attribute of the publisher only. It does not depend on the entire primary key.
To put the database in second normal form requires adding an additional table for storing publisher information. One table consists of the PublisherName and PublisherAddress columns. The second table contains the PublisherName and Title columns. To retrieve the PublisherName, Title, and PublisherAddress information in a single result would require a join between the two tables on the PublisherName column.
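Continuing the publisher example, a sketch of the second normal form design and the join needed to reassemble the information might look like this (the data type sizes are again assumptions):

-- Publisher attributes keyed on PublisherName alone
CREATE TABLE dbo.publishers (
    PublisherName    varchar(100) NOT NULL PRIMARY KEY,
    PublisherAddress varchar(200) NULL
);

CREATE TABLE dbo.publisher_titles (
    PublisherName varchar(100) NOT NULL
        REFERENCES dbo.publishers (PublisherName),
    Title         varchar(200) NOT NULL,
    CONSTRAINT PK_publisher_titles PRIMARY KEY (PublisherName, Title)
);

-- Retrieving the name, title, and address in a single result requires a join.
SELECT pt.PublisherName, pt.Title, p.PublisherAddress
FROM dbo.publisher_titles AS pt
JOIN dbo.publishers AS p
    ON p.PublisherName = pt.PublisherName;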
Third Normal Form
A table is considered to be in third normal form if
it already conforms to the first two normal forms and if none of the
nonkey columns are dependent on any other nonkey columns. All such
attributes should be removed from the table.
Let’s look at an example that comes up often during database architecture. Suppose that an employee table has four columns: EmployeeID (the primary key), salary, bonus, and total_salary, where total_salary = salary + bonus. Existence of the total_salary column in the table violates the third normal form because a nonkey column (total_salary) is dependent on two other nonkey columns (salary and bonus). Therefore, for the table to conform to the third rule of normalization, you must remove the total_salary column from the employee table.
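A sketch of the normalized employee table, with total_salary computed at query time rather than stored, might look like this (the data types are assumptions):

-- In third normal form: total_salary is not stored, because it depends on
-- the nonkey columns salary and bonus; it is computed when queried instead.
CREATE TABLE dbo.employee (
    EmployeeID int   NOT NULL PRIMARY KEY,
    salary     money NOT NULL,
    bonus      money NOT NULL
);

SELECT EmployeeID,
       salary,
       bonus,
       salary + bonus AS total_salary
FROM dbo.employee;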
Benefits of Normalization
Following are the major advantages of normalization:
Because information is logically kept together, normalization provides improved overall understanding of the system.
Because of controlled redundancy of data, normalization can result in faster table scans and searches (because less physical data has to be processed).
Because tables are smaller with normalization, index creation and data sorts are much faster.
With less redundant data, it is easier to maintain referential integrity for the system.
Normalization
results in narrower tables. Because you can store more rows per page,
more rows can be read and cached for each I/O performed on the table.
This results in better I/O performance.
Drawbacks of Normalization
One result of normalization is that data is stored
in multiple tables. To retrieve or modify information, you usually have
to establish joins across multiple tables. Joins are expensive from an
I/O standpoint. Multitable joins can have an adverse impact on the
performance of the system. The following sections discuss some of the
denormalization techniques you can use to improve the performance of a
system.
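For example, in a fully normalized book database modeled loosely on the familiar pubs sample tables, listing each author's titles along with their publishers requires a multitable join; a denormalized design might avoid some of these joins by carrying redundant columns:

-- Hypothetical query against a normalized design: four tables must be joined
-- to list each author's titles and their publishers.
SELECT a.au_lname, t.title, p.pub_name
FROM dbo.authors AS a
JOIN dbo.titleauthor AS ta ON ta.au_id   = a.au_id
JOIN dbo.titles      AS t  ON t.title_id = ta.title_id
JOIN dbo.publishers  AS p  ON p.pub_id   = t.pub_id;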
Tip
An
adage for normalization is “Normalize ’til it hurts; denormalize ’til
it works.” To put this maxim into use, try to put your database in
third normal form initially. Then, when you’re ready to implement the
physical structure, drop back from third normal form, where excessive
table joins are hurting performance. A common mistake is that developers make too many assumptions and over-denormalize the database design before a single line of code has been written to begin assessing the database's performance.