SQL Server 2008 R2 : Defining Columns (part 2) - Column Properties

11/26/2012 5:42:49 PM

Column Properties

Name and data type are the most basic properties of a column, but many other properties can be defined for a column. You do not have to specify these properties to be able to create the columns, but you can use them to further refine the type of data that can be stored within a column. Note that many of the available column properties relate to indexes and constraints that are beyond the scope of this section. The following sections describe some of the column properties you are most likely to encounter.

The NULL and NOT NULL Keywords

When you are defining tables, it is always a good idea to explicitly state whether a column should or should not contain nulls. You do this by specifying the NULL or NOT NULL keyword after the column data type. If the nullability option is not specified, SQL Server allows nulls in the column unless one of two conditions applies: the ANSI_NULL_DFLT_OFF option is enabled for the session, or no setting is specified for the session and the ANSI_NULL_DEFAULT option for the database is set to OFF. Because of this uncertainty, it is best to always explicitly specify the desired nullability option for each column. Listing 1 creates a new table named PrinterCartridge that has the NULL or NOT NULL property specified for each column.

Listing 1. Defining Column NULL Properties by Using CREATE TABLE

CREATE TABLE dbo.PrinterCartridge
    (
    CartridgeId int NOT NULL,
    PrinterID int NOT NULL,
    CartridgeName varchar(50) NOT NULL,
    CartridgeColor varchar(50) NOT NULL,
    CartrideDescription varchar(255) NULL,
    InstallDate datetime NOT NULL
    )
GO
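
One way to confirm the nullability that each column actually received is to query the catalog. The following is a minimal sketch, assuming the dbo.PrinterCartridge table from Listing 1 exists in the current database; it reads the is_nullable flag from the sys.columns catalog view.

```sql
-- List each column of dbo.PrinterCartridge with its nullability setting.
-- is_nullable = 1 means the column allows nulls; 0 means it is NOT NULL.
SELECT c.name        AS ColumnName,
       c.is_nullable AS AllowsNulls
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.PrinterCartridge')
ORDER BY c.column_id;
```

A check like this is most useful for tables that were created without explicit NULL or NOT NULL keywords, where the result depends on the session and database settings in effect at creation time.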

Note

It is beyond the scope of this section to debate whether columns should ever allow nulls. In some organizations, nulls are heavily used, and in others they are not allowed. There is no right answer, but it is important for a development team to be aware of the existence of nulls so that it can create appropriate code to handle them.

Identity Columns

A property commonly specified when creating tables is IDENTITY. When this property is assigned to a column, SQL Server automatically generates a unique, sequential value for that column each time a row is inserted. It can be assigned only to columns of the following types (decimal and numeric columns must be defined with a scale of 0):

decimal
int
numeric
smallint
bigint
tinyint

Only one identity column can exist for each table, and that column cannot allow nulls.

When implementing the IDENTITY property, you supply a seed and an increment. The seed is the starting value for the numeric count, and the increment is the amount by which it grows. A seed of 10 and an increment of 10 would produce values of 10, 20, 30, 40, and so on. If not specified, the default seed value is 1, and the increment is 1. Listing 2 adds the IDENTITY property to the CartridgeId column of the PrinterCartridge table used in the previous example.

Listing 2. Defining an Identity Column by Using CREATE TABLE

IF  EXISTS (SELECT * FROM dbo.sysobjects
WHERE id = OBJECT_ID(N'dbo.PrinterCartridge')
AND OBJECTPROPERTY(id, N'IsUserTable') = 1)
DROP TABLE dbo.PrinterCartridge

CREATE TABLE dbo.PrinterCartridge
    (
    CartridgeId int IDENTITY (1000, 1) NOT NULL,
    PrinterID int NOT NULL,
    CartridgeName varchar(50) NOT NULL,
    CartridgeColor varchar(50) NOT NULL,
    CartrideDescription varchar(255) NULL,
    InstallDate datetime NOT NULL
    )
GO

insert PrinterCartridge
 (PrinterID, CartridgeName, CartridgeColor, CartrideDescription, InstallDate)
values (1, 'inkjet', 'black','laser printer cartridge', '8/1/09')

select CartridgeId, PrinterID, CartridgeName
 from PrinterCartridge

/* results from previous SELECT statement
CartridgeId PrinterID   CartridgeName
----------- ----------- --------------------------------------------------
1000        1           inkjet
*/

In this listing, the seed value has been set to 1000, and the increment has been set to 1. An insert into the PrinterCartridge table and a subsequent SELECT from that table follow the CREATE TABLE statement in the listing. Notice that the results of the SELECT show a value of 1000 for the identity column CartridgeId. This is the seed, or starting point, that was defined.
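
The seed-10, increment-10 case described earlier can be tried in the same way. The following is a minimal sketch that uses a hypothetical temporary table (#IdentityDemo and its column names are illustrative and not part of the original example). It also shows SCOPE_IDENTITY(), which returns the last identity value generated in the current scope.

```sql
-- Hypothetical temporary table; the names are illustrative only.
CREATE TABLE #IdentityDemo
    (
    DemoId int IDENTITY (10, 10) NOT NULL,  -- seed 10, increment 10
    Label varchar(20) NOT NULL
    );

-- The identity column is left out of the column list; SQL Server supplies it.
INSERT INTO #IdentityDemo (Label) VALUES ('first');
INSERT INTO #IdentityDemo (Label) VALUES ('second');
INSERT INTO #IdentityDemo (Label) VALUES ('third');

-- With no failed or rolled-back inserts, DemoId is 10, 20, and 30.
SELECT DemoId, Label
 FROM #IdentityDemo
 ORDER BY DemoId;

-- Return the last identity value generated in this scope.
SELECT SCOPE_IDENTITY() AS LastIdentityValue;

DROP TABLE #IdentityDemo;
```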

ROWGUIDCOL Columns

An alternative to an identity column is a column defined with the ROWGUIDCOL property. Like an identity column, a ROWGUIDCOL column is normally populated automatically with unique values, although the values come from a default on the column rather than from the ROWGUIDCOL property itself. The difference is that the values generated for a ROWGUIDCOL column will be unique on any networked database anywhere in the world. The identity column generates values that are unique only within the table that contains the column.

You can have only one ROWGUIDCOL column per table. You must create this ROWGUIDCOL column with the uniqueidentifier data type, and you must assign a default of NEWID() to the column to generate the unique value. Keep in mind that users can manually insert values directly into columns defined as ROWGUIDCOL. These manual inserts could cause duplicates in the column, so a UNIQUE constraint should be added to the column as well to ensure uniqueness.

Listing 3 shows the creation of a table with a ROWGUIDCOL column. Two rows are inserted into the newly created table, and those rows are selected at the end of the listing.

Listing 3. Defining a ROWGUIDCOL Column

CREATE TABLE SomeUniqueTable
   (UniqueID   UNIQUEIDENTIFIER ROWGUIDCOL DEFAULT NEWID(),
   EffectiveDate datetime )
GO
INSERT INTO SomeUniqueTable (EffectiveDate) VALUES ('7/1/09')
INSERT INTO SomeUniqueTable (EffectiveDate) VALUES ('8/1/09')
GO
select * from SomeUniqueTable
/* Results from previous select statement
UniqueID                             EffectiveDate
------------------------------------ -----------------------
614181BC-D7B9-4108-B2BD-C2F39E999424 2009-07-01 00:00:00.000
62368A2D-3557-4727-9DD3-FBCA38705B1B 2009-08-01 00:00:00.000
*/

You can see that the values in a ROWGUIDCOL column are fairly large. They are 16-byte binary values that are significantly larger than most of the data types used for identity columns. For example, an identity column defined as data type int occupies only 4 bytes. You need to consider these storage requirements when you choose a ROWGUIDCOL column over an identity column.
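
The recommendation to pair a ROWGUIDCOL column with a UNIQUE constraint can be sketched as follows. The table name SomeUniqueTable2 and the constraint names are hypothetical, chosen only for illustration. The final query uses $ROWGUID, which refers to a table's ROWGUIDCOL column without naming it.

```sql
-- Hypothetical table; the table and constraint names are illustrative only.
CREATE TABLE dbo.SomeUniqueTable2
   (
   UniqueID uniqueidentifier ROWGUIDCOL NOT NULL
       CONSTRAINT DF_SomeUniqueTable2_UniqueID DEFAULT NEWID()
       CONSTRAINT UQ_SomeUniqueTable2_UniqueID UNIQUE,  -- rejects duplicate manual inserts
   EffectiveDate datetime NULL
   );
GO
INSERT INTO dbo.SomeUniqueTable2 (EffectiveDate) VALUES ('20090701');
GO
-- $ROWGUID resolves to the column that carries the ROWGUIDCOL property.
SELECT $ROWGUID AS RowGuidValue, EffectiveDate
 FROM dbo.SomeUniqueTable2;
```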

Computed Columns

A computed column is a column whose value is calculated based on other columns. Generally speaking, the column is a virtual column because it is calculated on the fly, and no value is stored in the database table. SQL Server also gives you the option of actually storing the calculated value in the database. You do so by marking the column as persisted. You can create an index on a computed column that meets certain requirements (for example, its expression must be deterministic), and persisting the column relaxes some of those requirements.

Listing 4 includes several statements that relate to the creation of a computed column. It starts with an ALTER TABLE statement that adds a new computed column named SetRate to the Sales.CurrencyRate table in the AdventureWorks2008 database. The new rate column is based on an average of two other rate columns in the table. A SELECT statement is executed after that; it returns several columns, including the new SetRate computed column. The results are shown after the SELECT. Finally, an ALTER TABLE statement is used to change the newly added column so that its values are stored in the database. This is accomplished with the ADD PERSISTED option.

Listing 4. Defining a Computed Column

--Add a computed column to the Sales.CurrencyRate Table named SetRate
ALTER TABLE Sales.CurrencyRate
 ADD SetRate AS ( (AverageRate + EndOfDayRate) / 2)
go
--Select several columns including the new computed column
select top 5 AverageRate, EndOfDayRate , SetRate
 from sales.currencyrate

/*Results from previous SELECT statement
AverageRate           EndOfDayRate          SetRate
--------------------- --------------------- ---------------------
1.00                  1.0002                1.0001
1.5491                1.55                  1.5495
1.9379                1.9419                1.9399
1.4641                1.4683                1.4662
8.2781                8.2784                8.2782
*/

--Alter the computed SetRate column to be PERSISTED
ALTER TABLE Sales.CurrencyRate
 alter column SetRate ADD PERSISTED

Note

You can use the sp_spaceused stored procedure to check the space allocated to the Sales.CurrencyRate table. Check the space before the column is persisted, and check it again afterward. As you would expect, the space allocated to the table increases only after the column is persisted.
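
A minimal sketch of that check follows, assuming the SetRate column from Listing 4 exists in Sales.CurrencyRate. The second query is an addition not described in the note: it reads the is_persisted flag from the sys.computed_columns catalog view to confirm the current state of the column.

```sql
-- Report the space currently allocated to the table.
-- Run this once before and once after the column is persisted,
-- and compare the "data" values in the two result sets.
EXEC sp_spaceused N'Sales.CurrencyRate';

-- Confirm whether SetRate is currently stored: is_persisted = 1 means yes.
SELECT name, is_persisted
 FROM sys.computed_columns
 WHERE object_id = OBJECT_ID(N'Sales.CurrencyRate')
   AND name = N'SetRate';
```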

FILESTREAM Storage

SQL Server 2008 introduces FILESTREAM storage for storing unstructured data, such as documents, images, and videos. In previous versions of SQL Server, there were two ways of storing unstructured data. One method was to store it in the database as a binary large object (BLOB) in an image or varbinary(max) column. The other method was to store the data outside the database, separate from the structured relational data, storing a reference or pathname to the unstructured data in a varchar column in a table. Neither of these methods is ideal for unstructured data.

FILESTREAM storage helps to solve these issues by integrating the SQL Server Database Engine with the NTFS file system. The unstructured data, such as documents and images, is stored on the file system, and the database stores a pointer to the data. Although the actual data resides outside the database in the NTFS file system, you can still use T-SQL statements to insert, update, query, and back up FILESTREAM data, while maintaining transactional consistency between the unstructured data and the corresponding structured data with the same level of security.

To specify that a column should store data on the file system when creating or altering a table, you specify the FILESTREAM attribute on a varbinary(max) column. This causes the Database Engine to store all data for that column on the file system, but not in the database file. After you complete these tasks, you can use Transact-SQL and Win32 to manage the FILESTREAM data.
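
The following is a minimal sketch of such a table. It assumes that FILESTREAM is already enabled and that the current database already has a FILESTREAM filegroup; the table, column, and constraint names are hypothetical. A table that contains a FILESTREAM column also needs a non-null uniqueidentifier column with the ROWGUIDCOL property and a UNIQUE or PRIMARY KEY constraint, which is why DocumentId is defined as it is.

```sql
-- Hypothetical table; assumes the database already has a FILESTREAM filegroup.
CREATE TABLE dbo.DocumentStore
    (
    -- Required for FILESTREAM tables: a unique, non-null ROWGUIDCOL column.
    DocumentId uniqueidentifier ROWGUIDCOL NOT NULL
        CONSTRAINT DF_DocumentStore_DocumentId DEFAULT NEWID()
        CONSTRAINT UQ_DocumentStore_DocumentId UNIQUE,
    DocumentName varchar(100) NOT NULL,
    -- The FILESTREAM attribute stores this column's data on the file system.
    DocumentData varbinary(max) FILESTREAM NULL
    );
GO
-- FILESTREAM data can be inserted with ordinary T-SQL.
INSERT INTO dbo.DocumentStore (DocumentName, DocumentData)
VALUES ('readme.txt', CAST('sample file contents' AS varbinary(max)));
```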

Note

To use FILESTREAM storage, you must first enable FILESTREAM storage at the Windows level as well as at the SQL Server Instance level. You can enable FILESTREAM at the Windows level during installation of SQL Server 2008 or at any time using SQL Server Configuration Manager. After you enable FILESTREAM at the Windows level, you next need to enable FILESTREAM for the SQL Server Instance. You can do this either through SQL Server Management Studio or via T-SQL.
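
The T-SQL route for the instance-level step can be sketched as follows, assuming FILESTREAM has already been enabled at the Windows level. The access level value of 2 is one possible choice: 0 disables FILESTREAM for the instance, 1 enables T-SQL access only, and 2 enables both T-SQL and Win32 streaming access.

```sql
-- Enable FILESTREAM for the instance (T-SQL and Win32 streaming access).
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;
GO
-- Check the level that is actually in effect for the instance.
SELECT SERVERPROPERTY('FilestreamEffectiveLevel') AS FilestreamEffectiveLevel;
```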

Sparse Columns and Column Sets

SQL Server 2008 provides a new space-saving storage option referred to as sparse columns. Sparse columns are ordinary columns that provide optimized storage for null values. If the value of a column defined as a sparse column is NULL, it doesn’t consume any space at all. You can define a column as a sparse column by specifying the SPARSE keyword after the data type in the CREATE TABLE or ALTER TABLE statement, as shown in Listing 5.

Listing 5. Specifying a Sparse Column in a Create Table Statement

CREATE TABLE DBO.SPARSE_TABLE
(ID INT IDENTITY(1,1),
 FIRST_NAME VARCHAR (50),
 MIDDLE_NAME VARCHAR (50) SPARSE NULL,
 LASTNAME VARCHAR (50)
)
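
The ALTER TABLE form can be sketched in the same way. The example below assumes the DBO.SPARSE_TABLE table from Listing 5 exists; the NICKNAME column is hypothetical and is added only for illustration. The last query reads the is_sparse flag from the sys.columns catalog view to show which columns are currently sparse.

```sql
-- Add a new sparse column to the existing table (NICKNAME is hypothetical).
ALTER TABLE DBO.SPARSE_TABLE
 ADD NICKNAME VARCHAR (50) SPARSE NULL;
GO
-- Remove the SPARSE attribute from the column, then add it back.
ALTER TABLE DBO.SPARSE_TABLE
 ALTER COLUMN NICKNAME DROP SPARSE;
GO
ALTER TABLE DBO.SPARSE_TABLE
 ALTER COLUMN NICKNAME ADD SPARSE;
GO
-- is_sparse = 1 identifies the columns that are currently sparse.
SELECT name, is_sparse
 FROM sys.columns
 WHERE object_id = OBJECT_ID(N'DBO.SPARSE_TABLE');
```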

The space savings of sparse columns come with a trade-off, however, requiring extra space for storing non-null values in the sparse column. Fixed-length and precision data types require 4 extra bytes, and variable-length data types require 2 extra bytes. For this reason, you should consider using sparse columns only when the space saved is at least 20% to 40%.

SQL Server stores sparse columns in a single XML column that appears to external applications and end users as a normal column. Storing the sparse columns in a single XML column allows up to 30,000 sparse columns in a single table, exceeding the limitation of 1,024 columns if sparse columns are not used. In addition, because sparse columns have many null-valued rows, they are good candidates for filtered indexes. A filtered index on a sparse column can index only the rows that have non-null values stored in the column. This creates smaller and more efficient indexes.
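
A filtered index of this kind can be sketched as follows, assuming the DBO.SPARSE_TABLE table from Listing 5 exists. The index name is hypothetical, and the example assumes the session uses the SET options that filtered indexes require (such as QUOTED_IDENTIFIER ON and ANSI_NULLS ON), which are typical defaults in SSMS.

```sql
-- Index only the rows in which the sparse column actually holds a value.
CREATE NONCLUSTERED INDEX IX_SPARSE_TABLE_MIDDLE_NAME
    ON DBO.SPARSE_TABLE (MIDDLE_NAME)
    WHERE MIDDLE_NAME IS NOT NULL;
```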

Sparse columns can be of any SQL Server data type and behave like any other column with the following restrictions:

A sparse column must be nullable and cannot have the ROWGUIDCOL or IDENTITY properties.
A sparse column cannot be of the following data types: text, ntext, image, timestamp, user-defined data type, geometry, or geography. It also cannot have the FILESTREAM attribute.
A sparse column cannot have a default value.
A sparse column cannot be bound to a rule.
A computed column cannot be marked as sparse.
A sparse column cannot be part of a clustered index or a unique primary key index.

When the number of sparse columns in a table is large, and operating on them individually is cumbersome, you may want to define a column set. A column set is an untyped XML representation that combines all the sparse columns of a table into a structured set. A column set is like a computed column in that it is not physically stored in the table, but unlike a computed column, it is directly updatable. Applications may see some performance improvement when they select and insert data by using column sets on tables that have lots of columns.

To define a column set, use the <column_set_name> XML COLUMN_SET FOR ALL_SPARSE_COLUMNS keywords in the CREATE TABLE or ALTER TABLE statements, as shown in Listing 6.

Listing 6. Defining a Column Set

CREATE TABLE emp_info
(ID INT IDENTITY(1,1),
 FIRST_NAME VARCHAR (50),
 MIDDLE_NAME VARCHAR (50) SPARSE NULL ,
 LASTNAME VARCHAR (50),
 HOMEPHONE VARCHAR(10) SPARSE NULL,
 BUSPHONE VARCHAR(10) SPARSE NULL,
 CELLPHONE VARCHAR(10) SPARSE NULL,
 FAX VARCHAR(10) SPARSE NULL,
 EMAIL VARCHAR(30) SPARSE NULL,
 WEBSITE VARCHAR(30) SPARSE NULL,
 CSet XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
)

A column set is created as an untyped XML column and is treated as any other XML column with a maximum XML data size limit of 2GB. Only one column set per table is allowed. You cannot add a column set to a table if the table already contains sparse columns.
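
Working with the column set can be sketched as follows, assuming the emp_info table from Listing 6 exists. The sample names, phone numbers, and addresses are made up for illustration. A single INSERT names either the individual sparse columns or the column set, not both.

```sql
-- Insert by naming sparse columns individually.
INSERT INTO emp_info (FIRST_NAME, LASTNAME, HOMEPHONE, EMAIL)
VALUES ('Pat', 'Lee', '5550100', 'pat@example.com');

-- Insert through the column set; each XML element is named after a sparse column.
INSERT INTO emp_info (FIRST_NAME, LASTNAME, CSet)
VALUES ('Sam', 'Ortiz',
        '<CELLPHONE>5550101</CELLPHONE><WEBSITE>www.example.com</WEBSITE>');

-- SELECT * returns the column set in place of the individual sparse columns.
SELECT * FROM emp_info;

-- Sparse columns can still be selected individually by name.
SELECT ID, FIRST_NAME, HOMEPHONE, CELLPHONE
 FROM emp_info;
```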

To specify a column as a sparse column using SQL Server Management Studio (SSMS), set the Is Sparse property to Yes in the column properties for the selected column (see Figure 1). Similarly, if a column needs to be declared as a column set, set the Is Columnset property to Yes in the column properties.

Figure 1. Setting a column as a sparse column.
