Column Properties
Name and data type are the most basic properties of a column, but many other properties can be defined for a column. You do not have to specify these properties to create a column, but you can use them to further refine the type of data that can be stored in it. Note that many of the available column properties relate to indexes and constraints, which are beyond the scope of this section. The following sections describe the column properties you are most likely to encounter.
The NULL and NOT NULL Keywords
When you are defining tables, it is always a good idea to explicitly state whether a column should or should not contain nulls. You do this by specifying the NULL or NOT NULL keyword after the column data type. If the nullability option is not specified, SQL Server allows nulls unless either of the following is true: the ANSI_NULL_DFLT_OFF option is enabled for the session, or no ANSI null default option is set for the session and the ANSI_NULL_DEFAULT option for the database is set to OFF. Because the result depends on these settings, it is best to always explicitly specify the desired nullability option for each column. Listing 1 creates a new table named PrinterCartridge that has the NULL or NOT NULL property specified for each column.
Listing 1. Defining Column NULL Properties by Using CREATE TABLE
CREATE TABLE dbo.PrinterCartridge
(
CartridgeId int NOT NULL,
PrinterID int NOT NULL,
CartridgeName varchar(50) NOT NULL,
CartridgeColor varchar(50) NOT NULL,
CartridgeDescription varchar(255) NULL,
InstallDate datetime NOT NULL
)
GO
Note
It is beyond the scope of
this section to debate whether columns should ever allow nulls. In some
organizations, nulls are heavily used, and in others they are not
allowed. There is no right answer, but it is important for a development
team to be aware of the existence of nulls so that it can create
appropriate code to handle them.
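The effect of the session options described earlier can be observed directly. The following sketch uses two hypothetical scratch tables, dbo.NullDefaultOn and dbo.NullDefaultOff, each with a column that deliberately omits NULL and NOT NULL. It creates one table under each session setting and then reads the result with the AllowsNull property of the built-in COLUMNPROPERTY function.

```sql
-- Hypothetical scratch tables; Col1 omits NULL/NOT NULL on purpose.
SET ANSI_NULL_DFLT_ON ON    -- unspecified columns become nullable
GO
CREATE TABLE dbo.NullDefaultOn (Col1 int)
GO
SET ANSI_NULL_DFLT_OFF ON   -- also switches ANSI_NULL_DFLT_ON off
GO
CREATE TABLE dbo.NullDefaultOff (Col1 int)
GO
-- AllowsNull returns 1 for a nullable column and 0 for a NOT NULL column
SELECT COLUMNPROPERTY(OBJECT_ID(N'dbo.NullDefaultOn'), N'Col1', N'AllowsNull')  AS OnSetting,  -- 1
       COLUMNPROPERTY(OBJECT_ID(N'dbo.NullDefaultOff'), N'Col1', N'AllowsNull') AS OffSetting  -- 0
GO
-- Restore the setting that most client tools use
SET ANSI_NULL_DFLT_ON ON
GO
```

The same CREATE TABLE text produced two different column definitions, which is exactly why the nullability keyword should always be stated.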
Identity Columns
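A property commonly specified when creating tables is IDENTITY. When this property is assigned to a column, SQL Server automatically generates a sequential value for that column in each new row. The IDENTITY property does not by itself guarantee uniqueness (a PRIMARY KEY or UNIQUE constraint does that), but it is commonly used to create unique row identifiers. It can be assigned only to columns of the following types (decimal and numeric columns must have a scale of 0):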
decimal
int
numeric
smallint
bigint
tinyint
Only one identity column can exist for each table, and that column cannot allow nulls.
When implementing the IDENTITY property, you supply a seed and an increment. The seed is the starting value for the numeric count, and the increment is the amount by which it grows. A seed of 10 and an increment of 10 would produce values of 10, 20, 30, 40, and so on. You must specify both the seed and the increment, or neither; if neither is specified, the default seed value is 1, and the increment is 1. Listing 2 drops and re-creates the PrinterCartridge table used in the previous example, this time with the IDENTITY property on the CartridgeId column.
Listing 2. Defining an Identity Column by Using CREATE TABLE
IF EXISTS (SELECT * FROM dbo.sysobjects
WHERE id = OBJECT_ID(N'dbo.PrinterCartridge')
AND OBJECTPROPERTY(id, N'IsUserTable') = 1)
DROP TABLE dbo.PrinterCartridge
CREATE TABLE dbo.PrinterCartridge
(
CartridgeId int IDENTITY (1000, 1) NOT NULL,
PrinterID int NOT NULL,
CartridgeName varchar(50) NOT NULL,
CartridgeColor varchar(50) NOT NULL,
CartridgeDescription varchar(255) NULL,
InstallDate datetime NOT NULL
)
GO
insert PrinterCartridge
(PrinterID, CartridgeName, CartridgeColor, CartridgeDescription, InstallDate)
values (1, 'inkjet', 'black','laser printer cartridge', '8/1/09')
select CartridgeId, PrinterID, CartridgeName
from PrinterCartridge
/* results from previous SELECT statement
CartridgeId PrinterID CartridgeName
----------- ----------- --------------------------------------------------
1000 1 inkjet
*/
In this listing, the seed value has been set to 1000, and the increment has been set to 1. The CREATE TABLE statement is followed by an INSERT into the PrinterCartridge table and a SELECT from that table. Notice that the results of the SELECT show a value of 1000 for the identity column CartridgeId. This is the seed value defined for the column, so it is the value assigned to the first row.
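To see a seed and an increment other than 1 at work, the following sketch creates a separate table (dbo.IdentityDemo, a hypothetical name used only for illustration) with IDENTITY (10, 10), inserts three rows, and then uses the built-in IDENT_SEED, IDENT_INCR, and SCOPE_IDENTITY functions to inspect the identity settings and the last value generated.

```sql
-- dbo.IdentityDemo is a hypothetical table used only for this illustration.
CREATE TABLE dbo.IdentityDemo
(
DemoId int IDENTITY (10, 10) NOT NULL,  -- seed 10, increment 10
Label varchar(20) NOT NULL
)
GO
INSERT INTO dbo.IdentityDemo (Label) VALUES ('first')
INSERT INTO dbo.IdentityDemo (Label) VALUES ('second')
INSERT INTO dbo.IdentityDemo (Label) VALUES ('third')

-- Values follow seed + (n - 1) * increment: 10, 20, 30
SELECT DemoId, Label FROM dbo.IdentityDemo ORDER BY DemoId

-- Seed, increment, and the last identity value generated in this scope
SELECT IDENT_SEED('dbo.IdentityDemo') AS Seed,          -- 10
       IDENT_INCR('dbo.IdentityDemo') AS Increment,     -- 10
       SCOPE_IDENTITY()               AS LastIdentity   -- 30
GO
```

SCOPE_IDENTITY is the usual way to retrieve the value just generated for a row you inserted, because it ignores identity values generated by triggers in other scopes.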
ROWGUIDCOL Columns
An alternative to an identity column is a column defined with the ROWGUIDCOL property. Like an identity column, a ROWGUIDCOL column is typically used to hold an automatically generated, unique value for each row. The difference is that ROWGUIDCOL values are globally unique identifiers (GUIDs) that will be unique on any networked database anywhere in the world, whereas an identity column generates values that are unique only within the table that contains the column.
You can have only one ROWGUIDCOL column per table. You must create this ROWGUIDCOL column with the uniqueidentifier data type. The ROWGUIDCOL property does not generate values by itself, so you must assign a default of NEWID() to the column to generate the unique value. Keep in mind that users can manually insert values directly into columns defined as ROWGUIDCOL. These manual inserts could cause duplicates in the column, so a UNIQUE constraint should be added to the column as well to ensure uniqueness.
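The following sketch illustrates why the UNIQUE constraint matters. It uses a hypothetical table, dbo.GuidDemo, whose ROWGUIDCOL column has both a NEWID() default and a UNIQUE constraint; a manual insert that reuses an existing GUID is then rejected. The $ROWGUID keyword, which refers to a table's ROWGUIDCOL column without naming it, is used to read the generated value.

```sql
-- dbo.GuidDemo is hypothetical; it follows the recommendations in the text.
CREATE TABLE dbo.GuidDemo
(
RowGuid uniqueidentifier ROWGUIDCOL NOT NULL
    CONSTRAINT DF_GuidDemo_RowGuid DEFAULT NEWID()
    CONSTRAINT UQ_GuidDemo_RowGuid UNIQUE,
Note varchar(20) NULL
)
GO
-- The NEWID() default generates the GUID
INSERT INTO dbo.GuidDemo (Note) VALUES ('generated')

-- $ROWGUID refers to whichever column has the ROWGUIDCOL property
DECLARE @existing uniqueidentifier
SELECT @existing = $ROWGUID FROM dbo.GuidDemo

-- Manually inserting the same value violates the UNIQUE constraint (error 2627)
BEGIN TRY
    INSERT INTO dbo.GuidDemo (RowGuid, Note) VALUES (@existing, 'duplicate')
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE()
END CATCH

SELECT COUNT(*) AS RowsStored FROM dbo.GuidDemo  -- 1
GO
```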
Listing 3 shows the creation of a table with a ROWGUIDCOL column. Several rows are inserted into the newly created table, and those rows are selected at the end of the listing.
Listing 3. Defining a ROWGUIDCOL Column
CREATE TABLE SomeUniqueTable
(UniqueID UNIQUEIDENTIFIER ROWGUIDCOL DEFAULT NEWID() UNIQUE,
EffectiveDate datetime )
GO
INSERT INTO SomeUniqueTable (EffectiveDate) VALUES ('7/1/09')
INSERT INTO SomeUniqueTable (EffectiveDate) VALUES ('8/1/09')
GO
select * from SomeUniqueTable
/* Results from previous select statement
UniqueID EffectiveDate
------------------------------------ -----------------------
614181BC-D7B9-4108-B2BD-C2F39E999424 2009-07-01 00:00:00.000
62368A2D-3557-4727-9DD3-FBCA38705B1B 2009-08-01 00:00:00.000
*/
You can see that the ROWGUIDCOL values are fairly large. They are 16-byte binary values that are significantly larger than most of the data types used for identity columns. For example, an identity column defined as data type int occupies only 4 bytes. You need to consider these storage requirements when you decide whether to use a ROWGUIDCOL column.
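The size difference can be confirmed with the built-in DATALENGTH function, which returns the number of bytes used by a value. The following query is a minimal check; the figures it returns are fixed by the data types.

```sql
SELECT DATALENGTH(NEWID())           AS GuidBytes,    -- 16
       DATALENGTH(CAST(1 AS int))    AS IntBytes,     -- 4
       DATALENGTH(CAST(1 AS bigint)) AS BigintBytes   -- 8
```

For a table of 1,000,000 rows, choosing uniqueidentifier instead of int therefore adds roughly 12,000,000 bytes (about 11.4 MB) for the column's data alone, before counting any indexes that include the column.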
Computed Columns
A computed column is a column whose value is calculated from other columns. Generally speaking, the column is a virtual column because it is calculated on the fly, and no value is stored in the database table. Starting with SQL Server 2005, you also have the option of actually storing the calculated value in the database. You do so by marking the column as PERSISTED. A computed column whose expression is deterministic can be indexed; persisting the column is also required before you can index it when the expression is imprecise (for example, when it involves float values).
Listing 4 includes several statements that relate to the creation of a computed column. It starts with an ALTER TABLE statement that adds a new computed column named SetRate to the Sales.CurrencyRate table in the AdventureWorks2008 database. The new rate column is based on an average of two other rate columns in the table. A SELECT statement is executed after that; it returns several columns, including the new SetRate computed column. The results are shown after the SELECT. Finally, an ALTER TABLE
statement is used to change the newly added column so that its values
are stored in the database. This is accomplished with the ADD PERSISTED option.
Listing 4. Defining a Computed Column
--Add a computed column to the Sales.CurrencyRate Table named SetRate
ALTER TABLE Sales.CurrencyRate
ADD SetRate AS ( (AverageRate + EndOfDayRate) / 2)
go
--Select several columns including the new computed column
select top 5 AverageRate, EndOfDayRate , SetRate
from sales.currencyrate
/*Results from previous SELECT statement
AverageRate EndOfDayRate SetRate
--------------------- --------------------- ---------------------
1.00 1.0002 1.0001
1.5491 1.55 1.5495
1.9379 1.9419 1.9399
1.4641 1.4683 1.4662
8.2781 8.2784 8.2782
*/
--Alter the computed SetRate column to be PERSISTED
ALTER TABLE Sales.CurrencyRate
alter column SetRate ADD PERSISTED
Note
You can use the sp_spaceused stored procedure to check the space allocated to the Sales.CurrencyRate
table. You need to check the size before the column is persisted, and
then you need to check the space allocated to the table after the column
is persisted. As you would expect, the space allocated to the table is
increased only after the column is persisted.
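The following self-contained sketch combines both ideas: a persisted computed column and an index on it. The table dbo.RateDemo is hypothetical (it is not part of AdventureWorks2008), but its expression mirrors the SetRate column from Listing 4. The SET statements at the top are the session options SQL Server requires when an index is created on a computed column; the sys.computed_columns catalog view then confirms that the column is persisted.

```sql
-- Session options required when creating an index on a computed column
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET NUMERIC_ROUNDABORT OFF
GO
-- dbo.RateDemo is hypothetical; its expression mirrors SetRate in Listing 4
CREATE TABLE dbo.RateDemo
(
RateId int IDENTITY (1, 1) NOT NULL PRIMARY KEY,
AverageRate money NOT NULL,
EndOfDayRate money NOT NULL,
SetRate AS ((AverageRate + EndOfDayRate) / 2) PERSISTED  -- value stored in the table
)
GO
-- The expression is deterministic and precise, so SetRate can be an index key
CREATE NONCLUSTERED INDEX IX_RateDemo_SetRate ON dbo.RateDemo (SetRate)
GO
INSERT INTO dbo.RateDemo (AverageRate, EndOfDayRate) VALUES (1.9379, 1.9419)
SELECT AverageRate, EndOfDayRate, SetRate FROM dbo.RateDemo  -- SetRate = 1.9399

-- is_persisted = 1 confirms the value is stored rather than computed on read
SELECT name, is_persisted
FROM sys.computed_columns
WHERE object_id = OBJECT_ID(N'dbo.RateDemo')
GO
```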
FILESTREAM Storage
SQL Server 2008 introduces FILESTREAM
storage for storing unstructured data, such as documents, images, and
videos. In previous versions of SQL Server, there were two ways of
storing unstructured data. One method was to store it in the database as
a binary large object (BLOB) in an image or varbinary(max)
column. The other method was to store the data outside the database,
separate from the structured relational data, storing a reference or
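pathname to the unstructured data in a varchar column in a table. Neither method is ideal: BLOBs stored in the database enlarge it and are limited to 2GB per value, while files stored outside the database are not covered by its transactions, backups, or security.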
FILESTREAM storage
helps to solve the issues with using unstructured data by integrating
the SQL Server Database Engine with the NTFS file system for storing the
unstructured data, such as documents and images, on the file system
with the database storing a pointer to the data. Although the actual
data resides outside the database in the NTFS file system, you can still
use T-SQL statements to insert, update, query, and back up FILESTREAM
data, while maintaining transactional consistency between the
unstructured data and corresponding structured data with the same level of
security.
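To specify that a column should store data on the file system when creating or altering a table, you specify the FILESTREAM attribute on a varbinary(max) column. This causes the Database Engine to store all data for that column on the file system rather than in the database files. A table that has a FILESTREAM column must also have a uniqueidentifier ROWGUIDCOL column that is NOT NULL and has a UNIQUE or PRIMARY KEY constraint, and the database must contain a FILESTREAM filegroup. After the table is created, you can use Transact-SQL and Win32 to manage the FILESTREAM data.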
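A minimal sketch of such a table follows. It assumes that FILESTREAM is already enabled and that the current database has a FILESTREAM filegroup; the filegroup name FileStreamGroup1 and the table name dbo.DocumentStore are hypothetical. The PathName() method returns the logical path that Win32 clients use for streaming access to a value.

```sql
-- Assumes FILESTREAM is enabled and the hypothetical FILESTREAM filegroup
-- FileStreamGroup1 exists in the current database.
CREATE TABLE dbo.DocumentStore
(
-- Required: a NOT NULL, unique uniqueidentifier column with ROWGUIDCOL
DocumentId uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
DocumentName nvarchar(260) NOT NULL,
-- Values in this column are written to the NTFS file system
DocumentData varbinary(max) FILESTREAM NULL
)
FILESTREAM_ON FileStreamGroup1
GO
-- Ordinary T-SQL inserts and queries work against FILESTREAM data
INSERT INTO dbo.DocumentStore (DocumentName, DocumentData)
VALUES (N'readme.txt', CAST('Hello, FILESTREAM' AS varbinary(max)))

SELECT DocumentName,
       CAST(DocumentData AS varchar(max)) AS Contents,
       DocumentData.PathName()            AS FilePath  -- path for Win32 streaming
FROM dbo.DocumentStore
GO
```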
Note
To use FILESTREAM storage, you must first enable FILESTREAM storage at the Windows level as well as at the SQL Server instance level. You can enable FILESTREAM at the Windows level during installation of SQL Server 2008 or at any time by using SQL Server Configuration Manager. After you enable FILESTREAM at the Windows level, you next need to enable FILESTREAM for the SQL Server instance. You can do this either through SQL Server Management Studio or via T-SQL.
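For the T-SQL route, a minimal sketch follows. It assumes FILESTREAM has already been enabled for the Windows service through SQL Server Configuration Manager. The filestream access level option of sp_configure controls instance-level access (0 disables FILESTREAM, 1 allows T-SQL access, and 2 allows both T-SQL and Win32 streaming access). A database also needs a FILESTREAM filegroup before a table can hold FILESTREAM data; the database name DocsDb, the filegroup name FileStreamGroup1, the logical file name, and the path below are all hypothetical.

```sql
-- Instance level: 2 = T-SQL and Win32 streaming access
EXEC sp_configure N'filestream access level', 2
RECONFIGURE
GO
-- Database level: DocsDb and FileStreamGroup1 are hypothetical names
ALTER DATABASE DocsDb
ADD FILEGROUP FileStreamGroup1 CONTAINS FILESTREAM
GO
-- For a FILESTREAM container, FILENAME is a folder: C:\SQLData must already
-- exist, but the DocsDb_FileStream folder itself must not.
ALTER DATABASE DocsDb
ADD FILE (NAME = N'DocsDb_FileStream', FILENAME = N'C:\SQLData\DocsDb_FileStream')
TO FILEGROUP FileStreamGroup1
GO
```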
Sparse Columns and Column Sets
SQL Server 2008 provides a new space-saving storage option referred to as sparse columns.
Sparse columns are ordinary columns that provide optimized storage for
null values. If the value of a column defined as a sparse column is NULL, it doesn’t consume any space at all. You can define a column as a sparse column by specifying the SPARSE keyword after the data type in the CREATE TABLE or ALTER TABLE statement, as shown in Listing 5.
Listing 5. Specifying a Sparse Column in a Create Table Statement
CREATE TABLE DBO.SPARSE_TABLE
(ID INT IDENTITY(1,1),
FIRST_NAME VARCHAR (50),
MIDDLE_NAME VARCHAR (50) SPARSE NULL,
LASTNAME VARCHAR (50)
)
The space savings of sparse columns come with a trade-off, however: non-null values stored in a sparse column require extra space. Fixed-length and precision-dependent data types require 4 extra bytes, and variable-length data types require 2 extra bytes. For this reason, you should consider using sparse columns only when the space saved is at least 20% to 40%.
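The rule of thumb follows from the overhead figures. As a rough model, assume a fixed-length type of n bytes, let p be the fraction of rows in which the column is null, and ignore all other row overhead (the null bitmap and row header, for example). The average storage per row for the column is then:

```latex
\begin{aligned}
\text{nonsparse:}\quad & n \\
\text{sparse:}\quad & (1-p)\,(n+4) \\
\text{savings:}\quad & S(p) = 1 - \frac{(1-p)\,(n+4)}{n}
\end{aligned}
```

For an int column (n = 4), S(p) = 1 - 2(1 - p) = 2p - 1, so a sparse int column only breaks even when half of its values are null and saves space only beyond that point. Because the model ignores other row overhead, treat its figures as approximate.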
SQL Server stores the non-null values of a row's sparse columns together in a single internal structure within the row, but to external applications and end users each sparse column appears as a normal column. Sparse columns also make wide tables possible: a table that defines a column set (described later in this section) can have up to 30,000 columns, far exceeding the limit of 1,024 columns that otherwise applies. In addition, because sparse columns have many null-valued rows, they are good candidates for filtered indexes. A filtered index on a sparse column can index only the rows that have non-null values stored in the column. This creates smaller and more efficient indexes.
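A filtered index is created by adding a WHERE clause to CREATE INDEX, a feature introduced in SQL Server 2008. The following sketch uses a hypothetical table, dbo.SparseIndexDemo, and indexes only the rows whose sparse MIDDLE_NAME column is not null. Filtered indexes require the session options set at the top of the script, both when the index is created and when rows are modified.

```sql
-- Session options required for filtered indexes
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET NUMERIC_ROUNDABORT OFF
GO
-- dbo.SparseIndexDemo is a hypothetical table
CREATE TABLE dbo.SparseIndexDemo
(
ID int IDENTITY (1, 1) NOT NULL PRIMARY KEY,
LASTNAME varchar(50) NOT NULL,
MIDDLE_NAME varchar(50) SPARSE NULL
)
GO
-- Only rows with a non-null MIDDLE_NAME are stored in this index
CREATE NONCLUSTERED INDEX IX_SparseIndexDemo_MiddleName
ON dbo.SparseIndexDemo (MIDDLE_NAME)
WHERE MIDDLE_NAME IS NOT NULL
GO
INSERT INTO dbo.SparseIndexDemo (LASTNAME, MIDDLE_NAME) VALUES ('Smith', NULL)
INSERT INTO dbo.SparseIndexDemo (LASTNAME, MIDDLE_NAME) VALUES ('Jones', 'Ann')

-- has_filter = 1 and filter_definition show that the index is filtered
SELECT name, has_filter, filter_definition
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.SparseIndexDemo')
  AND name = N'IX_SparseIndexDemo_MiddleName'
GO
```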
Sparse columns can use most SQL Server data types and behave like any other column, with the following restrictions:
A sparse column must be nullable and cannot have the ROWGUIDCOL or IDENTITY properties. A sparse column cannot use any of the following data types: text, ntext, image, timestamp, user-defined data type, geometry, or geography. It also cannot have the FILESTREAM attribute.
A sparse column cannot have a default value.
A sparse column cannot be bound to a rule.
A computed column cannot be marked as sparse.
A sparse column cannot be part of a clustered index or a unique primary key index.
When the number of sparse
columns in a table is large, and operating on them individually is
cumbersome, you may want to define a column set. A column set is an
untyped XML representation that combines all the sparse columns of a
table into a structured set. A column set is like a computed column in
that the column set is not physically stored in the table, but the
column set is directly updatable. Applications may see some performance improvement when they select and insert data by using column sets on tables that have lots of columns.
To define a column set, use the <column_set_name> XML COLUMN_SET FOR ALL_SPARSE_COLUMNS syntax in the CREATE TABLE or ALTER TABLE statement, as shown in Listing 6.
Listing 6. Defining a Column Set
CREATE TABLE emp_info
(ID INT IDENTITY(1,1),
FIRST_NAME VARCHAR (50),
MIDDLE_NAME VARCHAR (50) SPARSE NULL ,
LASTNAME VARCHAR (50),
HOMEPHONE VARCHAR(10) SPARSE NULL,
BUSPHONE VARCHAR(10) SPARSE NULL,
CELLPHONE VARCHAR(10) SPARSE NULL,
FAX VARCHAR(10) SPARSE NULL,
EMAIL VARCHAR(30) SPARSE NULL,
WEBSITE VARCHAR(30) SPARSE NULL,
CSet XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
)
A column set is created as
an untyped XML column and is treated as any other XML column with a
maximum XML data size limit of 2GB. Only one column set per table is
allowed. You cannot add a column set to a table if the table already
contains sparse columns.
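Because the column set is updatable, sparse values can be supplied as XML in which each element is named after a sparse column. The following sketch uses a hypothetical table, dbo.ContactDemo, with two sparse columns and a column set. Note that SELECT * returns the column set XML in place of the individual sparse columns, while the sparse columns can still be selected by name.

```sql
-- dbo.ContactDemo is a hypothetical table with two sparse columns and a column set
CREATE TABLE dbo.ContactDemo
(
ID int IDENTITY (1, 1) NOT NULL PRIMARY KEY,
LASTNAME varchar(50) NOT NULL,
HOMEPHONE varchar(10) SPARSE NULL,
EMAIL varchar(30) SPARSE NULL,
CSet XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
)
GO
-- Insert through the column set: each element names a sparse column
INSERT INTO dbo.ContactDemo (LASTNAME, CSet)
VALUES ('Lee', '<HOMEPHONE>5551234567</HOMEPHONE><EMAIL>lee@example.com</EMAIL>')

-- Returns ID, LASTNAME, and CSet (the sparse columns are folded into CSet)
SELECT * FROM dbo.ContactDemo

-- Sparse columns can still be referenced individually
SELECT HOMEPHONE, EMAIL FROM dbo.ContactDemo
GO
```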
To specify a column as a sparse column using SQL Server Management Studio (SSMS), set the Is Sparse property to Yes in the column properties for the selected column (see Figure 1). Similarly, if a column needs to be declared as a column set, set the Is Columnset property to Yes in the column properties.