SQL Server 2008 R2 : Defining Columns (part 1) - Data Types

11/26/2012 5:40:58 PM

A table is defined as a collection of columns. Each column represents an attribute of the database table and has characteristics that define its scope and the type of data it can contain. In defining a column, you must assign a name and a data type. For consistency and readability, the column names should adhere to a naming convention that you define for your environment. Naming conventions often use a set of standard suffixes that indicate the type of data the column will contain. For example, you can add the Date suffix to a column name (for example, OrderDate) to identify it as a column that contains date/time data, or you can add the suffix ID (for example, PrinterID) to indicate that the column contains a unique identifier.

When creating and naming columns, you need to keep the following restrictions in mind:

You can define up to 1,024 columns (nonsparse + computed) for each table. This number is increased to 30,000 columns if the table has a defined column set using sparse columns.
Column names must be unique within a table.
A row can hold a maximum of 8,060 bytes. Some data types can be stored off the 8KB data page to allow a row to exceed this limit.
A data type must be assigned to each column.
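As a rough illustration of the naming conventions and restrictions above, the following sketch defines a small table. The table and column names (PrinterOrder, OrderNote, and so on) are hypothetical and are not taken from any sample database.

```sql
-- Hypothetical table illustrating suffix-based column naming.
CREATE TABLE dbo.PrinterOrder
(
    PrinterOrderID int          NOT NULL,  -- ID suffix: identifies the row
    PrinterID      int          NOT NULL,  -- ID suffix: identifies the printer
    OrderDate      datetime     NOT NULL,  -- Date suffix: date/time data
    OrderNote      varchar(200) NULL       -- every column needs a name and a data type
);
```

Repeating a column name in the list, or omitting a data type from a regular column, causes the CREATE TABLE statement to fail.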

These restrictions provide a framework for a column definition. The next consideration in defining a column is the data type. The following section discusses the various data types.

Data Types

SQL Server 2008 has an extensive list of data types to choose from, including some that are new to SQL Server 2008. New data types include date, time, datetime2, datetimeoffset, hierarchyid, geography, and geometry. (FILESTREAM, also new in SQL Server 2008, is a storage attribute of varbinary(max) columns rather than a data type.) Each data type is geared toward a specific kind of data that will be stored in the column. Table 1 lists the data types available in SQL Server 2008.

Table 1. Table Data Types
Data Type	Range/Description	Storage
bigint	−2^63 (−9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807)	8 bytes
binary(n)	Fixed-length binary data with a length of n bytes	The number of bytes defined by n, up to 8,000
bit	An integer data type that can take a value of 1, 0, or NULL	1 byte for every eight columns that are defined as bits on the table
char(n)	Up to 8,000 characters	1 byte per character
date	0001-01-01 through 9999-12-31	3 bytes
datetime	January 1, 1753, through December 31, 9999	8 bytes; accurate to 3.33 milliseconds
datetime2	January 1, 0001, through December 31, 9999	8 bytes; accurate to 100 nanoseconds
datetimeoffset	January 1, 0001, through December 31, 9999; includes a time zone offset	10 bytes
decimal	−10^38+1 to 10^38-1	Based on the precision
float	−1.79E+308 to −2.23E−308, 0, and 2.23E−308 to 1.79E+308	4 or 8 bytes, depending on the number of bits allocated to the mantissa
geography	.NET CLR data type representing round-earth data such as GPS latitude and longitude coordinates	
geometry	.NET CLR data type representing data in a Euclidean (flat) coordinate system	
hierarchyid	User-defined nodes and levels	Up to 892 bytes
image	Variable-length binary data	Up to 2^31-1 (2,147,483,647) bytes
int	−2^31 (−2,147,483,648) to 2^31-1 (2,147,483,647)	4 bytes
money	−922,337,203,685,477.5808 to 922,337,203,685,477.5807	8 bytes
nchar(n)	Up to 4,000 Unicode characters	Two times the number of characters defined by n
ntext	Up to 2^30-1 (1,073,741,823) characters	Two times the number of characters entered
numeric(p,s)	−10^38+1 through 10^38-1	Based on the precision
nvarchar(n)	Up to 4,000 Unicode characters	Two times the number of characters entered
nvarchar(max)	Unicode characters up to the maximum storage capacity of 2^30-1 characters	Two times the number of characters entered, plus 2 bytes
real	−3.40E+38 to −1.18E−38, 0, and 1.18E−38 to 3.40E+38	4 bytes
smalldatetime	January 1, 1900, through June 6, 2079	4 bytes; accurate to 1 minute
smallint	−2^15 (−32,768) to 2^15-1 (32,767)	2 bytes
smallmoney	−214,748.3648 to 214,748.3647	4 bytes
sql_variant	A data type that stores values of various SQL Server 2008-supported data types, except text, ntext, image, timestamp, and sql_variant	Up to 8,016 bytes
text	Up to 2^31-1 (2,147,483,647) characters	Up to 2,147,483,647 bytes
time	00:00:00.0000000 to 23:59:59.9999999	5 bytes
timestamp/rowversion	Automatically generated, unique binary numbers within a database; generally used for version stamping rows	8 bytes
tinyint	0 to 255	1 byte
uniqueidentifier	A 16-byte globally unique identifier (GUID)	16 bytes
varbinary(n)	Variable-length binary data with a length of up to n bytes	The number of bytes entered plus 2 bytes; n can be up to 8,000
varbinary(max)	Binary data up to the maximum storage capacity	The number of bytes entered plus 2 bytes; maximum 2^31-1 bytes
varchar(n)	Up to 8,000 characters	1 byte per character
varchar(max)	Non-Unicode characters up to the maximum storage capacity	1 byte per character; maximum 2^31-1 bytes
xml	XML instances or a variable of XML type	Up to 2GB

The data type you select is important because it provides scope for the column. For example, if you define a column as type int, you can be assured that only integer data will be stored in the column and that character data will not be allowed. The advantages of data typing are fairly obvious but sometimes overlooked.
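The following sketch shows this protection in action. It assumes a throwaway temporary table named #TypeDemo (a hypothetical name) and tries to insert a character value that cannot be converted to int; the conversion error is caught and reported rather than stored.

```sql
-- Hypothetical temp table with a single int column.
CREATE TABLE #TypeDemo (Quantity int NOT NULL);

DECLARE @value varchar(10) = 'twelve';

BEGIN TRY
    -- 'twelve' cannot be implicitly converted to int, so the insert fails.
    INSERT INTO #TypeDemo (Quantity) VALUES (@value);
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage;
END CATCH;

-- No row was stored.
SELECT COUNT(*) AS RowsStored FROM #TypeDemo;

DROP TABLE #TypeDemo;
```

Note that a character value such as '12' would be implicitly converted and accepted, so the column still ends up holding only integer data.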

You should avoid defining most of your columns with a single data type, such as varchar. The visual tools provide a great way for you to select a data type: you simply select a data type from a drop-down selection box that lists the available data types.

Tip

The Object Explorer has a categorized list of all the system data types. To get to it, you open the Programmability node under your database and then expand the Types node. You then see a node named System Data Types that lists all the data type categories, including Exact Numerics, Approximate Numerics, and Date and Time. The data types for each category are listed under each category node. If you mouse over a particular data type, you see a brief description, including the valid range of values.

Several data types in SQL Server 2008 deserve special attention. Some of these data types are new to SQL Server 2008 and some of them were introduced in SQL Server 2005. The following sections discuss these data types.

New Date/Time Data Types

Several new date/time data types were added in SQL Server 2008 to enhance SQL Server’s date/time capabilities. The date and time data types separate the two date/time components: the date data type contains only the year, month, and day, whereas the time data type contains only the time of day. The separation of date and time was planned for SQL Server 2005 but never made it to the final release.

The precision and scale of date/time data types have been expanded in SQL Server 2008 as well. The datetime2 data type is similar to the datetime data type, but it has a larger range of dates (January 1, 0001, through December 31, 9999), and the time portion of this data type contains fractional seconds with up to seven digits of precision. The datetime data type is accurate only to about 3.33 milliseconds, whereas the new datetime2 data type is accurate to 100 nanoseconds.

Finally, SQL Server 2008 introduces time zone support in a new data type named datetimeoffset. This data type has precision in fractional seconds (like datetime2), but it also contains an extra component that defines the time zone offset for the value. The offset is written as a plus or minus sign followed by two digits for the offset hours and two digits for the offset minutes, and it expresses the difference between the stored local time and UTC. The following example shows how this new data type can be used:

select CAST('2009-07-08 11:33:22.1234567-04:00' AS datetimeoffset(7))
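To compare the new types side by side, a minimal sketch along the following lines may help. The variable names are illustrative only; SWITCHOFFSET and the CAST to datetime are standard SQL Server 2008 features.

```sql
-- Illustrative variables, one per date/time type.
DECLARE @d   date              = '2009-07-08';
DECLARE @t   time(7)           = '11:33:22.1234567';
DECLARE @dt2 datetime2(7)      = '2009-07-08 11:33:22.1234567';
DECLARE @dto datetimeoffset(7) = '2009-07-08 11:33:22.1234567-04:00';

SELECT @d                           AS DateOnly,
       @t                           AS TimeOnly,
       @dt2                         AS HighPrecision,
       CAST(@dt2 AS datetime)       AS RoundedToDatetime,  -- loses precision beyond ~3.33 ms
       SWITCHOFFSET(@dto, '+00:00') AS SameInstantInUtc;   -- 15:33:22.1234567 +00:00
```

The last column shows the same instant expressed with a zero offset: a local time of 11:33 at -04:00 corresponds to 15:33 UTC.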

The xml Data Type

The xml data type (introduced in SQL Server 2005) enables you to store XML documents and XML fragments in a SQL Server database. (An XML fragment is an XML instance that is missing a single top-level element.)
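As a small sketch of the difference between a document and a fragment, the following batch stores one of each in xml variables and queries them with the built-in value(), exist(), and query() methods. The element and attribute names (order, item, sku) are made up for the example.

```sql
-- A well-formed document with a single top-level element.
DECLARE @doc xml = N'<order id="1"><item sku="A1" qty="2" /></order>';

-- A fragment: two top-level elements and no single root.
DECLARE @fragment xml = N'<item sku="A1" /><item sku="B2" />';

SELECT @doc.value('(/order/@id)[1]', 'int')  AS OrderID,      -- extracts a scalar
       @doc.exist('/order/item[@sku="A1"]')  AS HasItemA1,    -- 1 if the path matches
       @fragment.query('/item[@sku="B2"]')   AS MatchingItem; -- returns xml
```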

The hierarchyid Data Type

The hierarchyid data type is new in SQL Server 2008. The hierarchyid data type is a variable-length system data type used to represent a position in a tree hierarchy. A column of type hierarchyid does not automatically represent a tree. It is up to the application to generate and assign hierarchyid values in such a way that the desired relationship between rows is reflected in the values.
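The following sketch shows one way an application might generate hierarchyid values, using the type's built-in GetRoot, GetDescendant, GetLevel, IsDescendantOf, and ToString methods. The variable names are illustrative only.

```sql
-- Build a tiny tree: a root with two children.
DECLARE @root   hierarchyid = hierarchyid::GetRoot();
DECLARE @child1 hierarchyid = @root.GetDescendant(NULL, NULL);    -- first child
DECLARE @child2 hierarchyid = @root.GetDescendant(@child1, NULL); -- next sibling after @child1

SELECT @root.ToString()              AS RootPath,     -- /
       @child1.ToString()            AS Child1Path,   -- /1/
       @child2.ToString()            AS Child2Path,   -- /2/
       @child2.GetLevel()            AS Child2Level,  -- 1
       @child2.IsDescendantOf(@root) AS IsUnderRoot;  -- 1
```

Nothing in the data type stops two rows from receiving the same value or a child from lacking a parent row, which is why the application (or constraints you add) must maintain the tree.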

Spatial Data Types

SQL Server 2008 introduces support for storing geographical data with the inclusion of new spatial data types. Spatial data types provide a comprehensive, high-performance, and extensible data storage solution for spatial data, enabling organizations of any scale to integrate geospatial features into their applications and services.

Spatial data types can be used to store and manipulate location-based information and come in the form of two new data types: geography and geometry. The geography data type is a .NET CLR data type that provides a storage structure for geodetic data, sometimes referred to as round-earth data because it assumes a roughly spherical model of the world. Its values are defined by latitude and longitude coordinates on an industry-standard ellipsoid such as WGS84, the reference system used by Global Positioning System (GPS) applications. The geometry data type is a .NET CLR data type that supports planar data, which assumes a flat projection and is therefore sometimes called flat-earth data. geometry data is represented as points, lines, and polygons on a flat surface, such as maps and interior floor plans where the curvature of the earth does not need to be taken into account.
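A brief sketch of the two models follows. The coordinates are approximate values chosen for illustration, SRID 4326 is the identifier for WGS84, and the variable names are hypothetical.

```sql
-- Round-earth: two points given as latitude, longitude, SRID (4326 = WGS84).
DECLARE @seattle  geography = geography::Point(47.6062, -122.3321, 4326);
DECLARE @portland geography = geography::Point(45.5152, -122.6784, 4326);

-- Distance along the ellipsoid, returned in meters for SRID 4326.
SELECT @seattle.STDistance(@portland) AS DistanceInMeters;

-- Flat-earth: two points on a plane with no spatial reference (SRID 0).
DECLARE @a geometry = geometry::STGeomFromText('POINT(0 0)', 0);
DECLARE @b geometry = geometry::STGeomFromText('POINT(3 4)', 0);

-- Ordinary Euclidean distance in the units of the coordinates.
SELECT @a.STDistance(@b) AS PlanarDistance;  -- 5
```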

Large-Value Data Types

Three large-value data types added in SQL Server 2005 allow you to store a significant amount of data in a single column. They allow you to store up to 2^31-1 bytes of non-Unicode or binary data, or up to 2^30-1 Unicode characters. All these data types have the (max) designator: varchar(max), nvarchar(max), and varbinary(max). The varchar, nvarchar, and varbinary data types were available prior to SQL Server 2005, but the max parameter gave these types additional scope.

The great thing about these data types is that they are much easier to work with than large object (LOB) data types. LOB data types (which include text, ntext, and image) require special programming when retrieving and storing data. The large-value data types do not have these restrictions. They can be used much like their smaller counterparts varchar(n), nvarchar(n), and varbinary(n) that are defined without the max keyword. So if you want to select data from a varchar(max) column, you can simply execute a SELECT statement against it, regardless of the amount of data stored in it. Consider, for example, the following SELECT statement, executed against a varchar(max) column named DocumentSummary in the AdventureWorks2008.Production.Document table:

select Title, substring(DocumentSummary,1,30) 'DocumentSummary'
from production.document
where LEFT(DocumentSummary,30) like 'Reflector%'

/* results from previous select statement
Title                                              DocumentSummary
-------------------------------------------------- ------------------------------
Front Reflector Bracket Installation               Reflectors are vital safety co
*/

This works fine with the varchar(max) column, but the LEFT function used in the WHERE clause would cause an error if the column were a text column instead.

The large-value data types can be stored in the data row or in separate data pages, based on the setting of the sp_tableoption 'large value types out of row' option. If the option is set to OFF (the default), values of up to 8,000 bytes are stored in the actual data row as long as they fit in the row; larger values, or values that do not fit, are stored on separate pages with a pointer kept in the row. If the option is set to ON, the data for these columns is always stored on separate pages, and only a 16-byte pointer is kept in the row. The actual location of the column data is transparent to any user accessing the table.
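As a sketch of how the option might be set and inspected, the following batch uses a hypothetical table named dbo.DocumentArchive. The sp_tableoption procedure and the large_value_types_out_of_row column of sys.tables are standard; the table and column names are assumptions for the example.

```sql
-- Hypothetical table with one large-value column.
CREATE TABLE dbo.DocumentArchive
(
    DocumentID   int          NOT NULL,
    DocumentBody varchar(max) NULL
);

-- Turn the option on for this table (1 = ON, 0 = OFF).
EXEC sp_tableoption N'dbo.DocumentArchive', 'large value types out of row', 1;

-- Confirm the current setting.
SELECT name, large_value_types_out_of_row
FROM sys.tables
WHERE object_id = OBJECT_ID(N'dbo.DocumentArchive');
```

Changing the option does not immediately relocate existing values; a stored value is moved only when it is next updated.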

Large Row Support

In SQL Server 2000, there was a strict limit of 8,060 bytes that could be stored in a single row. If the total amount of data exceeded this limit, the update or insert would fail. Enhancements were made in SQL Server 2005 to dynamically manage rows that exceed the 8,060-byte limit. This dynamic behavior is designed for columns that are defined as varchar, nvarchar, varbinary, or sql_variant. If the values in these columns cause the total size of the row to go beyond the 8,060-byte limit, SQL Server moves one or more of the variable-length columns to pages in the ROW_OVERFLOW_DATA allocation unit. A pointer to this separate storage location, rather than the actual data, is kept in the data row. If the data row shrinks below the 8,060-byte limit at a later time, SQL Server dynamically moves the data from the ROW_OVERFLOW_DATA allocation unit back into the data page.

The following example creates a table that has columns that could exceed the 8,060-byte limit, with a total of 9,000 characters:

CREATE TABLE t1
(col1 varchar(4000), col2 varchar(5000))

insert t1
select replicate('x', 4000),replicate('x', 5000)

If you execute the CREATE TABLE statement, you do not get any warning message related to the 8,060-byte limit. After the table is created, you can execute an insert into the table that exceeds the 8,060-byte limit. The insert succeeds, and the dynamic allocation previously described is handled automatically.
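To see where the data ended up, a query along the following lines can be run after the insert. It is a sketch that assumes t1 was created in the dbo schema; it lists the allocation units for the table so that any pages in ROW_OVERFLOW_DATA become visible.

```sql
-- List the allocation units that belong to dbo.t1 (schema name is an assumption).
SELECT au.type_desc, au.total_pages, au.used_pages
FROM sys.partitions AS p
JOIN sys.allocation_units AS au
  ON (au.type IN (1, 3) AND au.container_id = p.hobt_id)       -- IN_ROW_DATA, ROW_OVERFLOW_DATA
  OR (au.type = 2       AND au.container_id = p.partition_id)  -- LOB_DATA
WHERE p.object_id = OBJECT_ID(N'dbo.t1');
```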

User-Defined Data Types

User-defined data types allow you to create custom data types that are based on the existing system data types. These data types are also called alias data types in SQL Server 2008. You create a user-defined data type and give it a unique name that you can then use in the definitions of tables. For example, you can create a user-defined data type named ShortDescription, defined as varchar(20), and assign it to any column. This promotes data type consistency across your tables.

You can create user-defined data types with T-SQL in two ways: by using the sp_addtype system stored procedure or by using the CREATE TYPE command, which was introduced in SQL Server 2005. The sp_addtype system stored procedure is slated to be removed in a future version of SQL Server, so using the CREATE TYPE command is preferred. The following example shows how to create the ShortDescription user-defined data type:

CREATE TYPE [dbo].[ShortDescription] FROM [varchar](20) NOT NULL

After a user-defined data type is created, you can use it in the definition of tables. The following is an example of a table created with the new ShortDescription user-defined data type:

CREATE TABLE [dbo].CodeTable
 (TableId int identity,
  TableDesc ShortDescription)

When you look at the definition of the CodeTable table in Object Explorer, you see the TableDesc column displayed with the ShortDescription data type as well as the underlying data type varchar(20).
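The same information is available from the catalog views. The following sketch lists each alias type alongside its underlying system type; it relies only on the standard sys.types view, and the column aliases are illustrative.

```sql
-- Show user-defined (alias) types with their base system type.
SELECT ut.name        AS AliasType,
       st.name        AS BaseType,
       ut.max_length  AS MaxLengthBytes,
       ut.is_nullable AS IsNullable
FROM sys.types AS ut
JOIN sys.types AS st
  ON st.user_type_id = ut.system_type_id   -- resolves to the built-in type
WHERE ut.is_user_defined = 1
  AND ut.is_assembly_type = 0;             -- exclude CLR types
```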

You can use the Object Explorer to create user-defined data types as well. To do so, you expand the Programmability node under your database, expand the Types node, and then right-click the User-Defined Data Types node. Then you choose the New User-Defined Data Type option, and you can create a new user-defined data type through a dialog. If you create a user-defined data type in the model database, this user-defined data type is also created in every database created afterward.

CLR User-Defined Types

SQL Server 2008 continues support for user-defined types (UDTs) implemented with the Microsoft .NET Framework common language runtime (CLR). CLR UDTs enable you to extend the type system of the database and also enable you to define complex structured types.

A UDT may be simple or structured and of any degree of complexity. A UDT can encapsulate complex, user-defined behaviors. You can use CLR UDTs in all contexts where you can use a system type in SQL Server, including in columns in tables, in variables in batches, in functions or stored procedures, as arguments of functions or stored procedures, or as return values from functions.

A UDT must first be implemented as a managed class or structure in any one of the CLR languages and compiled into a .NET Framework assembly. You can then register it with SQL Server by using the CREATE ASSEMBLY command, as in the following example:

CREATE ASSEMBLY latlong FROM 'c:\samplepath\latlong.dll'

After registering the assembly, you can create the CLR UDTs by using a variation of the CREATE TYPE command shown previously:

CREATE TYPE latitude EXTERNAL NAME latlong.latitude
CREATE TYPE longitude EXTERNAL NAME latlong.longitude

After a CLR UDT is created, you can use it in the definition of tables. The following example shows a table created with the new latitude and longitude UDTs:

CREATE TABLE [dbo].StoreLocation
 (StoreID int NOT NULL,
  StoreLatitude latitude,
  StoreLongitude longitude)
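Two supporting steps are often needed around the statements above. The sketch below enables CLR integration, which must be on before SQL Server will execute code in a CLR UDT, and then lists registered CLR types from the catalog. It uses only the standard sp_configure procedure and the sys.assemblies and sys.assembly_types views; the column aliases are illustrative.

```sql
-- Enable CLR integration for the instance (requires appropriate permissions).
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;

-- List CLR user-defined types and the assemblies that implement them.
SELECT a.name           AS AssemblyName,
       t.name           AS TypeName,
       t.assembly_class AS ImplementingClass
FROM sys.assembly_types AS t
JOIN sys.assemblies AS a
  ON a.assembly_id = t.assembly_id;
```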
