Working with XML can be broken into three main categories:
- Generating XML
- Querying XML
- Validating XML
Prior to SQL Server 2000, it was the
responsibility of the application layer to produce the XML, and there
was no consistent or defined way to do so. Typically, the application
would use the XML API functions available in its programming language
to build the desired XML, and if you have done it, you know it wasn't easy.
As XML gained acceptance, developers saw more and more
need to produce and consume it. Today you see it all over the web in
sites that produce RSS or Atom feeds. And you can't forget
Windows Communication Foundation (WCF) web services, which
generate XML documents containing the information to be exchanged.
SQL Server 2000 took a great first step toward supporting XML by adding the T-SQL FOR XML clause. The FOR XML
clause transforms the results of a T-SQL query into an XML stream. This
was a huge benefit for developers, who no longer needed to build XML
documents in their applications.
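As a minimal sketch of the idea, the following batch turns a small rowset into XML. The table variable @Customer and its two rows are made up for illustration so that the batch stands on its own, and the multi-row INSERT syntax assumes SQL Server 2008 or later, although FOR XML RAW itself dates back to SQL Server 2000.

```sql
-- Hypothetical sample rows, held in a table variable so the batch is self-contained.
DECLARE @Customer TABLE (CustomerID INT, Name NVARCHAR(50));

INSERT INTO @Customer (CustomerID, Name)
VALUES (1, N'Scott'), (2, N'Adam');

-- FOR XML RAW emits one <row> element per result row,
-- with each column returned as an attribute of that element.
SELECT CustomerID, Name
FROM @Customer
ORDER BY CustomerID
FOR XML RAW;
```

The application receives the XML directly from the query, so it no longer has to assemble the document element by element.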
Querying XML wasn't a walk in the park either, in
the beginning. The same XML APIs that were used to produce XML were
used to query the XML. This caused a lot of overhead in any application
that produced or consumed XML. Something better was needed, and the
answer came with SQL Server 2000 in the form of the OPENXML function. The OPENXML
function requires a three-step process: call one system stored procedure
(sp_xml_preparedocument) to parse the XML and obtain a document handle,
call OPENXML with that handle to obtain the result set, and then call a
second system stored procedure (sp_xml_removedocument) to release the handle.
Because OPENXML was implemented as a function call between
two system stored procedure calls, it was difficult to use in some
circumstances. For example, each handle covers a single document, so you
could not easily apply it to every row of a table in one set-based
operation, and set-based operations are what SQL Server excels at!
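The three-step process described above can be sketched as follows. The XML text and the variable names are hypothetical, and the inline variable initialization assumes SQL Server 2008 or later; treat this as an illustration of the call sequence rather than a recommended pattern.

```sql
DECLARE @hdoc INT;  -- receives the document handle
DECLARE @xmlText NVARCHAR(MAX) =
    N'<Customers>
        <Customer CustomerID="1" Name="Scott" />
        <Customer CustomerID="2" Name="Adam" />
      </Customers>';

-- Step 1: parse the XML text and obtain a handle to the in-memory document.
EXEC sp_xml_preparedocument @hdoc OUTPUT, @xmlText;

-- Step 2: shred the document into a rowset.
-- The flag value 1 requests attribute-centric mapping for the WITH columns.
SELECT CustomerID, Name
FROM OPENXML(@hdoc, N'/Customers/Customer', 1)
WITH (CustomerID INT, Name NVARCHAR(50));

-- Step 3: release the handle so the parsed document's memory is freed.
EXEC sp_xml_removedocument @hdoc;
```

Because the handle has to be prepared and removed around every document, the pattern does not fit naturally inside a single query over many rows.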
Luckily, SQL Server 2005 came to the rescue with
the XML data type, which enables you to store complete XML documents or
XML fragments. Included with the XML data type was support for XQuery,
a language specifically designed to query XML documents. This
functionality alone makes the OPENXML
function nearly obsolete because XQuery is more lightweight, more
powerful, and much easier to use. It also does not have the limitations
of the OPENXML function.
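For comparison, here is a rough sketch of the same kind of shredding done with the XML data type's nodes() and value() methods. The table variable @Docs and its contents are invented for illustration, and the multi-row INSERT assumes SQL Server 2008 or later.

```sql
-- Hypothetical table variable holding one XML document per row.
DECLARE @Docs TABLE (DocID INT, Doc XML);

INSERT INTO @Docs (DocID, Doc)
VALUES (1, N'<Customers><Customer CustomerID="1" Name="Scott" /></Customers>'),
       (2, N'<Customers><Customer CustomerID="2" Name="Adam" />
                        <Customer CustomerID="3" Name="John" /></Customers>');

-- nodes() returns one row per <Customer> element, and value() extracts
-- a typed scalar from each. CROSS APPLY runs this for every row of @Docs
-- in a single set-based statement, with no document handles to manage.
SELECT d.DocID,
       c.value('@CustomerID', 'INT')    AS CustomerID,
       c.value('@Name', 'NVARCHAR(50)') AS Name
FROM @Docs AS d
CROSS APPLY d.Doc.nodes('/Customers/Customer') AS t(c)
ORDER BY d.DocID, CustomerID;
```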
However, even though later versions of SQL Server
came with better support for producing and querying XML, protecting and
ensuring the validity of the XML can't be left behind. Any production
application should include a robust validation process for the information
being exchanged, and even more so when exchanging XML data, simply
because the chances of invalid values are much greater.
For example, an application passing the value
“thirty” to the @age parameter of a stored procedure (@age INT) would
receive a conversion error immediately as SQL Server would perform an
implicit data type validation.
XML, however, is different. SQL Server checks that an
XML document is well formed, but on its own it can't detect an invalid
value inside the document. For example, given the element
<Employee age="too old to code" />, the @age attribute is not
associated with a data type, and SQL Server simply does not know how to
validate it.
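The contrast can be sketched in a few lines. This is only an illustration: an explicit CONVERT stands in for the stored procedure parameter described above, the variable names are hypothetical, and the inline initialization assumes SQL Server 2008 or later.

```sql
-- Case 1: a value headed for an INT is checked as soon as it is converted.
DECLARE @input NVARCHAR(20) = N'thirty';
BEGIN TRY
    DECLARE @age INT;
    SET @age = CONVERT(INT, @input);   -- fails: 'thirty' is not an integer
END TRY
BEGIN CATCH
    SELECT ERROR_MESSAGE() AS ConversionError;
END CATCH;

-- Case 2: the same kind of bad value inside untyped XML is accepted,
-- because nothing tells SQL Server that @age should be an integer.
DECLARE @doc XML = N'<Employee age="too old to code" />';
SELECT @doc.value('(/Employee/@age)[1]', 'NVARCHAR(50)') AS AgeAsStored;
```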
The solution is the support for XML schemas, included
starting with SQL Server 2005. XML Schema Definition (XSD) is a language
specifically used to describe and validate XML documents. The
validation is based on structure and format rules, providing the
ability to validate an XML document against the schema.
Starting with SQL Server 2005, SQL Server
supports XML Schemas via XML Schema Collection objects, and you learn
more about schemas and schema collections shortly. The great thing is
that you can apply schemas to an XML data type column, a variable, or
a parameter. By applying schemas, you can provide a more stringent
validation of XML that covers many of the same validation scenarios you
find when dealing with non-XML data, such as the following:
- Elements in your XML need to follow a certain order (FirstName must precede LastName).
- Dealing with optional or mandatory elements.
- Validation of data types (for example, age is an integer).
- Enforcing specific formats of data (for example, Social Security numbers formatted as 999-99-9999).
- Ensuring elements appear only once.
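A few of these rules (element order, mandatory elements that appear once, and an integer data type) can be sketched with a small schema collection. The collection name EmployeeSketch, the schema itself, and the sample employee data are all hypothetical and exist only for this illustration; run it in a scratch database, since the last batch drops the collection again.

```sql
-- Hypothetical schema: FirstName must come before LastName, each appears
-- exactly once, and the age attribute is required and must be an integer.
CREATE XML SCHEMA COLLECTION dbo.EmployeeSketch AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="Employee">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="FirstName" type="xsd:string" />
          <xsd:element name="LastName"  type="xsd:string" />
        </xsd:sequence>
        <xsd:attribute name="age" type="xsd:int" use="required" />
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>';
GO

-- A document that follows the rules is accepted by the typed variable.
DECLARE @good XML(dbo.EmployeeSketch) =
    N'<Employee age="30"><FirstName>Scott</FirstName><LastName>Example</LastName></Employee>';

-- A document with a non-integer age should be rejected on assignment.
DECLARE @candidate NVARCHAR(MAX) =
    N'<Employee age="too old to code"><FirstName>Scott</FirstName><LastName>Example</LastName></Employee>';
BEGIN TRY
    DECLARE @bad XML(dbo.EmployeeSketch);
    SET @bad = @candidate;
    SELECT N'accepted' AS Result;
END TRY
BEGIN CATCH
    SELECT ERROR_MESSAGE() AS ValidationError;
END CATCH;
GO

-- Clean up the illustrative collection.
DROP XML SCHEMA COLLECTION dbo.EmployeeSketch;
GO
```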
The ability to apply a schema to an XML document
is called “typing” your XML. You learn about typed versus untyped XML
shortly. But enough blabbering. Let's start working with some data.
Open Microsoft SQL Server Management Studio and create a
new database. Open a new query window and execute the following code
against your new database.
IF EXISTS (SELECT * FROM sys.objects WHERE object_id=
OBJECT_ID(N'[dbo].[Customer]') AND type in (N'U'))
DROP TABLE [dbo].[Customer]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id=
OBJECT_ID(N'[dbo].[Item]') AND type in (N'U'))
DROP TABLE [dbo].[Item]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id=
OBJECT_ID(N'[dbo].[Orders]') AND type in (N'U'))
DROP TABLE [dbo].[Orders]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id=
OBJECT_ID(N'[dbo].[OrderDetail]') AND type in (N'U'))
DROP TABLE [dbo].[OrderDetail]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id=
OBJECT_ID(N'[dbo].[ItemInfo]') AND type in (N'U'))
DROP TABLE [dbo].[ItemInfo]
GO
/****** Object: Table [dbo].[Customer] ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Customer](
[CustomerID] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](50) NULL,
[Address] [nvarchar](50) NULL,
[City] [nvarchar](50) NULL,
[State] [nvarchar](50) NULL,
[ZipCode] [nvarchar](50) NULL,
[Phone] [nvarchar](50) NULL,
CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED
(
[CustomerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[Customer] ON
INSERT [dbo].[Customer] ([CustomerID], [Name], [Address], [City], [State],
[ZipCode], [Phone])
VALUES (1, N'Scott', N'555 Main St.', N'Palm Beach', N'FL', N'33333', N'555-555-5555')
INSERT [dbo].[Customer] ([CustomerID], [Name], [Address], [City], [State],
[ZipCode], [Phone])
VALUES (2, N'Adam', N'111 Works St.', N'Jax', N'FL', N'34343', N'444-444-4444')
INSERT [dbo].[Customer] ([CustomerID], [Name], [Address], [City], [State],
[ZipCode], [Phone])
VALUES (3, N'John', N'123 Pike Blvd', N'Seattle', N'WA', N'98989', N'999-999-9999')
SET IDENTITY_INSERT [dbo].[Customer] OFF
/****** Object: Table [dbo].[Item] ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Item](
[ItemID] [int] IDENTITY(1,1) NOT NULL,
[ItemNumber] [nvarchar](50) NULL,
[ItemDescription] [nvarchar](50) NULL,
CONSTRAINT [PK_Item] PRIMARY KEY CLUSTERED
(
[ItemID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[Item] ON
INSERT [dbo].[Item] ([ItemID], [ItemNumber], [ItemDescription])
VALUES (1, N'V001', N'Verizon Windows Phone 7')
INSERT [dbo].[Item] ([ItemID], [ItemNumber], [ItemDescription])
VALUES (2, N'A017', N'Alienware MX 17')
INSERT [dbo].[Item] ([ItemID], [ItemNumber], [ItemDescription])
VALUES (3, N'P002', N'Peters Pea Shooter 3000')
SET IDENTITY_INSERT [dbo].[Item] OFF
/****** Object: Table [dbo].[Orders] ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Orders](
[OrderID] [int] IDENTITY(1,1) NOT NULL,
[CustomerID] [int] NOT NULL,
[OrderNumber] [nvarchar](50) NULL,
[OrderDate] [datetime] NULL,
CONSTRAINT [PK_Orders] PRIMARY KEY CLUSTERED
(
[OrderID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[Orders] ON
INSERT [dbo].[Orders] ([OrderID], [CustomerID], [OrderNumber], [OrderDate])
VALUES (1, 1, N'10001', '6/15/2011')
INSERT [dbo].[Orders] ([OrderID], [CustomerID], [OrderNumber], [OrderDate])
VALUES (2, 2, N'10002', '6/16/2011')
INSERT [dbo].[Orders] ([OrderID], [CustomerID], [OrderNumber], [OrderDate])
VALUES (3, 1, N'10003', '6/17/2011')
INSERT [dbo].[Orders] ([OrderID], [CustomerID], [OrderNumber], [OrderDate])
VALUES (4, 2, N'10004', '6/18/2011')
SET IDENTITY_INSERT [dbo].[Orders] OFF
/****** Object: Table [dbo].[OrderDetail] ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[OrderDetail](
[OrderDetailID] [int] IDENTITY(1,1) NOT NULL,
[OrderID] [int] NULL,
[ItemID] [int] NULL,
[Quantity] [int] NULL,
[Price] [money] NULL,
CONSTRAINT [PK_OrderDetail] PRIMARY KEY CLUSTERED
(
[OrderDetailID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[OrderDetail] ON
INSERT [dbo].[OrderDetail] ([OrderDetailID], [OrderID], [ItemID], [Quantity], [Price])
VALUES (1, 1, 1, 1, 299.9900)
INSERT [dbo].[OrderDetail] ([OrderDetailID], [OrderID], [ItemID], [Quantity], [Price])
VALUES (2, 2, 2, 1, 2999.9900)
INSERT [dbo].[OrderDetail] ([OrderDetailID], [OrderID], [ItemID], [Quantity], [Price])
VALUES (3, 1, 1, 5, 1499.9500)
INSERT [dbo].[OrderDetail] ([OrderDetailID], [OrderID], [ItemID], [Quantity], [Price])
VALUES (4, 2, 3, 2, 3.9900)
SET IDENTITY_INSERT [dbo].[OrderDetail] OFF
/****** Object: Table [dbo].[ItemInfo] ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[ItemInfo](
[OrderID] [int] NOT NULL,
[ItemData] [xml] NULL
) ON [PRIMARY]
GO
Nothing was inserted into the ItemInfo
table. That is because you use that table to practice inserting and
updating XML. Before starting, however, let's spend a few minutes
talking about typed versus untyped XML.