2.3 XML Schema Definitions (XSDs)
One very important feature of XML is its ability to strongly type data in
an XML document. The XSD language—itself composed in XML—defines the
expected format for all XML documents validated against a particular
XSD. You can use XSD to create an XML schema for your data, requiring
that your data conform to a set of rules that you specify. This gives
XML an advantage over just about all other data transfer/data
description methods and is a major contributing factor to the success of
the XML standard.
Without XSD, your XML data would just be another unstructured, text-delimited format. An XSD defines what your XML
data should look like, what elements are required, and what data types
those elements will have. Analogous to how a table definition in SQL
Server provides structure and type validation for relational data, an XML schema provides structure and type validation for the XML data.
We won’t fully describe all the features of the XSD language here. You can find the XSD specifications at the World Wide Web Consortium (W3C), at http://www.w3.org/2001/XMLSchema.
Several popular schemas are publicly available, including schemas for
Really Simple Syndication (RSS) and the Atom Publishing Protocol (APP, based on
RSS), which are protocols that power weblogs, blogcasts, and other
forms of binary and text syndication, as well as one for SOAP, which
dictates how XML Web Services exchange information.
You can choose how to structure your XSD. Your XSD can designate required elements and
set limits on what data types and ranges are allowed. It can even allow
document fragments.
SQL Server Schema Collections
SQL Server lets you create your own schemas and store them in the database
as database objects, and then enforce a schema on any XML instance,
including columns in tables and SQL Server variables. This gives you
precise control over the XML that is going into the database and lets
you strongly type your XML instance.
To get started, you can create the following simple schema and add it to the schemas collection in AdventureWorks2012, as shown in Example 4.
Example 4. Creating an XML Schema Definition (XSD).
CREATE XML SCHEMA COLLECTION OrdersXSD AS '
<xsd:schema
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
<xsd:simpleType name="OrderAmountFloat" >
<xsd:restriction base="xsd:float" >
<xsd:minExclusive value="1.0" />
<xsd:maxInclusive value="5000.0" />
</xsd:restriction>
</xsd:simpleType>
<xsd:element name="Orders">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Order">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="OrderId" type="xsd:int" />
<xsd:element name="CustomerId" type="xsd:int" />
<xsd:element name="OrderDate" type="xsd:dateTime" />
<xsd:element name="OrderAmount" type="OrderAmountFloat" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>'
This schema collection is named OrdersXSD, and you can use it with any xml type, including variables, parameters, return values, and especially columns in tables. The schema defines an Orders root element containing an Order element, which in turn contains elements named OrderId, CustomerId, OrderDate, and OrderAmount. The OrderAmount element references the OrderAmountFloat type, which is defined as a float data type whose minimum value is anything greater than (but not including) 1 and whose maximum value is 5000 (inclusive).
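To confirm that the collection was created, and to see what SQL Server actually stored, you can query the catalog. The following is a minimal sketch that assumes the collection was created in the dbo relational schema; adjust the schema name if yours differs.

```sql
-- List the schema collection created in Example 4
SELECT xsc.name, xsc.create_date, xsc.modify_date
FROM sys.xml_schema_collections AS xsc
WHERE xsc.name = N'OrdersXSD';

-- Reconstruct the stored XSD as an xml value
-- (assumption: the collection lives in the dbo relational schema)
SELECT xml_schema_namespace(N'dbo', N'OrdersXSD') AS StoredXsd;
```

SQL Server stores the schema in an internal form rather than as the original text, so the reconstructed document may differ from what you typed (in formatting and namespace prefixes, for example) while describing the same rules.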
Next, create a simple table and apply the schema to the XML column by referring to the schema name in parentheses after your xml data type in the CREATE TABLE statement, as shown in Example 5.
Example 5. Creating a table with an xml column bound to an XML Schema Definition (XSD).
IF EXISTS(SELECT name FROM sys.tables WHERE name = 'OrdersXML' AND type = 'U')
DROP TABLE OrdersXML
CREATE TABLE OrdersXML(
OrdersId int PRIMARY KEY,
OrdersDoc xml(OrdersXSD) NOT NULL)
As you can see in this example, the OrdersDoc column is defined not as simply xml, but as xml(OrdersXSD). The xml
data type has an optional parameter that allows you to specify the
bound schema. This same usage also applies if you want to bind a schema
to another use of an xml data type, such as a variable or a parameter. SQL Server now allows only a strongly typed XML document in the OrdersDoc column. This is much better than a CHECK
constraint (which you can still add to this column, but only with a
function). An advantage of using an XML schema is that your data is
validated against it, so you can enforce data types (at the XML level)
and make sure that only valid data is
allowed into particular elements. If you were using a CHECK constraint, for example, you would need a separate CHECK constraint for each validation you wanted to perform. In this example, without an XSD, several CHECK
constraints would be needed just to enforce the minimum and maximum
order amounts. You would need one constraint requiring the element,
another to verify the low end of the allowed range, and
another to verify the high end.
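For comparison, here is a rough sketch of what the CHECK constraint route might look like on an untyped xml column. The function and table names (fn_OrderAmountInRange, OrdersUntyped) are hypothetical and used only for illustration. The sketch folds the existence and range tests into one XQuery predicate, but it still covers only the OrderAmount rule; the int and dateTime checks that the XSD gives you for free would each need more logic.

```sql
-- Hypothetical helper: returns 1 when an OrderAmount element exists
-- and its value is greater than 1 and no more than 5000
CREATE FUNCTION dbo.fn_OrderAmountInRange(@doc xml)
RETURNS bit
AS
BEGIN
    RETURN @doc.exist('/Orders/Order/OrderAmount[. > 1.0 and . <= 5000.0]');
END
GO

-- Hypothetical table with an UNTYPED xml column guarded by the function
CREATE TABLE dbo.OrdersUntyped(
    OrdersId  int PRIMARY KEY,
    OrdersDoc xml NOT NULL
        CONSTRAINT CK_OrdersUntyped_Amount
        CHECK (dbo.fn_OrderAmountInRange(OrdersDoc) = 1))
GO
```

The XML methods cannot appear directly in a CHECK constraint, which is why the test is wrapped in a user-defined function.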
To see the schema in action, execute the code in Example 6.
Example 6. Validating XML data against an XSD.
-- Works because all XSD validations succeed
INSERT INTO OrdersXML VALUES(5, '
<Orders>
<Order>
<OrderId>5</OrderId>
<CustomerId>60</CustomerId>
<OrderDate>2011-10-10T14:22:27.25-05:00</OrderDate>
<OrderAmount>25.90</OrderAmount>
</Order>
</Orders>')
GO
-- Won't work because 6.0 is not a valid int for CustomerId
UPDATE OrdersXML SET OrdersDoc = '
<Orders>
<Order>
<OrderId>5</OrderId>
<CustomerId>6.0</CustomerId>
<OrderDate>2011-10-10T14:22:27.25-05:00</OrderDate>
<OrderAmount>25.90</OrderAmount>
</Order>
</Orders>'
WHERE OrdersId = 5
GO
-- Won't work because 25.9O uses an O for a 0 in the OrderAmount
UPDATE OrdersXML SET OrdersDoc = '
<Orders>
<Order>
<OrderId>5</OrderId>
<CustomerId>60</CustomerId>
<OrderDate>2011-10-10T14:22:27.25-05:00</OrderDate>
<OrderAmount>25.9O</OrderAmount>
</Order>
</Orders>'
WHERE OrdersId = 5
GO
-- Won't work because 5225.75 is too large a value for OrderAmount
UPDATE OrdersXML SET OrdersDoc = '
<Orders>
<Order>
<OrderId>5</OrderId>
<CustomerId>60</CustomerId>
<OrderDate>2011-10-10T14:22:27.25-05:00</OrderDate>
<OrderAmount>5225.75</OrderAmount>
</Order>
</Orders>'
WHERE OrdersId = 5
GO
SQL Server enforces the schema on inserts and updates, ensuring data integrity. The data provided for the INSERT operation at the top of Example 6 conforms to the schema, so the INSERT works just fine. Each of the three UPDATE
statements that follow attempts to violate the schema with
invalid data, and SQL Server rejects them with error messages that show
the offending data (and location) that’s causing the problem:
Msg 6926, Level 16, State 1, Line 106
XML Validation: Invalid simple type value: '6.0'. Location: /*:Orders[1]/*:Order[1]/*:CustomerId[1]
Msg 6926, Level 16, State 1, Line 119
XML Validation: Invalid simple type value: '25.9O'. Location: /*:Orders[1]/*:Order[1]/*:OrderAmount[1]
Msg 6926, Level 16, State 1, Line 132
XML Validation: Invalid simple type value: '5225.75'. Location: /*:Orders[1]/*:Order[1]/*:OrderAmount[1]
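If you would rather handle a validation failure in code than let the statement fail, one option is to validate inside a TRY...CATCH block. The sketch below assumes the OrdersXSD collection from Example 4 exists in the current database; the variable names are illustrative. It holds the document in an nvarchar variable so that the schema check happens when the CAST runs.

```sql
-- Candidate document with an OrderAmount above the allowed maximum
DECLARE @raw nvarchar(max) = N'
<Orders>
  <Order>
    <OrderId>5</OrderId>
    <CustomerId>60</CustomerId>
    <OrderDate>2011-10-10T14:22:27.25-05:00</OrderDate>
    <OrderAmount>5225.75</OrderAmount>
  </Order>
</Orders>';

BEGIN TRY
    -- The cast to the typed xml validates @raw against the schema collection
    DECLARE @typed xml(OrdersXSD) = CAST(@raw AS xml(OrdersXSD));
    PRINT 'Document is valid.';
END TRY
BEGIN CATCH
    -- Surface the error number and message instead of failing the batch
    SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage;
END CATCH;
```

The same pattern works around an INSERT or UPDATE against the OrdersXML table if you want to log rejected documents rather than stop processing.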
XSD also supports lax validation. Say that you want to add an additional element to the XML from the preceding example, after <OrderAmount>, that is not part of the same schema. Schemas can use any and anyAttribute
declarations as wildcards, with a processContents value of skip or strict (if you’re unfamiliar with these schema
attributes and values, they dictate how the XML parser should
deal with XML elements not found in the schema). If processContents is set to skip, SQL Server skips validation of the additional element completely, even if a schema is available for it. If processContents is set to strict,
SQL Server requires that the element or its namespace be defined in
the current schema, and validates the element against that definition. Lax
validation provides an additional “in-between” validation option. By
setting the processContents attribute for this wildcard section to lax,
you can enforce validation for any elements that have a schema
associated with them but ignore any elements that are not defined in the
schema.
Consider the schema you just worked with in Example 4. You can modify this XSD to tolerate additional elements after OrderAmount
that are defined in another schema, whether or not that schema is
available. A schema needs to be dropped before you can re-create a
modified version of it, and objects bound to the schema must be dropped
before you can drop the schema. Therefore, before re-creating the schema
for lax validation, you must execute the following statements:
DROP TABLE OrdersXML
DROP XML SCHEMA COLLECTION OrdersXSD
Now re-create the XSD in Example 4 with one small difference—add the following additional line just after the last xsd:element line for OrderAmount:
<xsd:any namespace="##other" processContents="lax"/>
With this small change in place, an arbitrary XML element from another namespace following <OrderAmount>
can be stored without failing validation, even if the
external XSD is not accessible. To see this in action, first re-create
the same test table as shown in Example 5. Then run the code in Example 7, which inserts an order containing an additional <Notes> element not defined as part of the OrdersXSD schema.
Example 7. Using lax schema validation with XML data.
-- Works because the lax wildcard admits the extra <Notes> element
INSERT INTO OrdersXML VALUES(6, '
<Orders>
<Order>
<OrderId>6</OrderId>
<CustomerId>60</CustomerId>
<OrderDate>2011-10-10T14:22:27.25-05:00</OrderDate>
<OrderAmount>25.90</OrderAmount>
<Notes xmlns="sf">My notes for this order</Notes>
</Order>
</Orders>')
Because of the processContents="lax" setting in the XSD, SQL Server permits additional elements defined in another XSD (the sf namespace in this example, as denoted by the xmlns attribute). The lax setting tells SQL Server to validate the <Notes> element against a schema for the sf namespace if one is available, but to allow the element without any validation if it is not.
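One consequence of namespace="##other" is worth checking for yourself. Under XSD 1.0 rules, that wildcard matches only elements that belong to a namespace other than the schema's own, and elements with no namespace at all do not qualify. The following sketch assumes the lax version of OrdersXSD described above is in place. It tries a <Notes> element without an xmlns attribute, which I would expect SQL Server to reject, but run it to confirm the behavior on your version.

```sql
-- Same order as Example 7, but <Notes> carries no namespace
DECLARE @raw nvarchar(max) = N'
<Orders>
  <Order>
    <OrderId>7</OrderId>
    <CustomerId>60</CustomerId>
    <OrderDate>2011-10-10T14:22:27.25-05:00</OrderDate>
    <OrderAmount>25.90</OrderAmount>
    <Notes>No namespace on this element</Notes>
  </Order>
</Orders>';

BEGIN TRY
    -- Validation happens when the string is cast to the typed xml
    DECLARE @typed xml(OrdersXSD) = CAST(@raw AS xml(OrdersXSD));
    PRINT 'Accepted.';
END TRY
BEGIN CATCH
    PRINT 'Rejected: ' + ERROR_MESSAGE();
END CATCH;
```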
SQL Server also supports the union of lists with xsd:union, so you can combine multiple lists into one simple type. For example, in the schema shown in Example 8, the shiptypeList accepts strings such as FastShippers but also allows alternative integer values.
Example 8. Using union and list types in XSD.
-- Cleanup previous objects
DROP TABLE OrdersXML
DROP XML SCHEMA COLLECTION OrdersXSD
GO
-- Union and List types in XSD
CREATE XML SCHEMA COLLECTION OrdersXSD AS '
<xsd:schema
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
<xsd:simpleType name="shiptypeList">
<xsd:union>
<xsd:simpleType>
<xsd:list>
<xsd:simpleType>
<xsd:restriction base="xsd:integer">
<xsd:enumeration value="1" />
<xsd:enumeration value="2" />
<xsd:enumeration value="3" />
</xsd:restriction>
</xsd:simpleType>
</xsd:list>
</xsd:simpleType>
<xsd:simpleType>
<xsd:list>
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:enumeration value="FastShippers" />
<xsd:enumeration value="SHL" />
<xsd:enumeration value="PSU" />
</xsd:restriction>
</xsd:simpleType>
</xsd:list>
</xsd:simpleType>
</xsd:union>
</xsd:simpleType>
<xsd:element name="Orders">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Order">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="OrderId" type="xsd:int" />
<xsd:element name="CustomerId" type="xsd:int" />
<xsd:element name="OrderDate" type="xsd:dateTime" />
<xsd:element name="OrderAmount" type="xsd:float" />
<xsd:element name="ShipType" type="shiptypeList"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>'
If you use this XSD to validate an XML
instance with either a numeric value or a string value in the
enumerated list, it will validate successfully, as demonstrated by the
code in Example 9.
Example 9. Referencing an XSD list type in XML.
-- Works with 1 or FastShippers in ShipType
DECLARE @OrdersXML xml(OrdersXSD) = '
<Orders>
<Order>
<OrderId>6</OrderId>
<CustomerId>60</CustomerId>
<OrderDate>2011-10-10T14:22:27.25-05:00</OrderDate>
<OrderAmount>25.90</OrderAmount>
<ShipType>1</ShipType>
</Order>
</Orders>'
This example is fairly basic, but
it is useful if you have more than one way to describe something and
need two lists to do so. One such possibility is metric and English
units of measurement. This technique is useful when you need to restrict
items and are writing them from a database.
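To get a feel for what the union of lists accepts, you can probe a few ShipType values in a loop. This is a sketch under the assumption that the OrdersXSD collection from Example 8 is in place; the candidate values and variable names are illustrative. Going by XSD list and union semantics, a single enumerated value or a space-separated list drawn from one of the two enumerations should pass, while a value outside both enumerations should fail, but the script reports whatever SQL Server actually decides.

```sql
-- Candidate ShipType values to test against the schema
DECLARE @candidates TABLE (ShipType nvarchar(50) NOT NULL);
INSERT INTO @candidates (ShipType)
VALUES (N'1'), (N'FastShippers'), (N'1 2 3'), (N'Overnight');

DECLARE @shipType nvarchar(50), @raw nvarchar(max);
DECLARE @typed xml(OrdersXSD);

DECLARE candidate_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT ShipType FROM @candidates;
OPEN candidate_cursor;
FETCH NEXT FROM candidate_cursor INTO @shipType;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Build an order document around the current candidate value
    SET @raw = N'<Orders><Order><OrderId>6</OrderId><CustomerId>60</CustomerId>'
             + N'<OrderDate>2011-10-10T14:22:27.25-05:00</OrderDate>'
             + N'<OrderAmount>25.90</OrderAmount>'
             + N'<ShipType>' + @shipType + N'</ShipType></Order></Orders>';

    BEGIN TRY
        -- The cast validates the document against OrdersXSD
        SET @typed = CAST(@raw AS xml(OrdersXSD));
        PRINT @shipType + N': accepted';
    END TRY
    BEGIN CATCH
        PRINT @shipType + N': rejected';
    END CATCH;

    FETCH NEXT FROM candidate_cursor INTO @shipType;
END;

CLOSE candidate_cursor;
DEALLOCATE candidate_cursor;
```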
XML Indexes
You can create an XML index on an XML column using almost the same syntax
as for a standard SQL Server index. There are four types of XML indexes: a single primary XML index that must be created first, and three types of optional secondary XML indexes
that are created over the primary index. An XML index is a little
different from a standard SQL index: the primary XML index is a clustered index on an
internal table used by SQL Server to store XML data. This table is
called the node table and cannot be accessed by programmers.
To get started with an XML index, you must first create the primary index
of all the nodes. The primary index is a clustered index (over the node
table, not the base table) that associates each node of your XML column
with the primary key column of the base table. It does this by indexing one row in its
internal representation (a B+ tree structure) for each node in your XML
column, generating an index usually about three times as large as your
XML data. For XML indexing to work, your table must have an
ordinary clustered primary key defined. That primary key is used
in a join of the XQuery results with the base table.
To create a primary XML index, you first create a table with a primary key and an XML column, as shown in Example 10.
Example 10. Creating a primary XML index for XML storage in a table.
IF EXISTS(SELECT name FROM sys.tables WHERE name = 'OrdersXML' AND type = 'U')
DROP TABLE OrdersXML
GO
CREATE TABLE OrdersXML(
OrdersId int PRIMARY KEY,
OrdersDoc xml NOT NULL)
CREATE PRIMARY XML INDEX ix_orders
ON OrdersXML(OrdersDoc)
These statements create a new primary XML index named ix_orders on the OrdersXML table’s OrdersDoc column. The primary XML index, ix_orders, now has the node table populated. To examine the node table’s columns, run the T-SQL shown in Example 11.
Example 11. Examining the columns of the node table.
-- Display the columns in the node table (primary XML clustered index)
SELECT
c.column_id, c.name, t.name AS data_type
FROM
sys.columns AS c
INNER JOIN sys.indexes AS i ON i.object_id= c.object_id
INNER JOIN sys.types AS t ON t.user_type_id= c.user_type_id
WHERE
i.name = 'ix_orders' AND i.type = 1
ORDER BY
c.column_id
The results are shown in Table 1.
Table 1. Columns in a Typical Node Table.
| column_id | name | data_type |
|---|---|---|
| 1 | id | varbinary |
| 2 | nid | int |
| 3 | tagname | nvarchar |
| 4 | taguri | nvarchar |
| 5 | tid | int |
| 6 | value | sql_variant |
| 7 | lvalue | nvarchar |
| 8 | lvaluebin | varbinary |
| 9 | hid | varchar |
| 10 | xsinil | bit |
| 11 | xsitype | bit |
| 12 | pk1 | int |
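Because the node table holds one row per XML node, it is worth checking how much space it takes on your own data. The query below is a sketch that assumes the OrdersXML table and ix_orders index from Example 10, and that you have VIEW DATABASE STATE permission (needed for sys.dm_db_partition_stats). On an empty table it simply reports little or no space used.

```sql
-- Report the space used by the internal node table behind each primary XML index
SELECT
    xi.name                     AS xml_index,
    it.name                     AS node_table,
    SUM(ps.used_page_count) * 8 AS used_kb   -- data pages are 8 KB each
FROM sys.xml_indexes AS xi
INNER JOIN sys.internal_tables AS it
    ON it.parent_id = xi.object_id
   AND it.parent_minor_id = xi.index_id
INNER JOIN sys.dm_db_partition_stats AS ps
    ON ps.object_id = it.object_id
WHERE xi.object_id = OBJECT_ID(N'OrdersXML')
  AND it.internal_type_desc = N'XML_INDEX_NODES'
GROUP BY xi.name, it.name;
```

The total covers every index on the node table, so it grows further once secondary XML indexes are added.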
The three types of secondary XML indexes are path, value, and property. You can implement a secondary XML
index only after you have created a primary XML index, because secondary indexes are
actually indexes over the node table. These indexes further
optimize XQuery statements made against the XML data.
A path index creates an index on the Path ID (hid in Table 1) and Value columns of the primary XML index, using the FOR PATH
keyword. This type of index is best when you have a fairly complex
document type and want to speed up XQuery XPath expressions that
reference a particular node in your XML data with an explicit value. If you are more concerned about the values of
the nodes queried with wildcards, you can create a value index using the
FOR VALUE keyword. The VALUE index contains the same index columns as the PATH index, Value and Path ID (hid), but in the reverse order (the order in which they appear in Table 1). Using the property type index with the PROPERTY keyword optimizes hierarchies of elements or attributes that are name/value pairs. The PROPERTY index contains the primary key of the base table, Path ID (hid), and Value, in that order. When you create these indexes, you must name the primary XML index they are built on by using the USING XML INDEX syntax, as shown in Example 12.
Example 12. Creating secondary XML indexes on path, value, and property data.
-- Create secondary structural (path) XML index
CREATE XML INDEX ix_orders_path ON OrdersXML(OrdersDoc)
USING XML INDEX ix_orders FOR PATH
-- Create secondary value XML index
CREATE XML INDEX ix_orders_val ON OrdersXML(OrdersDoc)
USING XML INDEX ix_orders FOR VALUE
-- Create secondary property XML index
CREATE XML INDEX ix_orders_prop ON OrdersXML(OrdersDoc)
USING XML INDEX ix_orders FOR PROPERTY
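After creating the indexes, you can confirm how they relate to one another and try the kinds of queries each one targets. The statements below are a sketch against the untyped OrdersXML table from Example 10; the sample paths and values are illustrative, and whether the optimizer actually chooses a given index depends on your data, so check the execution plan rather than assuming.

```sql
-- List the XML indexes on OrdersXML; secondary indexes point back
-- to the primary index through using_xml_index_id
SELECT xi.name, xi.index_id, xi.using_xml_index_id, xi.secondary_type_desc
FROM sys.xml_indexes AS xi
WHERE xi.object_id = OBJECT_ID(N'OrdersXML');

-- Path-style predicate: explicit path plus a value (candidate for the PATH index)
SELECT OrdersId
FROM OrdersXML
WHERE OrdersDoc.exist('/Orders/Order[CustomerId = 60]') = 1;

-- Value-style predicate: wildcard path, known value (candidate for the VALUE index)
SELECT OrdersId
FROM OrdersXML
WHERE OrdersDoc.exist('//*[. = "60"]') = 1;

-- Property-style retrieval: known primary key, individual values pulled out
-- (candidate for the PROPERTY index)
SELECT
    OrdersDoc.value('(/Orders/Order/OrderDate)[1]', 'nvarchar(40)') AS OrderDate,
    OrdersDoc.value('(/Orders/Order/OrderAmount)[1]', 'float')      AS OrderAmount
FROM OrdersXML
WHERE OrdersId = 5;
```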
Be aware of these additional restrictions regarding XML indexes:
An XML index can contain only one XML column, so you cannot create a composite XML index (an index on more than one XML column).
Using XML indexes requires that the primary key be clustered, and because you
can have only one clustered index per table, you cannot create a
clustered XML index.
With the proper XML indexing
in place, you can write some very efficient queries using XQuery. Before
we get to XQuery, however, let’s take a look at some other XML features
that will help you get XML data in and out of the database.