SQL Server 2012 : XML and the Relational Database - The xml Data Type (part 2) - XML Schema Definitions

7/23/2013 7:57:22 PM

2.3 XML Schema Definitions (XSDs)

One very important feature of XML is its ability to strongly type data in an XML document. The XSD language—itself composed in XML—defines the expected format for all XML documents validated against a particular XSD. You can use XSD to create an XML schema for your data, requiring that your data conform to a set of rules that you specify. This gives XML an advantage over just about all other data transfer/data description methods and is a major contributing factor to the success of the XML standard.

Without XSD, your XML data would just be another unstructured, text-delimited format. An XSD defines what your XML data should look like, what elements are required, and what data types those elements will have. Analogous to how a table definition in SQL Server provides structure and type validation for relational data, an XML schema provides structure and type validation for the XML data.

We won’t fully describe all the features of the XSD language here. You can find the XSD specifications at the World Wide Web Consortium (W3C), at http://www.w3.org/2001/XMLSchema. Several popular schemas are publicly available, including ones for Really Simple Syndication (RSS) and the Atom Publishing Protocol (APP), which power weblogs, blogcasts, and other forms of binary and text syndication, as well as one for SOAP, which dictates how XML Web Services exchange information.

You can choose how to structure your XSD. Your XSD can designate required elements and set limits on what data types and ranges are allowed. It can even allow document fragments.
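In SQL Server terms (schema collections are introduced in the next section), a minimal sketch of the last two points might look like the following. The collection name FragmentDemoXSD is hypothetical. The schema marks one child element optional with minOccurs="0", and the CONTENT and DOCUMENT keywords that SQL Server accepts in a typed xml declaration decide whether a fragment with several top-level elements is allowed.

```sql
-- FragmentDemoXSD is a hypothetical collection name used only for this sketch
IF EXISTS (SELECT * FROM sys.xml_schema_collections WHERE name = 'FragmentDemoXSD')
  DROP XML SCHEMA COLLECTION FragmentDemoXSD;
GO
-- <Note> requires a <Text> child; minOccurs="0" makes <Author> optional
CREATE XML SCHEMA COLLECTION FragmentDemoXSD AS '
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="Note">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="Text" type="xsd:string" />
          <xsd:element name="Author" type="xsd:string" minOccurs="0" />
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>';
GO
-- CONTENT (the default) accepts a fragment with two top-level <Note> elements,
-- the second of which omits the optional <Author>
DECLARE @frag xml(CONTENT FragmentDemoXSD) =
  '<Note><Text>a</Text><Author>kim</Author></Note><Note><Text>b</Text></Note>';
SELECT @frag.value('count(/Note)', 'int') AS NoteCount;
GO
-- DOCUMENT requires exactly one top-level element, so a fragment is rejected
DECLARE @src nvarchar(max) =
  N'<Note><Text>a</Text></Note><Note><Text>b</Text></Note>';
DECLARE @doc xml(DOCUMENT FragmentDemoXSD);
BEGIN TRY
  SET @doc = @src;
  PRINT 'accepted';
END TRY
BEGIN CATCH
  PRINT 'rejected: ' + ERROR_MESSAGE();
END CATCH;
```

If everything behaves as described, the first SELECT reports two Note elements and the DOCUMENT assignment lands in the CATCH block. Drop the collection with DROP XML SCHEMA COLLECTION FragmentDemoXSD when you are done.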

SQL Server Schema Collections

SQL Server lets you create your own schemas, store them in the database as database objects, and then enforce a schema on any XML instance, including columns in tables and SQL Server variables. This gives you precise control over the XML that is going into the database and lets you strongly type your XML instance.
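Because a schema collection is an ordinary database object, you can find it in the catalog views. The sketch below assumes a throwaway collection named CatalogDemoXSD (hypothetical) and assumes your default schema is dbo; it lists the collection through sys.xml_schema_collections and reads its contents back with the XML_SCHEMA_NAMESPACE function.

```sql
-- CatalogDemoXSD is a hypothetical name used only for this sketch
IF EXISTS (SELECT * FROM sys.xml_schema_collections WHERE name = 'CatalogDemoXSD')
  DROP XML SCHEMA COLLECTION CatalogDemoXSD;
GO
CREATE XML SCHEMA COLLECTION CatalogDemoXSD AS '
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="Status" type="xsd:string" />
  </xsd:schema>';
GO
-- A schema collection belongs to a relational schema, like a table or view
SELECT c.xml_collection_id, SCHEMA_NAME(c.schema_id) AS owning_schema,
       c.name, c.create_date
 FROM sys.xml_schema_collections AS c
 WHERE c.name = 'CatalogDemoXSD';

-- Return the stored schema components as an xml value
-- (assumes the collection was created in the dbo schema)
SELECT XML_SCHEMA_NAMESPACE(N'dbo', N'CatalogDemoXSD') AS stored_schema;
```

The XSD returned by XML_SCHEMA_NAMESPACE is rebuilt from the stored components, so its formatting may not match the text you originally submitted.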

To get started, you can create the following simple schema and store it as a schema collection in the AdventureWorks2012 database, as shown in Example 4.

Example 4. Creating an XML Schema Definition (XSD).

CREATE XML SCHEMA COLLECTION OrdersXSD AS '
  <xsd:schema
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
    <xsd:simpleType name="OrderAmountFloat" >
      <xsd:restriction base="xsd:float" >
        <xsd:minExclusive value="1.0" />
        <xsd:maxInclusive value="5000.0" />
      </xsd:restriction>
    </xsd:simpleType>
    <xsd:element name="Orders">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="Order">
            <xsd:complexType>
              <xsd:sequence>
                <xsd:element name="OrderId" type="xsd:int" />
                <xsd:element name="CustomerId" type="xsd:int" />
                <xsd:element name="OrderDate" type="xsd:dateTime" />
                <xsd:element name="OrderAmount" type="OrderAmountFloat" />
              </xsd:sequence>
            </xsd:complexType>
          </xsd:element>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>'

This schema is named OrdersXSD, and you can bind it to any use of the xml type, including variables, parameters, return values, and especially columns in tables. It defines an Orders root element containing an Order element, which in turn contains elements named OrderId, CustomerId, OrderDate, and OrderAmount. The OrderAmount element references the OrderAmountFloat type, which is defined as a float whose value must be greater than 1 (minExclusive, so 1 itself is not allowed) and no greater than 5000 (maxInclusive, so 5000 is allowed).
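The difference between minExclusive and maxInclusive is easiest to see at the boundaries. This sketch assumes a hypothetical collection, AmountRangeXSD, that contains only the OrderAmountFloat type from Example 4 and a single OrderAmount element, and it assigns values at and just past each limit to a typed variable. It uses CHOOSE, so it needs SQL Server 2012 or later.

```sql
-- AmountRangeXSD is a hypothetical collection reusing OrderAmountFloat from Example 4
IF EXISTS (SELECT * FROM sys.xml_schema_collections WHERE name = 'AmountRangeXSD')
  DROP XML SCHEMA COLLECTION AmountRangeXSD;
GO
CREATE XML SCHEMA COLLECTION AmountRangeXSD AS '
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:simpleType name="OrderAmountFloat">
      <xsd:restriction base="xsd:float">
        <xsd:minExclusive value="1.0" />
        <xsd:maxInclusive value="5000.0" />
      </xsd:restriction>
    </xsd:simpleType>
    <xsd:element name="OrderAmount" type="OrderAmountFloat" />
  </xsd:schema>';
GO
-- Assign values on and just past each boundary; validation happens on assignment
DECLARE @i int = 1, @amount varchar(20), @doc xml(AmountRangeXSD);
WHILE @i <= 4
BEGIN
  SET @amount = CHOOSE(@i, '1.0', '1.01', '5000.0', '5000.01');
  BEGIN TRY
    SET @doc = '<OrderAmount>' + @amount + '</OrderAmount>';
    PRINT @amount + ': valid';
  END TRY
  BEGIN CATCH
    PRINT @amount + ': rejected (' + ERROR_MESSAGE() + ')';
  END CATCH;
  SET @i += 1;
END;
```

You should see 1.0 and 5000.01 rejected, because 1 itself is excluded and 5000 is the largest value allowed, while 1.01 and 5000.0 are accepted.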

Next, create a simple table and apply the schema to the XML column by referring to the schema name in parentheses after your xml data type in the CREATE TABLE statement, as shown in Example 5.

Example 5. Creating a table with an xml column bound to an XML Schema Definition (XSD).

IF EXISTS(SELECT name FROM sys.tables WHERE name = 'OrdersXML' AND type = 'U')
 DROP TABLE OrdersXML

CREATE TABLE OrdersXML(
  OrdersId int PRIMARY KEY,
  OrdersDoc xml(OrdersXSD) NOT NULL)

As you can see in this example, the OrdersDoc column is defined not as simply xml, but as xml(OrdersXSD). The xml data type has an optional parameter that specifies the bound schema collection, and the same syntax binds a schema to any other use of the xml data type, such as a variable or a parameter. SQL Server now allows only XML that is valid against OrdersXSD in the OrdersDoc column.

This is much better than a CHECK constraint (which you can still add to this column, but only by wrapping the XML logic in a function). Because your data is validated against the XML schema, SQL Server enforces XSD data types at the XML level and makes sure that only valid data is allowed into the particular elements. With CHECK constraints, by contrast, you would need a separate constraint for each validation you wanted to perform. In this example, without an XSD, you would need several CHECK constraints just to enforce the minimum and maximum order amounts: one constraint requiring the OrderAmount element, another to verify the low end of the allowed range, and a third to verify the high end.
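For comparison, here is a minimal sketch of the CHECK-constraint approach on an untyped xml column, with the XML logic wrapped in a scalar function as the text describes. The names dbo.fnOrderAmountInRange, dbo.OrdersUntyped, and CK_OrdersUntyped_Amount are hypothetical. Note how much the function has to do by hand that OrderAmountFloat declares in a few lines: find the element, convert it, and test both ends of the range.

```sql
-- Hypothetical names used only for this sketch; drop the table first
-- because its CHECK constraint depends on the function
IF OBJECT_ID('dbo.OrdersUntyped', 'U') IS NOT NULL DROP TABLE dbo.OrdersUntyped;
IF OBJECT_ID('dbo.fnOrderAmountInRange', 'FN') IS NOT NULL DROP FUNCTION dbo.fnOrderAmountInRange;
GO
-- Returns 1 only when an OrderAmount exists, is greater than 1, and is at most 5000
CREATE FUNCTION dbo.fnOrderAmountInRange(@doc xml)
RETURNS bit
AS
BEGIN
  DECLARE @amount float = @doc.value('(/Orders/Order/OrderAmount)[1]', 'float');
  -- A missing element yields NULL, which falls through to 0
  RETURN CASE WHEN @amount > 1 AND @amount <= 5000 THEN 1 ELSE 0 END;
END;
GO
CREATE TABLE dbo.OrdersUntyped(
  OrdersId int PRIMARY KEY,
  OrdersDoc xml NOT NULL
    CONSTRAINT CK_OrdersUntyped_Amount CHECK (dbo.fnOrderAmountInRange(OrdersDoc) = 1));
GO
-- Accepted: 25.90 is inside the range
INSERT INTO dbo.OrdersUntyped VALUES (1,
  '<Orders><Order><OrderId>1</OrderId><OrderAmount>25.90</OrderAmount></Order></Orders>');
-- Rejected by the CHECK constraint (error 547): 5225.75 is too large
BEGIN TRY
  INSERT INTO dbo.OrdersUntyped VALUES (2,
    '<Orders><Order><OrderId>2</OrderId><OrderAmount>5225.75</OrderAmount></Order></Orders>');
END TRY
BEGIN CATCH
  PRINT 'rejected: ' + ERROR_MESSAGE();
END CATCH;
```

Every further rule, such as checking that CustomerId is an integer, would mean more hand-written XQuery of this kind, which is the maintenance burden an XSD avoids.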

To see the schema in action, execute the code in Example 6.

Example 6. Validating XML data against an XSD.

-- Works because all XSD validations succeed
INSERT INTO OrdersXML VALUES(5, '
  <Orders>
    <Order>
      <OrderId>5</OrderId>
      <CustomerId>60</CustomerId>
      <OrderDate>2011-10-10T14:22:27.25-05:00</OrderDate>
      <OrderAmount>25.90</OrderAmount>
    </Order>
  </Orders>')
GO

-- Won't work because 6.0 is not a valid int for CustomerId
UPDATE OrdersXML SET OrdersDoc = '
  <Orders>
    <Order>
      <OrderId>5</OrderId>
      <CustomerId>6.0</CustomerId>
      <OrderDate>2011-10-10T14:22:27.25-05:00</OrderDate>
      <OrderAmount>25.90</OrderAmount>
    </Order>
  </Orders>'
 WHERE OrdersId = 5
GO

-- Won't work because 25.9O uses an O for a 0 in the OrderAmount
UPDATE OrdersXML SET OrdersDoc = '
  <Orders>
    <Order>
      <OrderId>5</OrderId>
      <CustomerId>60</CustomerId>
      <OrderDate>2011-10-10T14:22:27.25-05:00</OrderDate>
      <OrderAmount>25.9O</OrderAmount>
    </Order>
  </Orders>'
 WHERE OrdersId = 5
GO

-- Won't work because 5225.75 is too large a value for OrderAmount
UPDATE OrdersXML SET OrdersDoc = '
  <Orders>
    <Order>
      <OrderId>5</OrderId>
      <CustomerId>60</CustomerId>
      <OrderDate>2011-10-10T14:22:27.25-05:00</OrderDate>
      <OrderAmount>5225.75</OrderAmount>
    </Order>
  </Orders>'
 WHERE OrdersId = 5
GO

SQL Server enforces the schema on inserts and updates, ensuring data integrity. The data provided for the INSERT operation at the top of Example 6 conforms to the schema, so the INSERT works just fine. Each of the three UPDATE statements that follow attempts to violate the schema with different invalid data, and SQL Server rejects each one with an error message that shows the offending data and its location:

Msg 6926, Level 16, State 1, Line 106
XML Validation: Invalid simple type value: '6.0'. Location: /*:Orders[1]/*:Order[1]/*:CustomerId[1]
Msg 6926, Level 16, State 1, Line 119
XML Validation: Invalid simple type value: '25.9O'. Location: /*:Orders[1]/*:Order[1]/*:OrderAmount[1]
Msg 6926, Level 16, State 1, Line 132
XML Validation: Invalid simple type value: '5225.75'. Location: /*:Orders[1]/*:Order[1]/*:OrderAmount[1]

Lax Validation

XSD also supports lax validation. Say that you want to add an additional element that is not part of the same schema to the XML from the preceding example, after <OrderAmount>. Schemas can declare such wildcards with any and anyAttribute declarations, whose processContents attribute tells the XML parser how to deal with content that the schema itself does not describe. If processContents is set to skip, SQL Server skips validation of the additional element completely, even if a schema is available for it. If processContents is set to strict, SQL Server requires that an element or namespace for it be defined in the schema collection, and validates the element against that definition. Lax validation provides an additional “in-between” option: by setting the processContents attribute for the wildcard to lax, you enforce validation for any elements that have a schema associated with them but ignore any elements that are not defined in the schema.

Consider the schema you just worked with in Example 4. You can modify this XSD to tolerate additional elements after OrderAmount that are defined in another schema, whether or not that schema is available. A schema needs to be dropped before you can re-create a modified version of it, and objects bound to the schema must be dropped before you can drop the schema. Therefore, before re-creating the schema for lax validation, you must execute the following statements:

DROP TABLE OrdersXML
DROP XML SCHEMA COLLECTION OrdersXSD

Now re-create the XSD in Example 4 with one small difference—add the following additional line just after the last xsd:element line for OrderAmount:

<xsd:any namespace="##other" processContents="lax"/>

With this small change in place, elements from another namespace that follow <OrderAmount> can be stored without failing validation, even if no schema for their namespace is available. To see this in action, first re-create the same test table as shown in Example 5. Then run the code in Example 7, which inserts an order containing an additional <Notes> element not defined as part of the OrdersXSD schema.

Example 7. Using lax schema validation with XML data.

-- Works because all XSD validations succeed
INSERT INTO OrdersXML VALUES(6, '
  <Orders>
    <Order>
      <OrderId>6</OrderId>
      <CustomerId>60</CustomerId>
      <OrderDate>2011-10-10T14:22:27.25-05:00</OrderDate>
      <OrderAmount>25.90</OrderAmount>
      <Notes xmlns="sf">My notes for this order</Notes>
    </Order>
  </Orders>')

Because of the processContents=“lax” setting in the XSD, SQL Server permits additional elements from another namespace (the sf namespace in this example, as declared by the xmlns attribute). The lax setting tells SQL Server to validate the <Notes> element if the schema collection contains a schema for the sf namespace, and to accept the element without validation if it does not.
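The “validate if available” half of lax validation shows up when you add a schema for the extra namespace to the collection afterward. In this sketch the collection LaxDemoXSD and the namespace urn:example:notes are hypothetical. Before the ALTER XML SCHEMA COLLECTION ... ADD statement, the collection knows nothing about that namespace and a non-numeric <Notes> is accepted; afterward, <Notes> is declared as xsd:int, so lax validation checks it and the same text is expected to be rejected.

```sql
-- LaxDemoXSD and the urn:example:notes namespace are hypothetical
IF EXISTS (SELECT * FROM sys.xml_schema_collections WHERE name = 'LaxDemoXSD')
  DROP XML SCHEMA COLLECTION LaxDemoXSD;
GO
CREATE XML SCHEMA COLLECTION LaxDemoXSD AS '
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="Order">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="OrderId" type="xsd:int" />
          <xsd:any namespace="##other" processContents="lax" />
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>';
GO
-- No declaration for urn:example:notes yet, so <Notes> is accepted unvalidated
DECLARE @before xml(LaxDemoXSD) =
  '<Order><OrderId>1</OrderId><Notes xmlns="urn:example:notes">rush</Notes></Order>';
GO
-- Add a schema for that namespace that declares Notes as an integer
ALTER XML SCHEMA COLLECTION LaxDemoXSD ADD '
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
              targetNamespace="urn:example:notes">
    <xsd:element name="Notes" type="xsd:int" />
  </xsd:schema>';
GO
-- A declaration is now available, so lax validation enforces xsd:int
DECLARE @src nvarchar(max) =
  N'<Order><OrderId>1</OrderId><Notes xmlns="urn:example:notes">rush</Notes></Order>';
DECLARE @after xml(LaxDemoXSD);
BEGIN TRY
  SET @after = @src;
  PRINT 'accepted';
END TRY
BEGIN CATCH
  PRINT 'rejected: ' + ERROR_MESSAGE();
END CATCH;
```

With skip instead of lax, the second assignment would be accepted as well; with strict, the first one would already fail because the collection had no declaration for the namespace.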

Union and List Types

SQL Server also supports unions of list types with xsd:union, so you can combine multiple lists into one simple type. For example, in the schema shown in Example 8, the shiptypeList type accepts the strings FastShippers, SHL, and PSU but also allows the alternative integer values 1, 2, and 3.

Example 8. Using union and list types in XSD.

-- Cleanup previous objects
DROP TABLE OrdersXML
DROP XML SCHEMA COLLECTION OrdersXSD
GO

-- Union and List types in XSD
CREATE XML SCHEMA COLLECTION OrdersXSD AS '
  <xsd:schema
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
    <xsd:simpleType name="shiptypeList">
      <xsd:union>
        <xsd:simpleType>
          <xsd:list>
            <xsd:simpleType>
              <xsd:restriction base="xsd:integer">
                <xsd:enumeration value="1" />
                <xsd:enumeration value="2" />
                <xsd:enumeration value="3" />
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:list>
        </xsd:simpleType>
        <xsd:simpleType>
          <xsd:list>
            <xsd:simpleType>
              <xsd:restriction base="xsd:string">
                <xsd:enumeration value="FastShippers" />
                <xsd:enumeration value="SHL" />
                <xsd:enumeration value="PSU" />
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:list>
        </xsd:simpleType>
      </xsd:union>
    </xsd:simpleType>
    <xsd:element name="Orders">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="Order">
            <xsd:complexType>
              <xsd:sequence>
                <xsd:element name="OrderId" type="xsd:int" />
                <xsd:element name="CustomerId" type="xsd:int" />
                <xsd:element name="OrderDate" type="xsd:dateTime" />
                <xsd:element name="OrderAmount" type="xsd:float" />
                <xsd:element name="ShipType" type="shiptypeList"/>
              </xsd:sequence>
            </xsd:complexType>
          </xsd:element>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>'

If you use this XSD to validate an XML instance with either a numeric value or a string value in the enumerated list, it will validate successfully, as demonstrated by the code in Example 9.

Example 9. Referencing an XSD list type in XML.

-- Works with 1 or FastShippers in ShipType
DECLARE @OrdersXML xml(OrdersXSD) = '
  <Orders>
    <Order>
      <OrderId>6</OrderId>
      <CustomerId>60</CustomerId>
      <OrderDate>2011-10-10T14:22:27.25-05:00</OrderDate>
      <OrderAmount>25.90</OrderAmount>
      <ShipType>1</ShipType>
    </Order>
  </Orders>'

This example is fairly basic, but the technique is useful when you have more than one way to describe something and need two lists to do so, such as metric and English units of measurement. It is also useful when you need to restrict the allowed values of items that you write out from a database.
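Because each member of the union in Example 8 is an xsd:list, a ShipType value may hold several space-separated items, as long as all of them come from the same member list. This sketch assumes a hypothetical collection, ShipTypeDemoXSD, holding only the shiptypeList type and a ShipType element, and tries five values against it.

```sql
-- ShipTypeDemoXSD is a hypothetical collection holding only the ShipType part of Example 8
IF EXISTS (SELECT * FROM sys.xml_schema_collections WHERE name = 'ShipTypeDemoXSD')
  DROP XML SCHEMA COLLECTION ShipTypeDemoXSD;
GO
CREATE XML SCHEMA COLLECTION ShipTypeDemoXSD AS '
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:simpleType name="shiptypeList">
      <xsd:union>
        <xsd:simpleType>
          <xsd:list>
            <xsd:simpleType>
              <xsd:restriction base="xsd:integer">
                <xsd:enumeration value="1" />
                <xsd:enumeration value="2" />
                <xsd:enumeration value="3" />
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:list>
        </xsd:simpleType>
        <xsd:simpleType>
          <xsd:list>
            <xsd:simpleType>
              <xsd:restriction base="xsd:string">
                <xsd:enumeration value="FastShippers" />
                <xsd:enumeration value="SHL" />
                <xsd:enumeration value="PSU" />
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:list>
        </xsd:simpleType>
      </xsd:union>
    </xsd:simpleType>
    <xsd:element name="ShipType" type="shiptypeList" />
  </xsd:schema>';
GO
-- Try single values, same-list pairs, and a mixed pair
DECLARE @i int = 1, @value varchar(50), @doc xml(ShipTypeDemoXSD);
WHILE @i <= 5
BEGIN
  SET @value = CHOOSE(@i, '1', 'FastShippers', '1 2', 'SHL PSU', '1 SHL');
  BEGIN TRY
    SET @doc = '<ShipType>' + @value + '</ShipType>';
    PRINT '"' + @value + '": valid';
  END TRY
  BEGIN CATCH
    PRINT '"' + @value + '": rejected';
  END CATCH;
  SET @i += 1;
END;
```

With the union behaving as described, 1, FastShippers, 1 2, and SHL PSU validate, while 1 SHL is rejected because neither member list accepts a mix of integers and strings. If you want exactly one value per element, make the two enumerated types direct members of the union instead of wrapping them in xsd:list.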

2.4 XML Indexes

You can create an XML index on an XML column using almost the same syntax as for a standard SQL Server index. There are four types of XML indexes: a single primary XML index, which must be created first, and three types of optional secondary XML indexes, which are created over the primary index. An XML index is a little different from a standard SQL index: it is a clustered index on an internal table in which SQL Server stores a shredded representation of the XML data. This table is called the node table, and programmers cannot access it directly.

To get started with an XML index, you must first create the primary index of all the nodes. The primary index is a clustered index (over the node table, not the base table) that associates each node of your XML column with the SQL primary key column. It does this by storing one row in its internal representation (a B+ tree structure) for each node in your XML column, generating an index that is usually about three times as large as your XML data. Before you can create a primary XML index, your table must have an ordinary clustered primary key defined. That primary key is used in a join of the XQuery results with the base table.
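A quick way to confirm the primary key requirement is to try creating a primary XML index on a heap. In this sketch the table dbo.XmlHeapDemo and the names ix_heap_doc and PK_XmlHeapDemo are hypothetical; the first CREATE PRIMARY XML INDEX is expected to fail, and the second succeeds after a clustered primary key is added.

```sql
-- XmlHeapDemo is a hypothetical table used only for this sketch
IF OBJECT_ID('dbo.XmlHeapDemo', 'U') IS NOT NULL DROP TABLE dbo.XmlHeapDemo;
GO
CREATE TABLE dbo.XmlHeapDemo(
  Id int NOT NULL,
  Doc xml NOT NULL);
GO
-- Fails: the table is a heap with no clustered primary key
BEGIN TRY
  CREATE PRIMARY XML INDEX ix_heap_doc ON dbo.XmlHeapDemo(Doc);
  PRINT 'created';
END TRY
BEGIN CATCH
  PRINT 'rejected: ' + ERROR_MESSAGE();
END CATCH;
GO
-- Succeeds once a clustered primary key exists
ALTER TABLE dbo.XmlHeapDemo
  ADD CONSTRAINT PK_XmlHeapDemo PRIMARY KEY CLUSTERED (Id);
GO
CREATE PRIMARY XML INDEX ix_heap_doc ON dbo.XmlHeapDemo(Doc);
```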

To create a primary XML index, you first create a table with a primary key and an XML column, as shown in Example 10.

Example 10. Creating a primary XML index for XML storage in a table.

IF EXISTS(SELECT name FROM sys.tables WHERE name = 'OrdersXML' AND type = 'U')
 DROP TABLE OrdersXML
GO

CREATE TABLE OrdersXML(
  OrdersId int PRIMARY KEY,
  OrdersDoc xml NOT NULL)

CREATE PRIMARY XML INDEX ix_orders
 ON OrdersXML(OrdersDoc)

These statements create a new primary XML index named ix_orders on the OrdersXML table’s OrdersDoc column, and SQL Server populates the node table behind it. To examine the node table’s columns, run the T-SQL shown in Example 11.

Example 11. Viewing the columns of the node table behind a primary XML index.

-- Display the columns in the node table (primary XML clustered index)
SELECT
  c.column_id, c.name, t.name AS data_type
 FROM
  sys.columns AS c
  INNER JOIN sys.indexes AS i ON i.object_id= c.object_id
  INNER JOIN sys.types AS t ON t.user_type_id= c.user_type_id
 WHERE
  i.name = 'ix_orders' AND i.type = 1
 ORDER BY
  c.column_id

The results are shown in Table 1.

Table 1. Columns in a Typical Node Table.

column_id	name	data_type
1	id	varbinary
2	nid	int
3	tagname	nvarchar
4	taguri	nvarchar
5	tid	int
6	value	sql_variant
7	lvalue	nvarchar
8	lvaluebin	varbinary
9	hid	varchar
10	xsinil	bit
11	xsitype	bit
12	pk1	int

The three types of secondary XML indexes are path, value, and property. You can implement a secondary XML index only after you have created a primary XML index because they are both actually indexes over the node table. These indexes further optimize XQuery statements made against the XML data.

A path index, created with the FOR PATH keyword, indexes the Path ID (hid in Table 1) and Value columns of the primary XML index. This type of index works best when you have a fairly complex document type and want to speed up XQuery XPath expressions that reference a particular node in your XML data with an explicit value.

If you are more concerned about the values of nodes queried with wildcards, you can create a value index with the FOR VALUE keyword. The VALUE index contains the same index columns as the PATH index, Value and Path ID (the value and hid columns in Table 1), but in the reverse order.

The property index, created with the PROPERTY keyword, optimizes hierarchies of elements or attributes that are name/value pairs. The PROPERTY index contains the primary key of the base table, Path ID (hid), and Value, in that order.

Example 12 shows the syntax for creating these indexes; note that you must name the primary XML index in the USING XML INDEX clause.
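The following sketch pairs each secondary index type with the query shape it is meant to help. The table dbo.XmlQueryShapes and its data are hypothetical. To see the effect, add a primary XML index and the secondary indexes from Example 12 to this table and compare actual execution plans; the optimizer, not the index type, decides which index a given query uses.

```sql
-- XmlQueryShapes is a hypothetical table used only for this sketch
IF OBJECT_ID('dbo.XmlQueryShapes', 'U') IS NOT NULL DROP TABLE dbo.XmlQueryShapes;
GO
CREATE TABLE dbo.XmlQueryShapes(
  OrdersId int PRIMARY KEY,
  OrdersDoc xml NOT NULL);
INSERT INTO dbo.XmlQueryShapes VALUES
  (5, '<Orders><Order><OrderId>5</OrderId><CustomerId>60</CustomerId><OrderAmount>25.90</OrderAmount></Order></Orders>'),
  (6, '<Orders><Order><OrderId>6</OrderId><CustomerId>61</CustomerId><OrderAmount>4100.00</OrderAmount></Order></Orders>');
GO
-- PATH shape: a fully specified path plus a value predicate
SELECT OrdersId FROM dbo.XmlQueryShapes
 WHERE OrdersDoc.exist('/Orders/Order[CustomerId = 60]') = 1;

-- VALUE shape: the path is partly unknown (descendant axis), the value is known
SELECT OrdersId FROM dbo.XmlQueryShapes
 WHERE OrdersDoc.exist('//*[OrderAmount > 1000]') = 1;

-- PROPERTY shape: pull a property out of a row already located by primary key
SELECT OrdersDoc.value('(/Orders/Order/OrderAmount)[1]', 'decimal(9,2)') AS OrderAmount
 FROM dbo.XmlQueryShapes
 WHERE OrdersId = 5;
```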

Example 12. Creating secondary XML indexes on path, value, and property data.

-- Create secondary structural (path) XML index
CREATE XML INDEX ix_orders_path ON OrdersXML(OrdersDoc)
 USING XML INDEX ix_orders FOR PATH

-- Create secondary value XML index
CREATE XML INDEX ix_orders_val ON OrdersXML(OrdersDoc)
 USING XML INDEX ix_orders FOR VALUE

-- Create secondary property XML index
CREATE XML INDEX ix_orders_prop ON OrdersXML(OrdersDoc)
 USING XML INDEX ix_orders FOR PROPERTY

Be aware of these additional restrictions regarding XML indexes:

An XML index can contain only one XML column, so you cannot create a composite XML index (an index on more than one XML column).
Using XML indexes requires that the primary key be clustered, and because you can have only one clustered index per table, you cannot create a clustered XML index.

With the proper XML indexing in place, you can write some very efficient queries using XQuery. Before we get to XQuery, however, let’s take a look at some other XML features that will help you get XML data in and out of the database.
