Service Broker is frequently mentioned as an
excellent choice for helping to scale out database services. One of the
more compelling use cases is a Service Broker service that can be used
to asynchronously request data from a remote system. In such a case, a
request message would be sent to the remote data service from a local
stored procedure, which could do some other work while waiting for the
response—the requested data—to come back.
There are many ways to
architect such a system, and given that Service Broker allows messages
to be sent as either binary or XML, I wondered which format would
provide the best combination of overall performance and code reuse.
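To make the scenario a bit more concrete, the request side of such a conversation might look something like the following sketch. All of the service, contract, and message type names shown here are placeholders; the actual Service Broker objects would be defined as part of the system's setup:
DECLARE @handle UNIQUEIDENTIFIER;

BEGIN DIALOG CONVERSATION @handle
    FROM SERVICE [//DataRequestorService]
    TO SERVICE '//RemoteDataService'
    ON CONTRACT [//DataRequestContract]
    WITH ENCRYPTION = OFF;

--Send the request; the response (the requested data) arrives
--asynchronously on the initiator's queue and is picked up with RECEIVE
SEND ON CONVERSATION @handle
    MESSAGE TYPE [//EmployeeListRequest]
    (N'SELECT * FROM HumanResources.Employee');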
I started working with the AdventureWorks.HumanResources.Employee
table as a sample data set, imagining a remote data service requesting a
list of employees along with their attributes. After some
experimentation, I determined that the FOR XML RAW option is the easiest way to serialize a table in XML format, and I used the ROOT option to make the XML valid:
SELECT *
FROM HumanResources.Employee
FOR XML RAW, ROOT('Employees')
XML is, of course,
known to be an extremely verbose data interchange format, and I was not
surprised to discover that the data size of the resultant XML is 116KB,
despite the fact that the HumanResources.Employee
table itself has only 56KB of data. I experimented with setting shorter
column names, but it had very little effect on the size and created
what I feel to be unmaintainable code.
My first performance test, the results of which are shown in Figure 1,
was not especially promising: simply serializing the results was taking
over 3 seconds per iteration. After some trial and error, I discovered
that adding the TYPE directive hugely improved performance, bringing the average time per iteration down by more than 50%, as shown in Figure 2.
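For reference, the serialization query with the TYPE directive added looks like the following. Assigning the result to an xml variable, here called @p, is one convenient way to hold onto the document; the deserialization code shown a bit later reads from that same variable:
DECLARE @p XML

SET @p =
(
    SELECT *
    FROM HumanResources.Employee
    FOR XML RAW, ROOT('Employees'), TYPE
)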
I was quite pleased
with these results until I decided to test deserialization. The first
problem was the code required to deserialize the XML back into a table.
In order to get back the same table I started with, I had to explicitly
define every column for the result set; this made the code quite a bit
more complex than I'd hoped for:
SELECT
    col.value('@EmployeeID', 'int') AS EmployeeID,
    col.value('@NationalIDNumber', 'nvarchar(15)') AS NationalIDNumber,
    col.value('@ContactID', 'int') AS ContactID,
    col.value('@LoginID', 'nvarchar(256)') AS LoginID,
    col.value('@ManagerID', 'int') AS ManagerID,
    col.value('@Title', 'nvarchar(50)') AS Title,
    col.value('@BirthDate', 'datetime') AS BirthDate,
    col.value('@MaritalStatus', 'nchar(1)') AS MaritalStatus,
    col.value('@Gender', 'nchar(1)') AS Gender,
    col.value('@HireDate', 'datetime') AS HireDate,
    col.value('@SalariedFlag', 'bit') AS SalariedFlag,
    col.value('@VacationHours', 'smallint') AS VacationHours,
    col.value('@SickLeaveHours', 'smallint') AS SickLeaveHours,
    col.value('@CurrentFlag', 'bit') AS CurrentFlag,
    col.value('@rowguid', 'uniqueidentifier') AS rowguid,
    col.value('@ModifiedDate', 'datetime') AS ModifiedDate
FROM @p.nodes('/Employees/row') AS p (col)
The next problem was performance. As shown in Figure 3, when I tested deserializing the XML, performance went from pretty good to downright abysmal.
I decided to investigate
SQLCLR options for solving the problem, focusing on both reuse
potential and performance. My first thought was to return binary
serialized DataTables, and in order to
make that happen, I needed a way to return binary-formatted data from my
CLR routines. This of course called for .NET's BinaryFormatter class, so I created a class called serialization_helper. The following code was cataloged in an EXTERNAL_ACCESS assembly; the elevated permission set is needed so that the class can assert the SerializationFormatter permission, which is not granted to SAFE assemblies:
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Security.Permissions;
using System.Runtime.Serialization.Formatters.Binary;
public partial class serialization_helper
{
    public static byte[] getBytes(object o)
    {
        //Assert the SerializationFormatter permission so that
        //callers cataloged as SAFE can use this method
        SecurityPermission sp =
            new SecurityPermission(
                SecurityPermissionFlag.SerializationFormatter);
        sp.Assert();

        //Serialize the object to a MemoryStream and return
        //the stream's contents as a byte array
        BinaryFormatter bf = new BinaryFormatter();

        using (System.IO.MemoryStream ms =
            new System.IO.MemoryStream())
        {
            bf.Serialize(ms, o);
            return (ms.ToArray());
        }
    }

    public static object getObject(byte[] theBytes)
    {
        //Wrap the bytes in a read-only stream and defer to
        //the Stream-based overload
        using (System.IO.MemoryStream ms =
            new System.IO.MemoryStream(theBytes, false))
        {
            return (getObject(ms));
        }
    }

    public static object getObject(System.IO.Stream s)
    {
        SecurityPermission sp =
            new SecurityPermission(
                SecurityPermissionFlag.SerializationFormatter);
        sp.Assert();

        BinaryFormatter bf = new BinaryFormatter();

        return (bf.Deserialize(s));
    }
};
Use of this class is fairly straightforward: to serialize an object, pass it into the getBytes method. This method first uses an assertion to allow SAFE callers to use it, and then uses the binary formatter to serialize the object to a Stream. The stream is then returned as a collection of bytes. Deserialization can be done using either overload of the getObject method. I found that depending on the scenario, I might have ready access to either a Stream
or a collection of bytes, so creating both overloads made sense instead
of duplicating code to produce one from the other. Deserialization also
uses an assertion before running, in order to allow calling code to be
cataloged as SAFE.
My first shot at getting the data was to simply load the input set into a DataTable and run it through the serialization_helper methods. The following code implements a UDF called GetDataTable_Binary, which uses this logic:
[Microsoft.SqlServer.Server.SqlFunction(
    DataAccess = DataAccessKind.Read)]
public static SqlBytes GetDataTable_Binary(string query)
{
    SqlConnection conn =
        new SqlConnection("context connection = true;");

    SqlCommand comm = new SqlCommand();
    comm.Connection = conn;
    comm.CommandText = query;

    SqlDataAdapter da = new SqlDataAdapter();
    da.SelectCommand = comm;

    DataTable dt = new DataTable();
    da.Fill(dt);

    //Serialize and return the output
    return new SqlBytes(
        serialization_helper.getBytes(dt));
}
This method is used by
passing in a query for the table that you'd like to get back in binary
serialized form, as in the following example:
USE AdventureWorks
GO

DECLARE @sql NVARCHAR(4000)
SET @sql = 'SELECT * FROM HumanResources.Employee'

DECLARE @p VARBINARY(MAX)
SET @p = dbo.GetDataTable_Binary(@sql)
While I'd achieved the
reuse potential I hoped for—this function can be used for any number of
queries—I was disappointed to find that the output data size had
ballooned to 232KB. Things looked even worse when I ran a performance
test and serialization speed turned out to be dismal at best, as shown
in Figure 4.
The main problem, as it turned out, was the default serialization behavior of the DataTable. Even when using the BinaryFormatter, a DataTable serializes itself first to XML and only then to binary—double the work I had expected. To fix this, set the DataTable's RemotingFormat property to SerializationFormat.Binary before serializing it:
dt.RemotingFormat = SerializationFormat.Binary;
Making this change resulted in much better performance, as illustrated by the test results shown in Figure 5.
I still felt that I
could do better, and after several more attempts that I won't bore you
with the details of, I decided to forgo the DataTable altogether and focus on a class that I've found historically to be much faster: SqlDataReader.
I worked on pulling the data out into object collections, and initial
tests that I ran showed the data size to be much closer to what I
expected. In addition to size improvements, serialization performance
turned out to be far better than that of the DataTable (but not as good as XML serialization with the TYPE directive).
The advantage of a DataTable
is that it's one easy-to-use unit that contains all of the data, as
well as the metadata. You don't have to be concerned with column names,
types, and sizes, as everything is automatically loaded into the DataTable for you. Working with a SqlDataReader requires a bit more work, since it can't be serialized as a single unit, but must instead be split up into its component parts.
Since the code I implemented is somewhat complex, I will walk you through it section by section. To begin with, the SqlFunctionAttribute's DataAccess property is set to DataAccessKind.Read, in order to allow the method to access data via the context connection. A generic List is instantiated to hold one object collection per row of data, in addition to one for the metadata. Finally, the SqlConnection is instantiated, and the SqlCommand is set up and executed:
[Microsoft.SqlServer.Server.SqlFunction(
    DataAccess = DataAccessKind.Read)]
public static SqlBytes GetBinaryFromQueryResult(string query)
{
    List<object[]> theList = new List<object[]>();

    using (SqlConnection conn =
        new SqlConnection("context connection = true;"))
    {
        SqlCommand comm = new SqlCommand();
        comm.Connection = conn;
        comm.CommandText = query;

        conn.Open();

        SqlDataReader read = comm.ExecuteReader();
The next step is to pull the metadata for each column out of the SqlDataReader. A method called GetSchemaTable is used to return a DataTable
populated with one row per column. The available fields are documented
in the MSDN Library, but I'm using the most common of them in the code
that follows. After populating the object collection with the metadata,
it is added to the output List:
        DataTable dt = read.GetSchemaTable();

        //Populate the field list from the schema table
        object[] fields = new object[dt.Rows.Count];

        for (int i = 0; i < fields.Length; i++)
        {
            object[] field = new object[5];
            field[0] = dt.Rows[i]["ColumnName"];
            field[1] = dt.Rows[i]["ProviderType"];
            field[2] = dt.Rows[i]["ColumnSize"];
            field[3] = dt.Rows[i]["NumericPrecision"];
            field[4] = dt.Rows[i]["NumericScale"];

            fields[i] = field;
        }

        //Add the collection of fields to the output list
        theList.Add(fields);
Finally, the code loops over the rows returned by the query, using the GetValues method to pull each row out into an object collection that is added to the output. The List is converted into an array of object[] (object[][], to be more precise), which is serialized and returned to the caller.
        //Add all of the rows to the output list
        while (read.Read())
        {
            object[] o = new object[read.FieldCount];
            read.GetValues(o);

            theList.Add(o);
        }
    }

    //Serialize and return the output
    return new SqlBytes(
        serialization_helper.getBytes(theList.ToArray()));
}
Once this function is created, calling it is almost identical to calling GetDataTable_Binary:
USE AdventureWorks
GO

DECLARE @sql NVARCHAR(4000)
SET @sql = 'SELECT * FROM HumanResources.Employee'

DECLARE @p VARBINARY(MAX)
SET @p = dbo.GetBinaryFromQueryResult(@sql)
The result: 57KB worth of binary data—quite an improvement over both the XML and DataTable
methods. When using this technique to transfer data between Service
Broker instances on remote servers, the decrease in network traffic can
make a big difference. The serialization performance test, the results of which are
shown in Figure 6, showed that performance is vastly improved over the DataTable attempt, while not as good as XML serialization in conjunction with the TYPE directive.
Pleased with these results,
I decided to go ahead with deserialization. Continuing my emphasis on
reuse potential, I decided that a stored procedure would be a better
choice than a UDF. A stored procedure does not have a fixed output
contract the way a UDF does, so any input table can be deserialized and
returned without worrying about violating a declared column list.
The first part of the stored procedure follows:
[Microsoft.SqlServer.Server.SqlProcedure]
public static void GetTableFromBinary(SqlBytes theTable)
{
    //Deserialize the input
    object[] dt = (object[])(
        serialization_helper.getObject(theTable.Value));

    //First, get the fields
    object[] fields = (object[])(dt[0]);
    SqlMetaData[] cols = new SqlMetaData[fields.Length];

    //Loop over the fields and populate SqlMetaData objects
    for (int i = 0; i < fields.Length; i++)
    {
        object[] field = (object[])(fields[i]);
        SqlDbType dbType = (SqlDbType)field[1];
After deserializing the
input bytes back into a collection of objects, the first item in the
collection—which is assumed to be the column metadata—is converted into a
collection of objects. This collection is looped over item-by-item in
order to create the output SqlMetaData objects that will be used to stream back the data to the caller.
The trickiest part of setting this up is the fact that each SQL Server data type requires a different SqlMetaData overload. DECIMAL
needs a precision and scale setting; character and binary types need a
size; and for other types, size, precision, and scale are all
inappropriate inputs. The following switch statement handles creation of the SqlMetaData instances:
        //Different SqlMetaData overloads are required
        //depending on the data type
        switch (dbType)
        {
            case SqlDbType.Decimal:
                cols[i] = new SqlMetaData(
                    (string)field[0],
                    dbType,
                    (byte)field[3],
                    (byte)field[4]);
                break;

            case SqlDbType.Binary:
            case SqlDbType.Char:
            case SqlDbType.NChar:
            case SqlDbType.NVarChar:
            case SqlDbType.VarBinary:
            case SqlDbType.VarChar:
                switch ((int)field[2])
                {
                    //If it's a MAX type, use -1 as the size
                    case 2147483647:
                        cols[i] = new SqlMetaData(
                            (string)field[0],
                            dbType,
                            -1);
                        break;

                    default:
                        cols[i] = new SqlMetaData(
                            (string)field[0],
                            dbType,
                            (long)((int)field[2]));
                        break;
                }
                break;

            default:
                cols[i] = new SqlMetaData(
                    (string)field[0],
                    dbType);
                break;
        }
    }
Once population of the columns collection has been completed, the data can be sent back to the caller using the SqlPipe class's SendResults methods. After starting the stream, the remainder of the objects in the input collection are looped over, cast to object[], and sent back as SqlDataRecords:
    //Start the result stream
    SqlDataRecord rec = new SqlDataRecord(cols);
    SqlContext.Pipe.SendResultsStart(rec);

    for (int i = 1; i < dt.Length; i++)
    {
        rec.SetValues((object[])dt[i]);
        SqlContext.Pipe.SendResultsRow(rec);
    }

    //End the result stream
    SqlContext.Pipe.SendResultsEnd();
}
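Calling the procedure is just as simple as calling the functions. For instance, assuming both CLR modules are cataloged in the AdventureWorks database, the output of GetBinaryFromQueryResult can be fed straight back in for a quick round-trip test:
USE AdventureWorks
GO

DECLARE @p VARBINARY(MAX)
SET @p = dbo.GetBinaryFromQueryResult(
    'SELECT * FROM HumanResources.Employee')

--Deserialize the binary data and stream the rows back to the caller
EXEC dbo.GetTableFromBinary @p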
Although the serialization
test had not yielded spectacular results, it turns out that
deserialization of data prepared in this manner is exceptionally fast
compared with the alternatives. The performance test, the results of
which are shown in Figure 7, revealed that deserialization of the SqlDataReader
data is almost an order of magnitude faster than deserialization of
similar XML. Although the serialization is slightly slower, I feel that
the combination of better network utilization and much faster
deserialization makes this a great technique for transferring tabular
data between Service Broker instances in scale-out and distributed
processing scenarios.