SQL Server 2008 R2 : Creating and Managing User-Defined Functions (part 6) - Rewriting Stored Procedures as Functions, Creating and Using CLR Functions

4/9/2013 9:29:42 PM

4. Rewriting Stored Procedures as Functions

In releases of SQL Server prior to SQL Server 2000, if you wanted to do custom processing within SQL code, your only real option was to create stored procedures to do things that often would have worked much better as functions. For example, you couldn’t use the result set of a stored procedure in a WHERE clause or to return a value as a column in a select list. Using a stored procedure to perform calculations on columns in a result set often required using a cursor to step through each row in a result set and pass the column values fetched, one at a time, to the stored procedure as parameters. This procedure then typically returned the computed value via an output parameter, which had to be mapped to another local variable. Another alternative was to retrieve the initial result set into a temporary table and then perform additional queries or updates against the temporary table to modify the column values, which often required multiple passes. Neither of these methods was an efficient means of processing the data, but prior to SQL Server 2000, few alternatives existed.

If you needed to join against the result set of a stored procedure, you had to insert the result set into a temporary table first and then join against the temporary table, as shown in the following code fragment:

...
insert #results exec result_proc
select * from other_table
   join #results on other_table.pkey = #results.keyfield
...

Now that SQL Server supports user-defined functions, you might want to consider rewriting some of your old stored procedures as functions to take advantage of the capabilities of functions and improve the efficiency of your SQL code. You mainly want to do this in situations in which you would like to be able to invoke a stored procedure directly from within a query. If the stored procedure returns a result set, it is a candidate for being written as a table-valued function. If it returns a scalar value, usually via an output parameter, it is a candidate for being written as a scalar function. However, the following criteria also are indications that a procedure is a good candidate for being rewritten as a function:

The procedure logic is expressible in a single SELECT statement; however, it is written as a stored procedure, rather than a view, because of the need for it to be parameter driven.
The stored procedure does not perform update operations on tables, except against table variables.
There are no dynamic SQL statements executed via the EXECUTE statement or sp_executesql.
The stored procedure returns no more than a single result set.
If the stored procedure returns a result set, its primary purpose is to build an intermediate result that is typically loaded into a temporary table, which is then queried in a SELECT statement.
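As a hedged illustration of the scalar case, the sketch below rewrites a procedure that returns its result through an output parameter as a scalar function. The names usp_get_line_total, fn_line_total, and the order_items table with its columns are hypothetical and do not come from the original text.

```sql
-- Hypothetical procedure: returns a computed value via an output parameter,
-- so a caller must declare a variable and EXECUTE it once per row.
CREATE PROCEDURE dbo.usp_get_line_total
    @qty int, @unit_price money, @total money OUTPUT
AS
    SET @total = @qty * @unit_price;
GO

-- The same logic as a scalar function (hypothetical name).
CREATE FUNCTION dbo.fn_line_total (@qty int, @unit_price money)
RETURNS money
AS
BEGIN
    RETURN @qty * @unit_price;
END
GO

-- The function can be invoked directly in a select list and a WHERE clause.
-- dbo.order_items and its columns are assumed for illustration only.
SELECT item_id, dbo.fn_line_total(qty, unit_price) AS line_total
FROM dbo.order_items
WHERE dbo.fn_line_total(qty, unit_price) > 100;
GO
```

The procedure performs no table updates and uses no dynamic SQL, so it meets the criteria listed above.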

The result_proc stored procedure, used earlier in this section, could possibly be rewritten as a table-valued function called fn_results(). The code fragment that joined against #results could then be rewritten as follows:

SELECT *
    FROM fn_results() fn
    join other_table o on o.pkey = fn.keyfield
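The body of such a function is not shown in the original. A minimal sketch of a parameter-driven, single-SELECT procedure recast as an inline table-valued function might look like the following. The function name fn_results_by_region, the source_table table, and its region and amount columns are assumptions made for illustration.

```sql
-- Hypothetical inline table-valued function: one SELECT, driven by a parameter.
CREATE FUNCTION dbo.fn_results_by_region (@region char(2))
RETURNS TABLE
AS
RETURN
(
    SELECT keyfield, region, amount
    FROM dbo.source_table          -- assumed table, not from the original text
    WHERE region = @region
);
GO

-- The function is joined like a table; no temporary table is needed.
SELECT o.*, fn.amount
FROM dbo.fn_results_by_region('CA') AS fn
JOIN other_table AS o ON o.pkey = fn.keyfield;
GO
```

Because the function is a single SELECT, it behaves much like a parameterized view.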

5. Creating and Using CLR Functions

Prior to SQL Server 2005, the only way to extend the functionality of SQL Server beyond what was available using the T-SQL language was to create extended stored procedures or Component Object Model (COM) components. The main problem with these types of extensions was that if not written very carefully, they could have an adverse impact on the reliability and security of SQL Server. For example, extended stored procedures are implemented as DLLs that run in the same memory space as SQL Server. An access violation raised in a poorly written extended stored procedure could crash SQL Server itself.

In addition, neither extended stored procedures nor COM components let you create user-defined functions in a programming language other than T-SQL, which has a limited command set for operations such as complex string comparison and manipulation and complex numeric computations.

In SQL Server 2008, you can write custom user-defined functions in any Microsoft .NET Framework programming language, such as Microsoft Visual Basic .NET or Microsoft Visual C#. SQL Server supports both scalar and table-valued CLR functions, as well as CLR user-defined aggregate functions. These extensions written in the CLR are much more secure and reliable than extended stored procedures or COM components.

Note

The CLR function examples presented in the following sections are provided as illustrations only. The sample code will not execute successfully because the underlying CLR assemblies have not been provided.

Adding CLR Functions to a Database

If you’ve already created and compiled a CLR function, your next task is to install that CLR function in the database. The first step in this process is to copy the .NET assembly to a location that SQL Server can access, and then you need to load it into SQL Server by creating an assembly. The syntax for the CREATE ASSEMBLY command is as follows:

CREATE ASSEMBLY AssemblyName [AUTHORIZATION Owner_name]
FROM  { <client_assembly_specifier> | <assembly_bits> [ ,...n ] }
[WITH PERMISSION_SET = { SAFE | EXTERNAL_ACCESS | UNSAFE } ]

AssemblyName is the name of the assembly. client_assembly_specifier specifies the local path or network location where the assembly being uploaded is located, and also the manifest filename that corresponds to the assembly. It can be expressed as a fixed string or an expression evaluating to a fixed string, with variables. The path can be a local path, but often the path is a network share. assembly_bits is the list of binary values that make up the assembly and its dependent assemblies.

The WITH clause is optional, and it defaults to SAFE. Marking an assembly with the SAFE permission set indicates that no external resources (for example, the Registry, Web services, file I/O) are going to be accessed.

The CREATE ASSEMBLY command fails if the assembly is marked as SAFE but references assemblies such as System.IO. Also, if anything causes a permission demand for executing similar operations, an exception is thrown at runtime.

Marking an assembly with the EXTERNAL_ACCESS permission set tells SQL Server that it will use resources such as networking, files, and so forth. Assemblies such as System.Web.Services (but not System.Web) can be referenced with this set. To create an EXTERNAL_ACCESS assembly, the creator must have the EXTERNAL ACCESS ASSEMBLY permission.

Marking an assembly with the UNSAFE permission set tells SQL Server that not only might external resources be used, but unmanaged code may be invoked from managed code. An UNSAFE assembly can potentially undermine the security of either SQL Server or the CLR. Only members of the sysadmin role can create UNSAFE assemblies.
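The permission sets above might be exercised as in the following sketch. The login name, assembly name, and network path are hypothetical. Depending on how the server is configured, an EXTERNAL_ACCESS assembly may also require that the database be marked TRUSTWORTHY or that the assembly be signed, which is not shown here.

```sql
-- EXTERNAL ACCESS ASSEMBLY is a server-level permission, so it is granted in master.
USE master;
GO
GRANT EXTERNAL ACCESS ASSEMBLY TO [CORP\clr_deployer];   -- hypothetical login
GO

USE bigpubs2008;
GO
-- Hypothetical assembly that reads files, so SAFE would not be sufficient.
CREATE ASSEMBLY FileUtils
AUTHORIZATION dbo
FROM '\\buildserver\assemblies\FileUtils.dll'            -- hypothetical path
WITH PERMISSION_SET = EXTERNAL_ACCESS;
GO

-- Check which permission set each user-defined assembly was registered with.
SELECT name, permission_set_desc
FROM sys.assemblies
WHERE is_user_defined = 1;
GO
```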

After the assembly is created, the next step is to associate the method within the assembly with a user-defined function. You do this with the CREATE FUNCTION command, using the following syntax:

CREATE FUNCTION [ schema_name. ] function_name
    ( [ { @parameter_name [AS] [ schema_name.]scalar_datatype [ = default ] }
      [ ,...n ] ] )
RETURNS { return_data_type | TABLE ( { column_name data_type } [ ,...n ] ) }
[ WITH { [ , RETURNS NULL ON NULL INPUT | CALLED ON NULL INPUT ]
         [ , EXECUTE_AS_Clause ] } ]
[ AS ] EXTERNAL NAME assembly_name.class_name.method_name
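As a hedged illustration of this syntax for the scalar case, the sketch below binds a T-SQL function name to a static method in an assembly. The assembly TextUtils, the class RegexFunctions, and the method IsMatch are hypothetical. They are assumed to have been compiled and loaded with CREATE ASSEMBLY beforehand.

```sql
-- Hypothetical scalar CLR function; the parameters and return type must be
-- compatible with the signature of the .NET method.
CREATE FUNCTION dbo.RegexIsMatch
    (@input nvarchar(4000), @pattern nvarchar(4000))
RETURNS bit
WITH RETURNS NULL ON NULL INPUT      -- skip the CLR call when an argument is NULL
AS EXTERNAL NAME TextUtils.RegexFunctions.IsMatch;
GO

-- Once created, it is called like any other scalar function.
SELECT dbo.RegexIsMatch(N'TC7777', N'^[A-Z]{2}[0-9]{4}$') AS looks_like_title_id;
GO
```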

After creating the CLR function successfully, you can use it just as you would a T-SQL function. The following example shows how to manually deploy a table-valued CLR function:

CREATE ASSEMBLY fn_EventLog
FROM 'F:\assemblies\fn_EventLog\fn_eventlog.dll'
WITH PERMISSION_SET = SAFE
GO
CREATE FUNCTION ShowEventLog(@logname nvarchar(100))
RETURNS TABLE (logTime datetime,
               Message nvarchar(4000),
               Category nvarchar(4000),
               InstanceId bigint)
AS
EXTERNAL NAME fn_EventLog.TabularEventLog.InitMethod
GO
SELECT * FROM dbo.ShowEventLog(N'System') as T
go

Note

The preceding examples show the steps involved in manually registering an assembly and creating a CLR function. If you use Visual Studio’s Deploy feature, the CREATE/ALTER ASSEMBLY and CREATE FUNCTION commands are issued automatically by Visual Studio.
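When registering manually, a few housekeeping steps are usually needed as well. The sketch below assumes that CLR integration has not yet been enabled on the instance, and that the fn_EventLog assembly and ShowEventLog function from the example above exist. It is an outline rather than a complete deployment script.

```sql
-- CLR code cannot run until the 'clr enabled' server option is turned on.
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

-- List CLR modules and the assembly, class, and method each one maps to.
SELECT o.name AS module_name,
       a.name AS assembly_name,
       m.assembly_class,
       m.assembly_method
FROM sys.assembly_modules AS m
JOIN sys.objects AS o ON o.object_id = m.object_id
JOIN sys.assemblies AS a ON a.assembly_id = m.assembly_id;
GO

-- To remove the objects, drop the function first; DROP ASSEMBLY fails
-- while any function still references the assembly.
DROP FUNCTION dbo.ShowEventLog;
DROP ASSEMBLY fn_EventLog;
GO
```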

Deciding Between Using T-SQL or CLR Functions

One question that often comes up regarding user-defined functions is whether it’s better to develop functions in T-SQL or in the CLR. The answer really depends on the situation and what the function will be doing.

The general rule of thumb is that if the function will be performing data access or large set-oriented operations with little or no complex procedural logic, it’s better to create that function in T-SQL to get the best performance. The reason is that T-SQL works more closely with the data and doesn’t require multiple transitions between the CLR and SQL Server engine.

On the other hand, most benchmarks have shown that the CLR performs better than T-SQL for functions that require a high level of computation or text manipulation. The CLR offers much richer APIs that provide capabilities not available in T-SQL for operations such as text manipulation, cryptography, I/O operations, data formatting, and invoking of web services. For example, T-SQL provides only rudimentary string manipulation capabilities, whereas the .NET Framework supports capabilities such as regular expressions, which are much more powerful for pattern matching and replacement than the T-SQL replace() function.
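To make the contrast concrete, the following sketch shows the kind of pattern handling that built-in T-SQL offers. The commented-out call at the end uses dbo.RegexReplace, a hypothetical CLR function that is not defined in this chapter. It only indicates how a regular-expression wrapper might be invoked.

```sql
-- REPLACE() substitutes a literal substring only; it has no pattern support.
SELECT REPLACE('555-123-4567', '-', '') AS digits_only;

-- PATINDEX() and LIKE accept simple wildcards and character classes,
-- but no quantifiers, alternation, or capture groups.
SELECT PATINDEX('%[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%',
                'call 555-1234 today') AS match_start;

-- With a CLR function wrapping .NET regular expressions (hypothetical name),
-- the same match and a replacement could be expressed in one call:
-- SELECT dbo.RegexReplace(N'call 555-1234 today', N'\d{3}-\d{4}', N'<phone>');
```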

Another good candidate for CLR functions is user-defined aggregate functions. User-defined aggregate functions cannot be defined in T-SQL. To compute an aggregate value over a group in T-SQL, you would have to retrieve the values as a result set and then enumerate over the result set, using a cursor to generate the aggregate. This results in slow and complicated code. With CLR user-defined aggregate functions, you need to implement the code only for the accumulation logic. The query processor manages the iteration, and any user-defined aggregates referenced by the query are automatically accumulated and returned with the query result set. This approach can be orders of magnitude faster than using cursors, and its performance is comparable to that of the SQL Server built-in aggregate functions. For example, the following shows how you might use a user-defined aggregate function that aggregates all the authors for a specific title_id into a comma-separated list:

use bigpubs2008
go
SELECT t.Title_ID, dbo.CommaList(a.au_lname) as AuthorNames
   FROM Authors a
   JOIN titleauthor ta on a.au_id = ta.au_id
   JOIN Titles t on ta.title_id = t.title_id
GROUP BY t.title_id
having count(*) > 2
go

Title_ID AuthorNames
-------- ---------------------------------------------------------------------
TC7777   O'Leary, Gringlesby, Yokomoto

Note

The preceding example will not execute successfully because we have not created the CommaList() CLR function. It is provided merely as an example showing how such a function could be used if it were created.
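For contrast, the cursor-based T-SQL alternative described above might look like the following sketch, which builds the list for a single title. The variable sizes are assumptions about the column definitions in bigpubs2008, and the code would have to be wrapped in an outer loop to cover every title.

```sql
use bigpubs2008
go
-- Assumed sizes: title_id varchar(6), au_lname varchar(40).
DECLARE @title_id varchar(6), @name varchar(40), @list varchar(4000);
SET @title_id = 'TC7777';

DECLARE author_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT a.au_lname
    FROM Authors a
    JOIN titleauthor ta ON a.au_id = ta.au_id
    WHERE ta.title_id = @title_id;

OPEN author_cur;
FETCH NEXT FROM author_cur INTO @name;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- @list starts as NULL, so the first name is added without a separator.
    SET @list = COALESCE(@list + ', ', '') + @name;
    FETCH NEXT FROM author_cur INTO @name;
END
CLOSE author_cur;
DEALLOCATE author_cur;

SELECT @title_id AS Title_ID, @list AS AuthorNames;
go
```

All of the iteration here is hand-written, whereas with a user-defined aggregate the query processor handles it.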

In a nutshell, performance tests have generally shown that T-SQL performs better for standard CRUD (create, read, update, delete) operations, whereas CLR code performs better for complex math, string manipulation, and other tasks that go beyond data access.
