Prior to SQL Server 2008, SQL Server-based applications used one of two methods for storing binary large objects (BLOBs) such as video, images, or documents (PDFs, docs, and so forth). The first method was to store the object within the database in an image or varbinary(max) column. Alternatively, BLOBs were stored as files in the file system, with a link to the file (hyperlink/path) stored in a table column.
Both of these methods have their pros and cons. SQL Server 2008 introduces a third method known as FileStream. This method lets you combine the benefits of both of the previous methods while avoiding their drawbacks.
Before we continue, keep in mind that character-based BLOBs are often referred to as CLOBs, or character large objects. In some texts, BLOBs and CLOBs are referred to collectively as LOBs, or large objects. For the purposes of this section, we'll use the term BLOBs to refer to either binary large objects or character large objects.
Before looking at the new FileStream option, let's briefly review the previous methods of BLOB storage, both of which are still supported in SQL Server 2008.
1. BLOBs in the database
SQL Server's storage engine is designed and optimized for storage of normal relational data such as integer and character-based data. A fundamental design component of the SQL Server engine is the 8K page size, which limits the maximum size of each record. All but the smallest BLOBs exceed this size, so SQL Server can't store them in-row like the rest of the record's data.
To get around the 8K limitation, SQL Server breaks the BLOB up into 8K chunks and stores them in a B-tree structure, as shown in figure 1, with a pointer to the root of the tree stored in the record's BLOB column.
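One way to see this separation between in-row data and BLOB data is to compare the page counts of a table's allocation units. The query below is a minimal sketch rather than a definitive diagnostic; it assumes a table named dbo.clients exists (the schema and table name are illustrative) and relies on the sys.partitions and sys.allocation_units catalog views.

```sql
-- Sketch: compare pages used for in-row data versus LOB data for one table.
-- Assumption: a table named dbo.clients exists; substitute your own table name.
SELECT
    OBJECT_NAME(p.object_id) AS table_name,
    au.type_desc             AS allocation_type,  -- IN_ROW_DATA, LOB_DATA, or ROW_OVERFLOW_DATA
    SUM(au.total_pages)      AS total_pages,
    SUM(au.used_pages)       AS used_pages
FROM sys.partitions AS p
JOIN sys.allocation_units AS au
    ON au.container_id = CASE
           WHEN au.type IN (1, 3) THEN p.hobt_id       -- in-row and row-overflow data
           WHEN au.type = 2       THEN p.partition_id  -- LOB data
       END
WHERE p.object_id = OBJECT_ID(N'dbo.clients')
GROUP BY p.object_id, au.type_desc
ORDER BY au.type_desc;
```

For a table holding many BLOBs, the LOB_DATA row would typically account for most of the pages, while IN_ROW_DATA covers the regular columns and the BLOB pointers.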
Prior to SQL Server 2005, the primary data type for in-database BLOB storage was the image data type. SQL Server 2005 introduced the varbinary(max) data type to overcome some of the image limitations, discussed next.
Image and text data types
The primary data type used for binary-based BLOB storage prior to SQL Server 2005 was the image data type, and the text data type supported character-based BLOBs (CLOBs). Both data types provide support for BLOBs up to 2GB.
Still supported in SQL Server 2008, these data types have a number of
drawbacks that limit their usefulness, chiefly the inability to declare
image or text variables in T-SQL batches. As such, accessing and
importing BLOB data required a combination of programming techniques,
reducing the appeal of in-database BLOB storage somewhat.
Varbinary(max) and varchar(max)
Introduced in SQL Server 2005, the varbinary(max) data type, and its text equivalents varchar(max) and nvarchar(max),
overcome the limitations of the image and text data types by providing
support for variable declaration and a range of other operations.
Such support makes BLOB access and importing much simpler than the equivalent process in SQL Server 2000 with image and text data types.
Here's an example:
-- Insert a jpg file into a table using OPENROWSET
INSERT INTO clients (ID, DOB, Photo)
SELECT 1, '21 Jan 1974', BulkColumn
FROM OPENROWSET (BULK 'F:\photos\client_1.jpg', SINGLE_BLOB) AS blob;
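The variable declaration support mentioned above can be sketched in the same way. This is an illustrative example only: it reuses the clients table and file path from the statement above, and the variable name @photo and the column alias photo_bytes are assumptions made for the sketch.

```sql
-- Sketch: load a file into a varbinary(max) variable, then use it.
-- Declaring a variable like this isn't possible with the image data type.
DECLARE @photo varbinary(max);

-- Read the whole file as a single binary value
SELECT @photo = BulkColumn
FROM OPENROWSET (BULK 'F:\photos\client_1.jpg', SINGLE_BLOB) AS blob;

-- Inspect the size of the loaded BLOB in bytes
SELECT DATALENGTH(@photo) AS photo_bytes;

-- Write the BLOB to an existing row
UPDATE clients
SET Photo = @photo
WHERE ID = 1;
```

Holding the BLOB in a variable means it can be checked or passed to a stored procedure before it's written to the table.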
As a BLOB storage strategy, in-database storage allows BLOBs to be tightly coupled with the related data. The BLOBs are transactionally consistent; that is, updates on the BLOB are rolled forward or back in line with the rest of the record, and included in backup and restore operations. All good so far. The downside, however, is significant. For databases with large numbers of BLOBs, or even moderate numbers of very large BLOBs, the database size can become massive and difficult to manage. In turn, performance can suffer.
To address these concerns, a common design is to store BLOBs in the file system with an appropriate reference or hyperlink stored in the column.
2. BLOBs in the file system
The alternative to storing BLOBs in the database is to store them in their native format as normal files in the file system. Windows NTFS is much better at file storage than SQL Server, so it makes sense to store them there and include a simple link in the database. Further, this approach lets you store BLOBs on lower-cost storage, driving down overall costs.
An example of this approach is shown in figure 2. Here, the table contains a photolink column storing the path to a file system-based file.
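A table along these lines might be defined as follows. This is a minimal sketch, not the table from the figure: the table name clients_linked, the column types, and the sample path are assumptions chosen for illustration.

```sql
-- Sketch: store only a path to the BLOB; the file itself lives in the file system.
-- Assumption: table name, column types, and path are hypothetical.
CREATE TABLE clients_linked (
    ID        int           NOT NULL PRIMARY KEY,
    DOB       datetime      NOT NULL,
    PhotoLink nvarchar(260) NULL  -- path or URL of the file; the column holds no binary data
);

INSERT INTO clients_linked (ID, DOB, PhotoLink)
VALUES (1, '19740121', N'F:\photos\client_1.jpg');
```

Note that nothing in the database ties the row to the file: deleting or rolling back the row leaves the file untouched, and deleting the file leaves the link dangling.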
The problem with this approach is twofold: the data in the database is no longer transactionally consistent with the BLOB files, and database backups aren't guaranteed to be synchronized with the BLOBs (unless the database is shut down for the period of the backup, which isn't an option for any 24/7 system).
So on one hand we have transactional consistency and strongly coupled data at the expense of increased database size and possible performance impacts. On the other hand, we have storage simplicity and good performance at the expense of transactional consistency and backup synchronization issues. Clearly, both options have significant advantages and disadvantages; DBAs and developers often passionately argue in favor of one option over the other. Enter FileStream, King of the BLOBs...