4. Data compression considerations
In considering the merits of data compression for a given table or index,
the first and most straightforward consideration is the potential
compression rate.
Compression rate
The compression rate achieved depends on the underlying data and the
compression method you choose. SQL Server 2008 includes two tools for
estimating disk savings: a Management Studio GUI-based wizard (shown in
figure 3) and the sp_estimate_data_compression_savings procedure. Let's look at the wizard first.
You can access the wizard by right-clicking a table and choosing Storage
> Manage Compression. The wizard can be used to estimate, script,
and compress the table using the selected compression technique.
The second tool, the sp_estimate_data_compression_savings procedure, as shown in figure 4,
lists, for a given table and optionally all its indexes, the estimated
size before and after compression. As with the Management Studio wizard, you can produce estimates for both row and page compression.
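As a simple illustration, a call of the following general form estimates page compression savings for the Sales.SalesPerson table (used again later in this section) and, because NULL is passed for the index and partition parameters, all of its indexes:
-- Estimate page compression savings for a table and all its indexes
EXEC sp_estimate_data_compression_savings
    @schema_name = 'Sales', @object_name = 'SalesPerson',
    @index_id = NULL, @partition_number = NULL,
    @data_compression = 'PAGE';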
Using the estimate tools discussed earlier is an important step in
evaluating the benefits of compression before implementing it. Once you
complete the evaluation, you can implement compression using the same
Management Studio wizard used for estimating the savings.
Alternatively, use the ALTER TABLE statement as shown here:
-- Compress a table using 4 CPUs Only
ALTER TABLE [Sales].[SalesPerson]
REBUILD WITH (DATA_COMPRESSION = PAGE, MAXDOP=4)
One of the nice things about the ALTER TABLE
method of implementing compression is its ability to accept a MAXDOP
value for controlling CPU usage during the initial compression process.
Depending on the size of the table and/or indexes being compressed, CPU
usage may be very high for an extended length of time, so the MAXDOP
setting allows some degree of control in this regard.
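Individual indexes can be compressed in the same manner with ALTER INDEX, which accepts the same DATA_COMPRESSION and MAXDOP options. A brief sketch follows; the index name is purely illustrative:
-- Compress an individual index using 4 CPUs only
-- (IX_SalesPerson_Example is an illustrative index name)
ALTER INDEX [IX_SalesPerson_Example] ON [Sales].[SalesPerson]
REBUILD WITH (DATA_COMPRESSION = PAGE, MAXDOP = 4)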
Finally, you should consider the tables and indexes proposed for compression.
Compressing a table that represents a very small percentage of the
overall database size will not yield much of a space gain. Further, if
that same table is used frequently, then the performance overhead may
outweigh the small gain in disk savings. In contrast, a very large
table representing a significant portion of the total database size may
yield a large percentage gain, and if the table is used infrequently,
the gain comes with little performance overhead.
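As a rough way of seeing which tables account for the bulk of the database's size, a query along these lines against sys.dm_db_partition_stats lists tables by allocated pages, largest first:
-- List tables by space used (approximate), largest first
SELECT s.name AS [schema], t.name AS [table],
       SUM(ps.used_page_count) AS used_pages
FROM sys.dm_db_partition_stats AS ps
INNER JOIN sys.tables AS t ON ps.object_id = t.object_id
INNER JOIN sys.schemas AS s ON t.schema_id = s.schema_id
GROUP BY s.name, t.name
ORDER BY used_pages DESC;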
Performance overhead
As with any compression technique, space savings and increased CPU usage
go hand in hand. On systems close to CPU capacity, the additional
overhead may preclude data compression from being an option. For other
systems, measuring the level of overhead is an important consideration.
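One straightforward way to gauge that overhead on a test system is to compare CPU time for a representative query before and after compression is enabled, for example with SET STATISTICS TIME; the query below is only a placeholder:
-- Compare CPU and elapsed time before and after enabling compression
SET STATISTICS TIME ON;
GO
SELECT COUNT(*) FROM [Sales].[SalesPerson];
GO
SET STATISTICS TIME OFF;
GO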
The ideal targets for compression are tables and indexes that are used
infrequently yet represent a significant percentage of the database
size. Targeting such tables minimizes the performance impact while
maximizing disk space savings.
Dynamic management functions and views such as sys.dm_db_index_operational_stats and sys.dm_db_index_usage_stats assist in the process of identifying the least frequently used objects.
For frequently used objects, the performance impact of data compression
needs to be carefully measured in a volume-testing environment capable
of simulating production load.
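For example, a query along the following lines uses sys.dm_db_index_usage_stats to surface the objects with the least read activity in the current database. Keep in mind that this DMV only records activity since the last SQL Server restart, and objects that have never been touched won't appear at all:
-- Identify the least frequently read objects in the current database
SELECT OBJECT_NAME(us.object_id) AS [object], us.index_id,
       us.user_seeks, us.user_scans, us.user_lookups, us.user_updates
FROM sys.dm_db_index_usage_stats AS us
WHERE us.database_id = DB_ID()
ORDER BY us.user_seeks + us.user_scans + us.user_lookups ASC;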
Despite the CPU overhead, certain operations such as table scans can actually receive a performance boost with data compression enabled. Let's look at two examples illustrating the positive and negative performance impacts of data compression. In viewing these examples, keep in mind
that the results of any tests such as these are very much dependent on
the makeup of the underlying data. These tests were conducted on
modified versions of the tables in the AdventureWorks sample database.
Results from real-world customer databases will obviously vary.
The first example tests the time taken to insert the contents of a modified
version of the AdventureWorks SalesOrder_Detail table containing 1.6
million records into a blank table with the same structure. The insert
was repeated multiple times to observe the insert time and resultant
table size with both page and row compression enabled. For comparison
purposes, we also ran the test against an uncompressed table.
-- Measure the size and execution time of various compression settings
TRUNCATE TABLE [Sales].[SalesOrder_Detail_Copy];
GO
ALTER TABLE [Sales].[SalesOrder_Detail_Copy]
REBUILD WITH (DATA_COMPRESSION = PAGE) -- repeat for ROW, NONE
GO
INSERT [Sales].[SalesOrder_Detail_Copy]
SELECT *
FROM [Sales].[SalesOrder_Detail];
GO
Rather than execute DBCC DROPCLEANBUFFERS
between executions to clear the buffer cache, each test was run
multiple times to ensure the data to insert was cached in memory for
all three tests. This method lets you more accurately compare the
relative performance differences between the compression methods by
narrowing the focus to the time taken to write the new rows to disk.
The results of the three tests, shown in figure 5,
clearly indicate higher compression rates for page compression over row
compression, but at a correspondingly higher cost in terms of execution
time.
Performance increase
Despite the CPU overhead required to compress and uncompress data, in certain
cases compressed data can actually boost performance. This is
particularly evident in disk I/O bound range scans. If the data is
compressed on disk, it follows that fewer pages will need to be read
from disk into memory—which translates to a performance boost. Let's
use another example to demonstrate.
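A quick way to see this effect is to compare the table's footprint before and after rebuilding it with compression, for example with sp_spaceused:
-- Check the table's size before and after compression
EXEC sp_spaceused 'Sales.SalesOrder_Detail_Copy';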
In this example, we'll select the average unit price from the
Sales.SalesOrder_Detail_Copy table. Again, this table was modified for
the purposes of the test. For this example, the table was increased in
size to 6.7 million rows. Given that the UnitPrice field isn't indexed,
a full table scan will result, which is ideal for our test. We'll run
this three times, on an uncompressed table, and with both forms of
compression enabled. For this test, we'll clear the buffer cache with DBCC DROPCLEANBUFFERS before each run to ensure the query reads from disk each time. The script used for this test looks like this:
-- Measure the table scan time of various compression settings
ALTER TABLE [Sales].[SalesOrder_Detail_Copy]
REBUILD WITH (DATA_COMPRESSION = ROW) -- repeat for PAGE, NONE
GO
DBCC DROPCLEANBUFFERS;
GO
SELECT AVG(UnitPrice)
FROM Sales.SalesOrder_Detail_Copy;
GO
The results of the three tests, shown in figure 6,
clearly indicate that page compression enables the fastest execution
time for this particular example—almost three times quicker than the
query against the uncompressed table.