4. Implementing Page Compression
Page compression can be implemented for a table at the time it is created or by using the ALTER TABLE command, as in the following example:
ALTER TABLE sales_big REBUILD WITH (DATA_COMPRESSION=PAGE)
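The text above notes that page compression can also be specified when a table is created. A minimal sketch of that case follows, assuming a hypothetical table named sales_archive with illustrative columns (neither the table nor its columns come from the bigpubs2008 examples):

```sql
-- Hypothetical table; the name and columns are illustrative only.
CREATE TABLE dbo.sales_archive
(
    sales_id int      NOT NULL,
    stor_id  char(4)  NOT NULL,
    ord_date datetime NOT NULL,
    qty      smallint NOT NULL
)
WITH (DATA_COMPRESSION = PAGE);
```

An index can be compressed at creation in the same way, by adding WITH (DATA_COMPRESSION = PAGE) to the CREATE INDEX statement.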
Unlike row compression, which is applied immediately
on the rows, page compression isn’t applied until the page is full. The
rows cannot be compressed until SQL Server can determine what encodings
for prefix and dictionary substitution are going to be used to replace
the actual data. When you enable page compression for a table or a
partition, SQL Server examines every full page to determine the possible
space savings. Any pages that are not full are not considered for
compression. During the compression analysis, the prefix and dictionary
values are created, and the column values are modified to reflect the
prefix and dictionary values. Then row compression is applied. If the
new compressed page can hold at least five additional rows, or 25% more
rows than the page currently holds, the page is compressed. If neither one of these criteria is met, the compressed version of the page is discarded.
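The keep-or-discard test described above can be modeled as a simple comparison. The following is only an illustrative sketch: SQL Server makes this decision internally, and the variable names and row counts here are made up for the example.

```sql
-- Illustrative model only; SQL Server performs this check internally.
DECLARE @rows_before int = 100;  -- rows the page holds now (made-up value)
DECLARE @rows_after  int = 104;  -- rows the compressed page could hold (made-up value)

SELECT CASE
         WHEN @rows_after - @rows_before >= 5     -- at least five additional rows
           OR @rows_after >= 1.25 * @rows_before  -- or at least 25% more rows
         THEN 'keep compressed page'
         ELSE 'discard compressed page'
       END AS decision;
```

With these sample values, the page gains only four rows (4%), so neither criterion is met and the result is 'discard compressed page'.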
New rows inserted into a compressed page are
compressed as they are inserted. However, new entries are not added to
the prefix list or dictionary based on a single new row. The prefix
values and dictionary symbols are rebuilt only on an all-or-nothing
basis. After the page is changed a sufficient number of times, SQL
Server evaluates whether to rebuild the CI record. The PageModCount
field in the CI record is used to keep track of the number of changes
to the page since the CI record was last built or rebuilt. This value is
updated every time a row is updated, deleted, or inserted. If SQL
Server encounters a full page during a data modification and the PageModCount is greater than 25 or the PageModCount
divided by the number of rows on the page is greater than 25%, SQL
Server reapplies the compression analysis on the page. Again, the new
compressed page replaces the existing page only if recompressing the
page creates room for five additional rows, or 25% more rows than the
page currently holds.
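The PageModCount trigger described above can be sketched the same way. This is a model of the rule as stated in the text, not something you can query directly; the variable names and values are assumptions for illustration, and the check applies only when SQL Server encounters a full page during a data modification.

```sql
-- Illustrative model only; PageModCount lives in the CI record and is
-- evaluated internally when a full page is encountered.
DECLARE @page_mod_count int = 12;  -- changes since the CI record was last built (made-up value)
DECLARE @row_count      int = 40;  -- rows currently on the page (made-up value)

SELECT CASE
         WHEN @page_mod_count > 25                         -- more than 25 changes
           OR 1.0 * @page_mod_count / @row_count > 0.25    -- or changes exceed 25% of the rows
         THEN 'rerun compression analysis'
         ELSE 'leave page as is'
       END AS action;
```

Here 12 changes on a 40-row page is 30% of the rows, so the second condition is met even though the count is below 25.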
In B-tree structures (nonclustered indexes or a
clustered table), only the leaf-level and data pages are considered for
compression. When you insert a new row into a leaf or data page, if the
compressed row fits, it is inserted and nothing more is done. If it
doesn’t fit, SQL Server attempts to recompress the page and then
recompress the row based on the new CI record. If the row fits after
recompression, it is inserted and nothing more is done. If the row still
doesn’t fit, the page needs to be split. When a compressed page is
split, the CI record is copied to the new page exactly as it was, along
with the rows moved to the new page. However, the PageModCount
value is set to 25, so that when the new page gets full, it will be
immediately analyzed for recompression. Leaf and data pages are also
checked for recompression whenever you run an index rebuild or shrink
operation.
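As a brief sketch of the rebuild case mentioned above, the following statement rebuilds all indexes on the sales_big table used elsewhere in this chapter. No DATA_COMPRESSION option is given here, so the statement is aimed at re-evaluating pages rather than changing the compression type.

```sql
-- Rebuild every index on the table; leaf and data pages are
-- re-evaluated for compression as part of the rebuild.
ALTER INDEX ALL ON dbo.sales_big REBUILD;
```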
If you enable compression on a heap table, pages are
evaluated for compression only during rebuild and shrink operations.
Also, if you drop a clustered index on a table, turning it into a heap,
SQL Server runs compression analysis on any full pages. Compression is
avoided during normal data modification operations on a heap to avoid
changes to the Row IDs, which are used as the row locators for any
indexes on the heap. Although the PageModCount value is still maintained, SQL Server essentially ignores it and never tries to recompress a page based on the PageModCount value.
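Because a heap is evaluated for compression only during rebuild and shrink operations, you may want to rebuild it explicitly after a large number of modifications. A minimal sketch, assuming a hypothetical heap named sales_heap (not part of the bigpubs2008 examples):

```sql
-- Hypothetical heap (a table with no clustered index); the name is illustrative.
ALTER TABLE dbo.sales_heap REBUILD WITH (DATA_COMPRESSION = PAGE);

-- index_id 0 identifies the heap in sys.dm_db_index_physical_stats.
SELECT SUM(page_count)            AS pages,
       SUM(compressed_page_count) AS compressed_pages
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.sales_heap'),
                                    0, NULL, 'DETAILED');
```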
5. Evaluating Page Compression
Before choosing to implement page compression, you
should determine whether the space savings will justify the overhead
of page compression. To determine how changing the
compression state will affect a table or an index, you can use the SQL
Server 2008 sp_estimate_data_compression_savings stored
procedure, which is available only in the editions of SQL Server that
support data compression. This stored procedure evaluates the effects of
compression by sampling up to 5,000 pages in the table and creating a
copy of these 5,000 pages of the table in tempdb, performing
the compression, and then using the sample to estimate the overall size
for the table after compression. The syntax for sp_estimate_data_compression_savings is as follows:
sp_estimate_data_compression_savings
[ @schema_name = ] 'schema_name'
, [ @object_name = ] 'object_name'
, [ @index_id = ] index_id
, [ @partition_number = ] partition_number
, [ @data_compression = ] 'data_compression'
You can estimate the savings for either row or page compression by specifying 'ROW' or 'PAGE' as the value for the @data_compression parameter. For a table that is already compressed, you can estimate its size if compression were removed by specifying 'NONE' as the value for @data_compression. You can also use the sp_estimate_data_compression_savings
procedure to estimate the space savings for compression on a specific
index or partition. The following example estimates the space savings if
page compression were applied to the sales_big table in the bigpubs2008 database versus row compression:
use bigpubs2008
go
exec sp_estimate_data_compression_savings 'dbo', 'sales_big', null, null, 'PAGE'
go
object_name  schema_name  index_id  partition_number  size_with_current_compression_setting(KB)  size_with_requested_compression_setting(KB)  sample_size_with_current_compression_setting(KB)  sample_size_with_requested_compression_setting(KB)
-----------  -----------  --------  ----------------  -----------------------------------------  -------------------------------------------  ------------------------------------------------  --------------------------------------------------
sales_big    dbo          1         1                 116512                                     39128                                        40016                                             13440
sales_big    dbo          2         1                 36648                                      22128                                        10904                                             6584
exec sp_estimate_data_compression_savings 'dbo', 'sales_big', null, null, 'ROW'
go
object_name  schema_name  index_id  partition_number  size_with_current_compression_setting(KB)  size_with_requested_compression_setting(KB)  sample_size_with_current_compression_setting(KB)  sample_size_with_requested_compression_setting(KB)
-----------  -----------  --------  ----------------  -----------------------------------------  -------------------------------------------  ------------------------------------------------  --------------------------------------------------
sales_big    dbo          1         1                 116512                                     97936                                        40344                                             33912
sales_big    dbo          2         1                 36648                                      27176                                        10992                                             8152
You can see in this example that the space savings
from page compression would be significant, with an estimated reduction
in the size of the table itself (index_id = 1) from 113MB
(116,512 KB) to 38MB (39,128 KB), a savings of more than 66%. Row
compression would not provide nearly as significant a savings, with an
estimated reduction in size from 113MB to only 95MB (97,936 KB), only a
16% savings.
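Rather than working out the percentages by hand, you can capture the procedure's output and compute them in a query. The following is a sketch under stated assumptions: the temporary table #estimate and its shortened column names are made up for this example, and it assumes the procedure returns the eight columns shown in the output above.

```sql
-- Hypothetical temp table; columns mirror the eight output columns shown above.
CREATE TABLE #estimate
(
    object_name         sysname,
    schema_name         sysname,
    index_id            int,
    partition_number    int,
    size_current_kb     bigint,
    size_requested_kb   bigint,
    sample_current_kb   bigint,
    sample_requested_kb bigint
);

INSERT INTO #estimate
EXEC sp_estimate_data_compression_savings 'dbo', 'sales_big', NULL, NULL, 'PAGE';

-- Estimated percentage saved per index and partition.
SELECT index_id,
       partition_number,
       CAST(100.0 * (size_current_kb - size_requested_kb)
            / NULLIF(size_current_kb, 0) AS decimal(5,1)) AS pct_saved
FROM #estimate;

DROP TABLE #estimate;
```

Applied to the figures shown earlier for index_id = 1 (116,512 KB down to 39,128 KB), this calculation works out to roughly 66.4%.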
If you compress the table, you can compare the
estimated space savings to the actual size. For example, let’s look at
the initial size of the sales_big table:
use bigpubs2008
go
select sum(page_count) as pages, sum(compressed_page_count) as compressed_pages
from sys.dm_db_index_physical_stats (DB_ID(),
OBJECT_ID('sales_big'), 1, null, 'DETAILED')
where index_level = 0
SELECT SUM(used_page_count/ 128.0) AS size_in_MB
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('dbo.sales_big') AND index_id=1
GO
pages compressed_pages
------------------------------------------
14519 0
size_in_MB
----------------------------------------
113.742187
Now, implement page compression on the sales_big table:
ALTER TABLE sales_big REBUILD WITH (DATA_COMPRESSION=PAGE)
Now, re-examine the size of the sales_big table:
select sum(page_count) as pages, sum(compressed_page_count) as compressed_pages
from sys.dm_db_index_physical_stats (DB_ID(),
OBJECT_ID('sales_big'), 1, null, 'DETAILED')
where index_level = 0
SELECT SUM(used_page_count/ 128.0) AS size_in_MB
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('dbo.sales_big') AND index_id=1
GO
pages compressed_pages
----------------------------------------
4452 4451
size_in_MB
----------------------------------------
34.906250
In this example, you can see that the table was
reduced in size significantly, from 14,519 pages to 4,452 pages (113.7MB
to 34.9MB), closely in line with the estimated space savings.
You can also see that compression was applied to nearly every page:
4,451 of 4,452 pages were compressed.
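To confirm which compression setting is now in effect, you can also query the sys.partitions catalog view. A short sketch:

```sql
-- data_compression_desc reports NONE, ROW, or PAGE for each partition.
SELECT index_id,
       partition_number,
       data_compression_desc,
       rows
FROM sys.partitions
WHERE object_id = OBJECT_ID('dbo.sales_big');
```

Note that ALTER TABLE ... REBUILD changes the compression setting of the table itself (the clustered index or heap). Nonclustered indexes keep their own settings and must be rebuilt separately if you want them compressed.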
Be aware that you may not always receive the space
savings predicted due to the effects of fill factor and the actual size
of the rows. For example, if you have a row that is 8,000 bytes long and
compression reduces its size by 40%, still only one row fits on
the data page, so there is no space savings for that page. If the
results of running sp_estimate_data_compression_savings
indicate that the table will grow, this indicates that many of the rows
in the table are using nearly the full precision of the data types, and
the addition of the small overhead needed for the compressed format is
more than the savings from compression. In this case, there
is no advantage to enabling compression.
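The 8,000-byte row example above comes down to simple integer arithmetic. The sketch below assumes roughly 8,096 bytes are available for rows on an 8KB page and ignores per-row and compression-format overhead, so treat it as an approximation rather than an exact model of the storage engine.

```sql
DECLARE @page_bytes int = 8096;   -- approximate bytes available for rows on an 8KB page
DECLARE @row_bytes  int = 8000;   -- uncompressed row size from the example
DECLARE @compressed_row_bytes int = @row_bytes * 60 / 100;  -- 40% smaller: 4,800 bytes

-- Integer division gives the number of whole rows that fit on a page.
SELECT @page_bytes / @row_bytes            AS rows_per_page_before,
       @page_bytes / @compressed_row_bytes AS rows_per_page_after;
```

Both columns return 1: two 4,800-byte rows would need 9,600 bytes, which is more than a single page can hold.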
6. Managing Data Compression with SSMS
The preceding examples show the T-SQL commands you
can use to evaluate and manage row and page compression in SQL Server
2008. SSMS provides a Data Compression Wizard for evaluating and
performing data compression activities. To invoke the Data Compression
Wizard, right-click on the table in the Object Explorer and select
Storage and then select Manage Compression. Click Next to move past the
Welcome page to bring up the Select Compression Type page, as shown in Figure 5.
On the Select Compression Type page, you can choose the
compression type to use at the partition level or to use the same
compression type for all partitions. You can also see the estimated
savings for the selected compression type by clicking on the Calculate
button. After you click on Calculate, the wizard displays the current
partition size and requested compression size in the corresponding
columns (note that it might take a few moments to do the calculation).
After making your selections, click on Next to
display the Select an Output Option page. Here, you have the
opportunity to have the wizard generate a script of commands you can run
manually to implement the selected compression type. If you choose to
generate a script, you have the option to save the script to a file, the
Clipboard, or to a new query window in SSMS. You also have the option
to run the compression changes immediately or schedule a SQL Agent job
to run the changes at a specified time.
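For reference, the choices made in the wizard correspond to ALTER TABLE ... REBUILD statements like the ones shown earlier. The following sketch shows statements that are roughly equivalent to the two partition options; it is not the wizard's verbatim output, and the table name sales_partitioned and partition number 2 are hypothetical.

```sql
-- Same compression type for all partitions of the table.
ALTER TABLE dbo.sales_big REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE);

-- A different compression type for one partition of a partitioned table.
-- dbo.sales_partitioned and partition 2 are hypothetical.
ALTER TABLE dbo.sales_partitioned REBUILD PARTITION = 2
WITH (DATA_COMPRESSION = ROW);
```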