Extracting data using CAML is a pretty
efficient operation. There is, of course, the overhead of converting
the CAML query into a SQL statement and the performance implications of
the highly abstracted database schema, but all in all, the process
works well and the minor performance hit is a reasonable price to pay
for the flexibility available. There’s a problem when the data access
pattern gets a bit more complicated. For example, using SQL, we can
easily write a complex statement such as this:
Update Person.Contact
Set EmailPromotion=1
from HumanResources.Employee
Inner Join Person.Contact
on Person.Contact.ContactID=HumanResources.Employee.ContactID
Where Where HumanResources.Employee.Title like '%manager%'
This statement will execute a query and use the rows
returned by the query to select other rows in a second table that will
then be updated. However, CAML does not support updates, and as a
consequence, in order to perform a similar action on SharePoint, an
appropriate CAML query will be executed and the resulting list will be
loaded in memory. The SharePoint runtime will then iterate through the
list, making the necessary updates one row at a time.
This presents a few problems, and the first is
memory usage. If the query returns a huge amount of data, either
because the rows contain a lot of content or because there are a large
number of rows, the memory requirements will be considerable. To make
this problem worse, remember that SharePoint is a web-based platform.
Many users could be performing similar operations at the same time,
exponentially increasing the memory requirements. The second problem is
timeouts and perceived application performance; if such an operation is
performed during a page post-back, the user will need to wait for the
operation to complete before the page is refreshed. For very large
operations, the process may take so long that the page request times
out and the user subsequently refreshes the page, compounding the
performance problem.
Of course, complicated data access operations such
as this are not very common, and with some careful programming, you can
work around them. But what about simpler operations that can be
performed using the user interface? You’ve seen how nontechnical users
can create custom views of a list or library. What if a user creates a
view that returns 10,000 rows on a single page? All of that data needs
to be loaded into memory to render the page, and worse still, the
resultant page then needs to be downloaded to the browser, causing
network performance issues.
List Throttling
To prevent rogue database operations from bringing down a SharePoint server, SharePoint 2010 introduces the concept of list throttling.
List throttling allows an administrator to specify the maximum number
of items that can be retrieved in a single request. So in our example,
if a user created a view with a page size of 10,000 but the
administrator had specified a List View Threshold of 2000 items, no
rows would be returned and instead the user would receive an error
message indicating that the query exceeded the specified threshold.
Here’s how to configure list throttling:
From SharePoint Central Administration, select Manage Web Applications.
Select
the appropriate application from the list, and then in the ribbon
select Resource Throttling under General Settings, as shown next:
The Resource Throttling dialog has a number of options that warrant explanation:
List View Threshold
This value determines the maximum number of items that can be returned
by a single query. For our example, we could set this to 2000 (which is
the minimum value).
Object Model Override
In some situations, it is necessary to execute larger queries, and a
few options are available regarding how to tackle this problem. Object
Model Override allows developers to override the List View Threshold on
a per-query basis using code similar to this:
private void DoQuery(SPList list, string theCaml)
{
SPQuery query = new SPQuery();
query.Query = theCaml;
query.QueryThrottleMode = SPQueryThrottleOption.Override;
SPListItemCollection results=list.GetItems (query);
}
List View Threshold for Auditors and Administrators
This value is the upper limit for auditors and administrators. Note
that when using the Object Model Overrides the limit is not removed;
this value is used instead of the List View Threshold.
Daily Time Windows for Large Queries This
is the preferred solution for dealing with large queries.
Administrators can define a time window when usage is low and allow
large queries to execute. During this window of time, no limits are
applied.
Column Indexes
When a list or library contains a large number of
rows, executing a CAML query to return a selection of data can take
some time. To improve the performance of such operations, you can apply
indexes to columns in a list, although this approach has a few caveats.
Bearing in mind that the SharePoint database schema
mandates the storage of all user data in a single table, the indexes
that we’re creating using the SharePoint object model or user interface
are not indexes in the SQL sense of
the word. Such an index is not possible, because in the underlying
database schema, a single column may contain values for different
fields and different lists in subsequent rows. Applying an index to the
column would have unintended consequences. Without going into technical
minutiae, to get around this limitation of the database schema, when an
index is created on a column, the value being indexed is copied to a
separate table. As a consequence, if a list has 10,000 rows, for each
column being indexed, a further 10,000 rows are created in the separate
index table. For example, if 5 columns were indexed, 50,000 rows would
be added to the index table. The drawback with this is that the table
has to be kept in sync with the main data table, and as the number of
rows in the table increases, it creates an element of diminishing
returns. Here’s how to add an index to a column using the user
interface:
Open the list containing the column to be indexed. In the ribbon, select List Settings from the List Tools tab, as shown:
Select Indexed columns in the Columns section of the page, and then click Create A New Index.
Here’s how to add an index using code:
private void AddIndex(SPList list, string primaryFieldName,
string secondaryFieldName)
{
SPField primary;
SPField secondary;
if (list.Fields.ContainsField(primaryFieldName))
{
primary = list.Fields[primaryFieldName];
if (!string.IsNullOrEmpty(secondaryFieldName))
{
if (list.Fields.ContainsField(secondaryFieldName))
{
secondary = list.Fields[secondaryFieldName];
list.FieldIndexes.Add(primary, secondary);
}
}
else
{
list.FieldIndexes.Add(primary);
}
}
list.Update();
}