SharePoint 2010 : Planning Your Search Deployment - Performance (part 2) - Scaling, Availability

12/26/2013 1:31:58 AM

2. Acting on Performance Issues

Acting on performance issues is as important as identifying them. There are a few key areas where performance can be improved based on data from the health reports.

High query latency

Separate the query server, web server, and crawl server roles onto different servers, depending on where the bottleneck is or whether another service is taking up too many resources.
Scale up by adding index partitions and, if necessary, RAM to accommodate the 33% of the index partitions that is stored in memory.
Add query servers, and place additional index partitions on the new query servers to balance the query load.

Query latency growing over time

Add index partitions to spread load and check the performance trend.
Separate roles onto separate servers.
Add servers to handle query load as demand grows.
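
One way to check whether query latency is growing over time is to fit a straight line through periodic latency readings and look at the slope. The following is a minimal sketch under stated assumptions: the function name `latency_slope` and the sample values are hypothetical, and the readings are assumed to be copied by hand from the health reports at equal intervals.

```python
def latency_slope(samples_ms):
    """Least-squares slope of latency (ms) per sampling interval.

    samples_ms: latency readings taken at equal intervals, oldest first.
    A clearly positive slope suggests latency is growing over time.
    """
    n = len(samples_ms)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples_ms) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples_ms))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


# Hypothetical weekly averages in milliseconds.
weekly_latency_ms = [100, 110, 120, 130]
print(latency_slope(weekly_latency_ms))  # 10.0 ms per week
```

Recomputing the slope after adding index partitions or servers gives a rough indication of whether the change flattened the trend.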

3. Scaling

Scaling is the process of adding resources to IT systems to improve performance or handle additional load. It is the main tool for mitigating poor performance and the end-user dissatisfaction that follows. Scaling can prevent downtime, allow resources to be managed in response to demand, and help keep budgets under control because infrastructure can be added progressively as demand and corpus size grow.

There are two basic methods for scaling search:

Scaling up: Adding more resources to a physical machine that is under heavy load
Scaling out: Adding additional machines or segmenting tasks to better utilize existing resources

SharePoint 2010 is especially good at managing existing resources and scaling out to handle demand and alleviate bottlenecks.

Why and When to Scale

The most critical time to consider scaling is when users begin to complain about missing content and slow response times from search. Missing content can indicate a crawling problem caused by a lack of resources (either in the crawler or in the target system). Slow response times usually indicate too heavy a load on the query servers.

Basic scaling can be as simple as adding servers to move from one level of deployment to the next as content grows (as outlined in the previous section). A more reactive approach is to address specific performance or capacity issues as they arise. Implementing the latter technique requires knowing the triggers that identify an issue and the steps to address it.

The following are some typical scaling trigger scenarios and their resolutions:

Scenario 1 - Missing or stale content: Content is missing from the repository, and the crawl rate shown in the Crawl Rate per Content Source health report is less than 15 documents per second. This health report can be found in Central Administration => Administrative Report Library => Search Administration Reports => Crawl Rate per Content Source.
Scenario 2 - Slow search responses: Users complain of slow response times on search. Either SharePoint 2010 is performing well on basic page loads and search is not returning fast enough, or the entire search and result pages load slowly. First check that there are no poorly performing custom design elements or Web Parts on the search result page. Then check the Search health reports to identify slow query response times and latency, and consider scaling out the query servers by partitioning the index, adding query servers to hold index partitions, or both.
Scenario 3 - No disk space on database server: A good SharePoint administrator always keeps an eye on Central Administration and never ignores warning messages from SharePoint 2010. SharePoint 2010 is especially good at notifying the administrator of issues, including database size issues. Should the database server run out of space, SharePoint 2010 will alert the administrator. This scenario most often happens during crawling when the database has been set to grow automatically. The natural fix for database disk space issues is to add disks to the array or, if the solution is not using RAID or some other redundant storage solution, to increase the size of the disks on the server.
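
The three triggers above can be sketched as a simple check. This is an illustration only: the function `scaling_triggers` and all of its parameters are hypothetical, the inputs are assumed to be read manually from the health reports and the database server, and only the 15 documents per second figure comes from the text. The latency limit and free-space floor are values an administrator would have to choose.

```python
CRAWL_RATE_FLOOR_DPS = 15.0  # documents per second, the figure quoted in Scenario 1


def scaling_triggers(crawl_rate_dps, content_missing,
                     query_latency_ms, latency_limit_ms,
                     db_free_gb, db_free_floor_gb):
    """Return the scenario numbers (1, 2, 3) whose trigger condition is met."""
    hits = []
    # Scenario 1: content is missing AND the crawl rate is below the floor.
    if content_missing and crawl_rate_dps < CRAWL_RATE_FLOOR_DPS:
        hits.append(1)
    # Scenario 2: query latency exceeds a limit chosen by the administrator.
    if query_latency_ms > latency_limit_ms:
        hits.append(2)
    # Scenario 3: free disk space on the database server is below a chosen floor.
    if db_free_gb < db_free_floor_gb:
        hits.append(3)
    return hits


# Hypothetical readings: slow crawl with missing content, healthy queries and disks.
print(scaling_triggers(9.5, True, 400, 1000, 250, 50))  # [1]
```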

Disk Recommendations

Search is an input/output-intensive process. Basically, a search engine indexes all the content from a particular source (e.g., SharePoint site, web site, database, file share, etc.) and builds databases with the words indexed, the links to the documents, metadata found on the documents, metadata associated with the documents, time, size, type, etc. The crawling process writes all this information to databases. SharePoint stores some of these databases in SQL Server and some on the file system in index partitions. The query components then access these databases, look for the terms that were searched for (these may be free text or property-based), and match the terms with the documents and their properties. This requires a lot of reading from and writing to hard drives, so hardware that performs well is an important part of improving search performance.

Generally speaking, to support search well, databases will need to be in some kind of disk array. For write-intensive databases, such as a crawl database, RAID 10 is recommended. For added performance, the temp database should be separated to a RAID 10 array. (See more in the following sections, and consult external resources for more information on RAID.) In addition, there should be a redundant array to avoid downtime. More redundant arrays will slow performance but improve redundancy, so finding the right balance for the requirements is key.

RAID stands for redundant array of independent disks and is the technology of combining several disks into an array. In the redundant configurations, if one disk fails, the remaining disks take the load because the data is mirrored or can be rebuilt from parity. More complex RAID configurations can add redundancy, performance, or both, depending on the level chosen.

Search queries require a lot of throughput, so having well-performing databases can help. On the database servers, the recommended input/output operations per second (IOPS) capability is 3,500 to 7,000 IOPS, with at least 2,000 IOPS for the property database. Therefore, at minimum, disk speeds of 7,200 RPM or better are required in a RAID configuration. Highly performant search implementations also make use of storage area networks and sometimes solid state drives (which may support up to 1,000,000 IOPS in a storage array vs. 90 IOPS for a single 7,200 RPM SATA drive).
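
To get a feel for these numbers, the following sketch estimates how many 7,200 RPM SATA drives would be needed to reach a target IOPS figure, using the 90 IOPS per drive quoted above. It is a rough lower bound under a loud assumption: it simply divides the target by the per-drive figure and ignores RAID write penalties, controller caches, and workload mix. The function name `drives_for_iops` is hypothetical.

```python
import math

SATA_7200_IOPS = 90  # single 7,200 RPM SATA drive, the figure quoted in the text


def drives_for_iops(target_iops, iops_per_drive=SATA_7200_IOPS):
    """Minimum number of drives whose combined raw IOPS reaches the target.

    Assumption: IOPS add up linearly across drives, with no RAID overhead.
    """
    if target_iops <= 0 or iops_per_drive <= 0:
        raise ValueError("IOPS values must be positive")
    return math.ceil(target_iops / iops_per_drive)


for target in (2000, 3500, 7000):
    print(target, "IOPS ->", drives_for_iops(target), "drives")
# 2000 IOPS -> 23 drives
# 3500 IOPS -> 39 drives
# 7000 IOPS -> 78 drives
```

The drive counts illustrate why larger deployments tend to move to faster spindles, storage area networks, or solid state drives rather than adding ever more slow disks.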

4. Availability

Availability in IT systems is the measure of how often servers and their services are available for users to access. Of course, all organizations would like to have 100% availability. This level of availability is either not possible, not practical, or too expensive for most organizations. Therefore, most organizations endeavor to achieve the highest availability possible for a reasonable cost. Generally, availability is measured by a number of nines. This is a reference to the closeness to 100% availability achievable by increments of additional nines from 90% to 99.999…%. A common expression is “5 nines of uptime” or 99.999%. This represents about 5 minutes and 15 seconds of allowable downtime per year.
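
The allowable downtime for a given number of nines is simple arithmetic: the unavailable fraction (10 to the power of minus the number of nines) multiplied by the length of a year. The sketch below assumes a 365-day year; the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def downtime_seconds_per_year(nines):
    """Allowable downtime per year for an availability of `nines` nines.

    One nine is 90%, two nines is 99%, five nines is 99.999%.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return SECONDS_PER_YEAR * 10 ** -nines


def format_downtime(seconds):
    """Render a duration in seconds as days, hours, minutes, seconds."""
    minutes, secs = divmod(round(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return f"{days}d {hours}h {minutes}m {secs}s"


print(format_downtime(downtime_seconds_per_year(3)))  # 0d 8h 45m 36s
print(format_downtime(downtime_seconds_per_year(5)))  # 0d 0h 5m 15s
```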

The main method to achieve high availability in IT systems is to make them redundant, that is, to add copies (or mirrors) of the systems that will take over the tasks should hardware or software fail (which it inevitably does). Redundancy in SharePoint deployments can ensure uninterrupted service for end users. It can be deployed as a solution for failover scenarios (a server dies) but can also ease maintenance tasks and allow for performance and server management without adversely affecting the uptime of the solution.

SharePoint 2010 handles redundancy in a number of ways by allowing for mirrored services over multiple machines. The farm architecture of SharePoint also allows for one server to compensate for another if one of the servers in the farm fails or is taken down for maintenance.

Each component in SharePoint 2010 Search represents a possible failure point. Therefore, the components are made capable of being mirrored. The most common areas to mirror are as follows:

Web servers: Although not strictly part of the search engine, web servers deliver the content that SharePoint 2010 Search is indexing. If these servers fail and there is no redundancy, then the index will not refresh. However, if the web servers fail, there will be no site, so users will probably not be primarily concerned with the lack of search.
Query servers: Multiple query servers can improve performance and also provide redundancy by offering an alternative place for queries to be processed. Mirroring index partitions across query servers ensures that all portions of the index are available if one server should fail.
Crawl servers: If constant index freshness is a critical business need, mirroring crawl servers can help the database and indexes stay up to date should one fail.
Database servers: One of the most critical areas to provide redundancy in is the database servers. Database servers provide the data for the query components and the repository for the indexing components to write to. Database failure will cause search to fail completely, so it is essential to have databases constantly available.

Why and When to Consider Making Services Redundant

Any service that is considered critical should have a failover component. Being able to search and receive results is a critical feature of SharePoint. Therefore, the services that allow users to query the index and receive results should have redundancy. This includes query servers and any index partitions on those query servers.

All servers with the query role should have some level of redundancy. Ideally, all index partitions should have a redundant instance. So, if there is one server with the query role and two index partitions on that server, a redundant query server with a second instance of both index partitions will be necessary.

This is considered minimum redundancy for SharePoint Search, as it ensures that, given a hardware failure on a query server, search will not fail with an error for users.
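
The minimum-redundancy rule can be expressed as a check that every index partition has an instance on at least two query servers. The sketch below is illustrative only: the topology is a hand-written dictionary with hypothetical server and partition names, not something read from SharePoint.

```python
from collections import Counter


def unmirrored_partitions(topology):
    """Return index partitions that live on fewer than two query servers.

    topology: mapping of query server name -> iterable of partition names.
    """
    counts = Counter(
        partition
        for partitions in topology.values()
        for partition in set(partitions)  # count each server at most once
    )
    return sorted(p for p, servers in counts.items() if servers < 2)


# One query server holding both partitions: nothing is mirrored.
print(unmirrored_partitions({"query1": ["partition0", "partition1"]}))
# ['partition0', 'partition1']

# A second query server with an instance of both partitions: fully mirrored.
print(unmirrored_partitions({
    "query1": ["partition0", "partition1"],
    "query2": ["partition0", "partition1"],
}))
# []
```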

Additionally, database servers should have a redundancy element, as they hold the crawl and property databases essential to searching as well as the Administration database, without which search cannot be performed. Most organizations already have redundant database implementations, such as database clusters. Either the search databases should be installed in these existing database clusters, or a similar model should be employed to ensure redundancy.

If freshness of content is considered critical, making crawl servers redundant is required. If freshness of content is not a critical factor, having a less strict redundancy model for indexing can be acceptable. However, should a failure happen, the crawl components must be re-provisioned on new or existing servers.

Server Downtime Impact Chart

Although no one likes downtime, avoiding any downtime with full redundancy may be either too costly, too cumbersome, or too maintenance-intensive for some organizations. Therefore, it is valuable to understand the impact of downtime for the individual components of SharePoint 2010 Search and determine which may be acceptable to the organization. See Table 1.

Note: A dedicated search farm need not have the web server role if this role is provided by a content farm, which connects to the search farm.

Table 1. Server Downtime Impact Chart

Server Role	Severity	Impact of Downtime
Web server role	Critical	Users cannot see the search page (and possibly not SharePoint content if the web server role is shared).
Query server role	High	Users will not be able to search for content. The search page may still work but return an error.
Crawl server role	Medium	The search will still work and results will be returned, but the content will not be up to date and indexes not refreshed.
Database server role	Critical	Users will not be able to search for content. The search page may still work but will return an error (SharePoint content may also not be returned if the database server role is shared).
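
The chart can also be used as a lookup when deciding where to spend on redundancy. The sketch below encodes the severities from Table 1 and lists the roles running on a single server whose downtime severity exceeds what the organization is willing to tolerate. The role keys, the function name, and the ranking of Medium below High below Critical are assumptions made for illustration.

```python
SEVERITY_RANK = {"Medium": 1, "High": 2, "Critical": 3}

# Severities as listed in Table 1.
DOWNTIME_SEVERITY = {
    "web": "Critical",
    "query": "High",
    "crawl": "Medium",
    "database": "Critical",
}


def roles_needing_redundancy(single_instance_roles, tolerated="Medium"):
    """Roles with no redundant instance whose severity exceeds `tolerated`."""
    limit = SEVERITY_RANK[tolerated]
    return sorted(
        role for role in single_instance_roles
        if SEVERITY_RANK[DOWNTIME_SEVERITY[role]] > limit
    )


# Hypothetical farm where only the web servers are already redundant.
print(roles_needing_redundancy(["crawl", "query", "database"]))
# ['database', 'query']
```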
