SharePoint 2010 has a number of
performance and redundancy features. The search capabilities have been
redesigned to allow for a broader ability to scale and more points for
redundancy.
The new architecture for SharePoint 2010
provides a more compartmentalized approach to search. It divides the
tasks that the search mechanism performs into roles that can be spread
out across physical or virtual servers, and these roles can themselves
be divided further. The four server roles for search
are as follows:
- Web server role
- Query server role
- Crawl server role
- Database server role
The query server and crawl server roles are
unique to the search component, whereas the web server and database
server roles can be utilized by and are necessary for other components
of SharePoint 2010.
1. Web Server Role
Servers with the web server role host the
web components of SharePoint 2010 that provide the user interface for
searching. These components, such as search center sites, Web Parts,
and web pages that host query boxes and result pages, are delivered
from servers with the web server role to the end users. These
components send requests to servers hosting the query server role and
receive and display the result set.
The web server role may not be necessary in
SharePoint farms that are dedicated for search, as other farms that are
utilizing the search farm will handle this role and communicate with
the search farm directly from their web servers. The web server role is
often combined in smaller deployments with web servers serving content
or with other search server roles.
2. Query Server Role
The query server role serves results to web
servers. A query server that receives a request from a server with the
web server role forwards the request to all servers in the farm with
the query server role. Those servers process the query against their
index partitions and return their results to the query server that
received the request, which then forwards the results to the
requesting web server.
On each query server, there
is a query processor, which trims the result set for security, detects
duplicates, and assigns the appropriate associated properties to each
result from the property store. Any SharePoint farm providing search
must have at least one server hosting the query server role. However, a
farm may call search from another farm and therefore not need the query
server role.
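The query processor steps described above can be pictured with a short Python sketch. It is an illustration only, not SharePoint code: the `Hit` type, the `process_query` function, the group-based access check, and the hash-based duplicate test are all hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One raw hit returned by an index partition (hypothetical shape)."""
    doc_id: str
    content_hash: str          # stand-in for whatever duplicate detection compares
    score: float
    allowed_groups: frozenset  # groups permitted to see the item


def process_query(partition_results, user_groups, property_store):
    """Merge per-partition hits, security-trim, de-duplicate, attach properties."""
    merged = [hit for hits in partition_results for hit in hits]
    # Security trimming: keep only hits the user is allowed to see.
    visible = [h for h in merged if h.allowed_groups & user_groups]
    # Highest score first, so the best-ranked copy of a duplicate survives.
    visible.sort(key=lambda h: h.score, reverse=True)
    seen, results = set(), []
    for h in visible:
        if h.content_hash in seen:
            continue  # duplicate of a hit that is already in the result set
        seen.add(h.content_hash)
        # Attach the properties held for this item in the property store.
        results.append({"doc_id": h.doc_id, "score": h.score,
                        **property_store.get(h.doc_id, {})})
    return results


# Two partitions return hits; doc3 duplicates doc1, and doc2 is not visible.
partition_a = [Hit("doc1", "h1", 0.9, frozenset({"staff"})),
               Hit("doc2", "h2", 0.7, frozenset({"hr"}))]
partition_b = [Hit("doc3", "h1", 0.8, frozenset({"staff"})),
               Hit("doc4", "h4", 0.5, frozenset({"staff"}))]
properties = {"doc1": {"title": "Budget"}, "doc4": {"title": "Roadmap"}}

results = process_query([partition_a, partition_b],
                        frozenset({"staff"}), properties)
```

In this toy run a user in the `staff` group gets back `doc1` and `doc4` only: `doc2` is trimmed for security and `doc3` is dropped as a duplicate of `doc1`.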
The query server role, like other application
roles in SharePoint, can be hosted on a server with other application
server roles. This makes SharePoint 2010 very versatile but may cause
confusion when planning resource usage. Having all servers provide all
roles is not optimal resource usage, as some demanding roles may cause
other roles to perform poorly. Caution and consideration regarding the
role and demand of each server and each task are therefore advised.
The query server holds the index in its own file
system or in a file system available to it. A query server can host
either the entire index or index partitions, which are sections of the
index that the administrator can assign to different query servers for
load, performance, and redundancy. Index partitions may be duplicated
on a number of servers with the query server role to provide
redundancy. Adding query servers and partitioning the index across
them will also increase search query performance and
reduce result latency.
Index Partitions
An index partition is a portion of the entire
search index. Microsoft has designed the index to be broken into
logical sections that can be distributed and mirrored across query
servers. Generally, index partitions are spread across servers and
represent an equal amount of crawled data. Indexes may also be
partitioned on a single server. They can also be mirrored on another
server or set of servers to provide redundancy.
Imagine, for example, that a SharePoint farm
has 300GB of crawled data and three query servers. Each query server
can hold a single index partition representing 100GB of crawled data.
Query speed is increased because the load of searching the index is
distributed across the three servers. Each query server takes
time to look into the index for any given query, and therefore
searching in smaller partitions across multiple servers is
substantially faster. A mirror of each partition
can also be placed on a different query server to ensure redundancy.
Should any one query server fail, the remaining query servers still
hold all portions of the index and can continue to serve results. See
Figure 1.
Figure 1. Three query servers with mirrored partitions
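The arithmetic and the failure scenario in this example can be checked with a small Python sketch. It is a toy model rather than anything SharePoint exposes: the function names and the server names are hypothetical, and the ring-style mirror placement (each server mirrors its neighbor's partition) is an assumption made for illustration, since the text does not specify which server holds which mirror.

```python
def build_layout(servers, crawled_gb):
    """Give each server one primary partition plus a mirror of its neighbor's."""
    n = len(servers)
    partition_gb = crawled_gb / n  # equal share of crawled data per partition
    layout = {}
    for i, server in enumerate(servers):
        primary = i
        mirror = (i + 1) % n       # assumed ring placement of mirrors
        layout[server] = {primary, mirror}
    return layout, partition_gb


def covered_partitions(layout, failed=()):
    """Return the partitions still held by at least one live server."""
    live = (parts for server, parts in layout.items() if server not in failed)
    return set().union(*live)


servers = ["query1", "query2", "query3"]
layout, partition_gb = build_layout(servers, crawled_gb=300)
all_partitions = set(range(len(servers)))

# One failed server: the other two still hold every partition between them.
after_one_failure = covered_partitions(layout, failed={"query2"})

# Two failed servers: the survivor holds only two of the three partitions.
after_two_failures = covered_partitions(layout, failed={"query1", "query2"})
```

With 300GB and three servers each partition represents 100GB. Under the assumed placement the index stays complete after any single failure, but not after two.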
Crawl Server Role
The crawl server role is responsible for
crawling content. This crawling mechanism is similar to other web
crawling technologies, except that it is specifically designed to crawl
and index SharePoint content, including user profiles from a directory,
associated document metadata, custom properties, file shares, Exchange
public folders, web content, and database and custom content through
Business Connectivity Services (BCS), as well as content via IFilters
and protocol handlers.
The crawl servers host the crawler components,
and, like the query server role, at least one server in a SharePoint
2010 farm providing search must host the crawl server role. Crawlers on
the crawl servers are associated with crawl databases. Each crawler is
associated with one crawl database.
It is recommended that the Search
Administration component also be hosted on the server with the crawl
server role. However, it can be hosted on any server in the farm.
SharePoint 2010 hosts only a single Search Administration component per
Search service application.
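The minimum requirements stated so far for a farm that provides search (at least one query server, at least one crawl server, and a single Search Administration component) can be written down as a simple check. The following Python sketch is purely illustrative: the function, the role labels, and the server names are hypothetical, and nothing here calls a SharePoint API.

```python
from collections import Counter


def validate_search_topology(servers, admin_component_hosts):
    """Check a planned farm against the minimums described in the text.

    servers: dict mapping server name -> set of role labels
    admin_component_hosts: list of servers hosting the Search
        Administration component for one Search service application
    Returns a list of problems; an empty list means none were found.
    """
    problems = []
    role_counts = Counter(role for roles in servers.values() for role in roles)
    for required in ("query", "crawl"):
        if role_counts[required] == 0:
            problems.append(f"no server hosts the {required} server role")
    if len(admin_component_hosts) != 1:
        problems.append("exactly one Search Administration component is expected")
    elif admin_component_hosts[0] not in servers:
        problems.append("Search Administration component is on an unknown server")
    return problems


# A small farm that combines the query and crawl roles on one server.
farm = {"web1": {"web"}, "app1": {"query", "crawl"}, "sql1": {"database"}}
ok = validate_search_topology(farm, ["app1"])

# A plan with no crawl server and no Search Administration component.
broken = {"web1": {"web"}, "app1": {"query"}}
issues = validate_search_topology(broken, [])
```

The check applies only to a farm that provides search itself. A farm that consumes search from another farm would not need to pass it.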
Note
Until sometime in the middle of 2010, the crawl server in SharePoint
2010 was known as the index server. In November 2010, Microsoft updated
SharePoint 2010 documentation, changing the name to crawl server.
Search Service Application (SSA)
SharePoint 2010 has its core services broken
into service applications. These applications, which deliver much of
the functionality of SharePoint 2010, are separated to provide
granularity and scalability when managing many of the different
features available in SharePoint 2010. These services include but are
not limited to the User Profile service, the Business Data Connectivity
service, the Managed Metadata service, and the Search service, among
others. Additionally, third-party vendors or solution providers could
provide custom service applications that plug into SharePoint 2010,
although at the time of writing, there were not any good examples of a
third-party service application.
The Search service application is the service
application that is responsible for the search engine. It manages the
crawler and the indexes as well as any modifications to topology or
search functionality at the index level.
3. Database Server Role
In a SharePoint 2010 Search deployment, the
search databases are hosted on a server with the database server role.
It is also possible to host other SharePoint 2010 databases on the same
server, or to place the search databases and the content databases on
separate servers. Servers with the
database server role can be mirrored or clustered to provide redundancy.
Aside from disk size and
performance limitations, there are no other considerations that limit
hosting other databases, such as SharePoint content databases, on a
SharePoint 2010 server with the database server role.
There are three types of databases utilized by
a SharePoint 2010 farm providing search: property databases, crawl
databases, and Search Administration databases.
- Property databases: Property databases hold property
metadata for crawled items. These properties can be crawled document
metadata or associated custom properties from SharePoint 2010.
- Crawl databases: Crawl databases store a history of the
crawl. They also manage the crawl operations by indicating start and
stop points. A single crawl database can have one or more crawlers
associated with it. However, a single crawler can be associated with
only one crawl database.
- Search Administration databases: Search
Administration databases store search configuration data, such as
scopes and refiners, as well as security information for the crawled
content. Only one Search Administration database is permitted per
Search service application.
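The relationship between crawlers and crawl databases described in the list above is many-to-one: a crawl database may serve several crawlers, but each crawler belongs to exactly one crawl database. A minimal Python sketch of that constraint follows. The `CrawlTopology` class and the crawler and database names are hypothetical and exist only to illustrate the rule.

```python
class CrawlTopology:
    """Tracks which crawl database each crawler is associated with."""

    def __init__(self):
        self._db_of = {}  # crawler name -> crawl database name

    def assign(self, crawler, crawl_db):
        """Associate a crawler with a crawl database, enforcing one DB per crawler."""
        existing = self._db_of.get(crawler)
        if existing is not None and existing != crawl_db:
            raise ValueError(f"{crawler} is already associated with {existing}")
        self._db_of[crawler] = crawl_db

    def crawlers_for(self, crawl_db):
        """List the crawlers associated with one crawl database."""
        return sorted(c for c, db in self._db_of.items() if db == crawl_db)


topology = CrawlTopology()
topology.assign("crawler1", "CrawlDB_A")
topology.assign("crawler2", "CrawlDB_A")  # several crawlers may share a database
topology.assign("crawler3", "CrawlDB_B")

try:
    topology.assign("crawler1", "CrawlDB_B")  # a second database is rejected
    rejected = False
except ValueError:
    rejected = True
```

Storing the association as a mapping from crawler to database makes the "only one crawl database per crawler" rule structural, since a key can hold only one value.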