SharePoint 2013 search has been
re-architected, and the goal of achieving a single enterprise search
platform has introduced a number of changes. You can consider SharePoint
2013 search to be a combination of SharePoint 2010 search, FAST Search
Server 2010 for SharePoint, and core FAST technology.
Microsoft acquired Fast Search & Transfer in early 2008 and
introduced FAST Search Server 2010 for SharePoint alongside
SharePoint Server 2010 when the 2010 products were released. The
original FAST ESP product was also available. For SharePoint 2013, the
goal was to integrate the best of the current products, along with new
components not yet introduced, into a single enterprise search
architecture. The result is a search platform that combines the crawler
and connector framework from SharePoint Search, updated content
processing and query processing from FAST technology, and a search core
based on FAST Search. This architecture also includes the new analytics
engine, which is used for ranking and recommendations. The FAST Search
product and the FAST brand name are gone; SharePoint 2013 search is
their current incarnation. This single architecture is most obvious
during the installation process, where you will notice the single Search
Service application.
NOTE Along with the elimination of the separate FAST products
(FAST Search Server 2010 for SharePoint, FAST ESP, and FAST Search
Server 2010 for Internet Sites), the standalone Microsoft Search Server
product does not have a 2013 version.
The search topology has several key improvements:
- Separate crawl and indexing processes.
- A new analytics process that provides search and usage analyses, including link analysis and recommendations.
- The entire index is stored locally on disk, and it no longer uses the property database.
- Search is scalable in two dimensions: content and query load.
- The administration component can be made fault tolerant.
- Native support for repartitioning the index as part of scaling out the topology.
Topology
The topology can be broken down into
search components and databases that work together to provide search
capability, as shown in Figure 1.
In a multi-server farm, these components reside on application servers,
and the databases exist on SQL Server database servers. When designing
the search topology to support your requirements, you should take into
account whether you are providing search for a public website or an
internal intranet. Additionally, you should consider high availability
and fault tolerance requirements, the amount of content, and the
estimated page views and queries per second. The search components can
be categorized into five groups or processes:
- Crawl and content — Includes the crawl and content processing components and the crawl database
- Analytics — Includes the analytics processing component, and the links and analytics reporting databases
- Index — Includes the index component, index partition, and index replica
- Query — Includes the query processing component
- Administration — Includes the administration component and the administration database
You must define and scale the topology to
accommodate your requirements. Central Administration in SharePoint
2013 shows the current status of the search topology, but unlike
SharePoint 2010, where you changed and scaled the topology in Central
Administration, the SharePoint 2013 search topology is created and
managed using PowerShell. This reflects the more complex and flexible
topology, which PowerShell lets you manage far more efficiently. You’ll
learn how to do this in the section, “Configuring Enterprise Search.”
The following sections take a detailed look at these five main search
components.
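As a preview of what that section covers, the following is a minimal sketch of the clone-modify-activate pattern used for all topology changes. The server name APP02 is a hypothetical application server; substitute one from your own farm.

```powershell
# Not needed inside the SharePoint 2013 Management Shell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Assumes a single Search service application in the farm
$ssa    = Get-SPEnterpriseSearchServiceApplication
$active = Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Active
$clone  = New-SPEnterpriseSearchTopology -SearchApplication $ssa -Clone -SearchTopology $active

# Add a second query processing component on another server (hypothetical name)
$ssi = Get-SPEnterpriseSearchServiceInstance -Identity "APP02"
Start-SPEnterpriseSearchServiceInstance -Identity $ssi
New-SPEnterpriseSearchQueryProcessingComponent -SearchTopology $clone -SearchServiceInstance $ssi

# Activate the modified clone, then verify component health
Set-SPEnterpriseSearchTopology -Identity $clone
Get-SPEnterpriseSearchStatus -SearchApplication $ssa -Text
```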
Managing the Crawl Process and Crawled Properties
Search effectiveness requires that the
necessary content be indexed and accessible to end users. The whole
process begins with the crawl component, which is also referred to as
the crawler. This component crawls content sources,
and it delivers the crawled content and associated metadata to the
content processing component. The crawl process interacts with content
repositories and retrieves data using connectors and protocol handlers.
SharePoint 2013 includes
more out-of-the-box connectors than SharePoint 2010, as well as Business
Connectivity Services (BCS) extensibility.
Content sources that you create in the Search
service application specify the repositories to be crawled. The content
source represents a group of crawl settings, which includes the host to
crawl, the type of content, the crawl schedule, and how deep to crawl.
By default, you have the Local SharePoint Sites content source upon
installation, but you can create new sources, similar to how you did
with SharePoint 2010.
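Content sources can also be created with PowerShell rather than through Central Administration. A minimal sketch, where the name and start address are hypothetical:

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication

# Create a SharePoint content source (name and URL are placeholders)
New-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
    -Name "Team Sites" -Type SharePoint `
    -StartAddresses "http://intranet.contoso.com"
```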
To manage crawl volume and performance, you can
simultaneously crawl content using multiple crawl components. As the
crawler processes data, it caches the content locally in preparation for
sending content to the content processing component. The crawl
component also uses one or more crawl databases to temporarily store
information about crawled items and to track crawl history. There is no
longer a one-to-one mapping of the crawl database to crawler as in
SharePoint 2010; each crawl database can be associated with one or more
crawlers, so you can scale them independently.
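Continuing the earlier topology sketch (reusing $ssa, $clone, and $ssi), crawl databases and crawl components are added independently of one another; the database name below is hypothetical.

```powershell
# Add a second crawl database; it can serve one or more crawl components
New-SPEnterpriseSearchCrawlDatabase -SearchApplication $ssa `
    -DatabaseName "SP2013_Search_CrawlDB2"

# Add a second crawl component on another server, then activate the topology
New-SPEnterpriseSearchCrawlComponent -SearchTopology $clone -SearchServiceInstance $ssi
Set-SPEnterpriseSearchTopology -Identity $clone
```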
To support the need for a “fresher” index,
SharePoint 2013 includes a new crawl type, the continuous crawl. The
continuous crawl is applicable only to SharePoint content sources, and
is a new option you can choose when you create a new content source. You
can think of the continuous crawl as being similar to the incremental
crawl but without the need to be scheduled. With continuous crawl,
changed content is crawled every 15 minutes by default, but the
frequency is configurable. If a full crawl has started, the new system
allows the latest changes to appear in results before the full crawl
completes. As in SharePoint 2010, all crawler configurations are stored
in the administration database.
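A sketch of enabling a continuous crawl on the hypothetical "Team Sites" source created earlier; note that the crawl interval is a property of the Search service application, not of an individual content source.

```powershell
# Enable continuous crawls on an existing SharePoint content source
Set-SPEnterpriseSearchCrawlContentSource -Identity "Team Sites" `
    -SearchApplication $ssa -EnableContinuousCrawls:$true

# Change the continuous crawl interval from the default of 15 minutes
$ssa.SetProperty("ContinuousCrawlInterval", 5)
```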
The content and metadata that have been crawled and extracted from a document or URL are represented as crawled properties.
They are grouped into categories based on the iFilter or protocol
handler used to retrieve the property; Author and Title are typical
examples. New crawled properties are created during each crawl as new
kinds of content are added to the enterprise. Crawled properties are
passed to the content-processing component for further analysis.
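You can inspect crawled properties and their categories from PowerShell as well; ows_Author is one of the crawled properties typically created for SharePoint content.

```powershell
# List the crawled-property categories (SharePoint, Web, Office, and so on)
Get-SPEnterpriseSearchMetadataCategory -SearchApplication $ssa

# Look up a specific crawled property by name
Get-SPEnterpriseSearchMetadataCrawledProperty -SearchApplication $ssa -Name "ows_Author"
```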
Content Processing
The content-processing component is a specialized part of the
search architecture whose purpose is to analyze and process the data
and metadata that will be included in the index. It
transforms the crawled items and crawled properties using language
detection, document parsing, dictionaries, property mapping, and entity
extraction. This component is also responsible for mapping crawled
properties to managed properties.
Content processing also extracts the links and
anchor text from web pages and documents, because this type of
information helps influence the relevancy of the document. This raw data
is stored in the link database. When a user performs a search and
clicks a result, the click-through information is also stored
unprocessed in the link database. All this raw data is subsequently
analyzed by the analytics processing component, which updates the index
with relevancy information.
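Like the crawl database, the link database can be scaled out as link and click-through volume grows. A one-line sketch with a hypothetical database name:

```powershell
# Add a second links database to spread link and click-through storage
New-SPEnterpriseSearchLinksDatabase -SearchApplication $ssa `
    -DatabaseName "SP2013_Search_LinksDB2"
```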
Once completed, the transformed data is then sent
to the index component. Content processing configurations are stored in
the search administration database. This includes new crawled
properties, so administrators can manually create a mapping of crawled
properties to managed properties. The content-processing component is
also highly extensible: it can call out to an external web service that
examines and modifies an item's properties before they are written to
the index.
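This extensibility point is configured on the Search service application. The sketch below registers a callout endpoint; the URL and property names are hypothetical.

```powershell
$ssa    = Get-SPEnterpriseSearchServiceApplication
$config = New-Object Microsoft.Office.Server.Search.Administration.ContentEnrichmentConfiguration

# The web service receives these managed properties for each item...
$config.InputProperties  = "Author", "Title"
# ...and may return modified values for these
$config.OutputProperties = "Author"
$config.Endpoint         = "http://enrich01:818/ContentEnrichmentService.svc"

$ssa.SetContentEnrichmentConfiguration($config)
```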
Managed Properties
Crawled properties are mapped to managed
properties to include the content and metadata in the search index.
Only managed properties are included in the index; therefore, users can
search only on managed properties. Managed properties have attributes,
which determine how the contents are shown in search results. For an
extensive list of the default SharePoint 2013 managed properties and
the associated mapped crawled properties, see the Managed Properties
Overview table in the product documentation.
As the Managed Properties Overview table displayed in the reference
indicates, managed properties also have associated attributes, also
referred to as properties; yes, the managed properties have properties.
The list of default managed properties, also referred to as the search schema or index schema,
contains the managed properties, their associated properties, and the
mapping between crawled properties and managed properties. You can edit
the search schema yourself, by manually mapping crawled properties to
managed properties, and configuring property settings. The
content-processing component utilizes this schema to perform any
necessary mapping.
A single managed property can be mapped to more
than one crawled property, as shown in the preceding reference. In this
case, the order in which the crawled properties are mapped determines
the content of the managed property that is stored in the index when a
crawled document contains values for more than one of the mapped crawled
properties. You can also map a single crawled property to multiple
managed properties.
Site columns in SharePoint libraries automatically
generate a new managed property after crawling, along with a mapping
between the new crawled property and the new managed property. By editing the
search schema, you can change the default mapping, create new mappings,
or create new managed properties. A full crawl must be completed after
the creation of a new managed property to ensure that its value is
included in the search index.
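A minimal sketch of that workflow, assuming a site column surfaced a crawled property named ows_ProjectCode (hypothetical):

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication

# Create a new text managed property (Type 1 = Text) and map a crawled property to it
$mp = New-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa `
          -Name "ProjectCode" -Type 1
$cp = Get-SPEnterpriseSearchMetadataCrawledProperty -SearchApplication $ssa `
          -Name "ows_ProjectCode"
New-SPEnterpriseSearchMetadataMapping -SearchApplication $ssa `
    -ManagedProperty $mp -CrawledProperty $cp

# A full crawl is required before the new property's values appear in the index
$cs = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
          -Identity "Local SharePoint sites"
$cs.StartFullCrawl()
```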
Search Schema
The search schema is stored in the
search administration database, and the schema web page, which is called
Search Service Application: Managed Properties, is shown in Figure 2.
This page is available from the Search Service Application: Search
Administration page in Central Administration, using the Search Schema
link in the Queries and Results section. Note that this page is similar
to the page referenced in the preceding section, except that this page
reflects the current state rather than the default state. This is the
page you use when making changes to the schema. Key characteristics of
the search schema include the following:
- It contains the mapping between crawled properties and managed
properties, including the order of mapping for those cases that have
mapped multiple crawled properties.
- It maintains the settings that determine whether and how each managed property is stored in the index.
- It contains the settings or properties for each of the different managed properties.
- Site collection administrators can change the search schema for a
particular site collection using the Site Settings page, and customize
the search experience for that specific site collection. This is a new
capability in SharePoint 2013; SharePoint 2010 only allowed schema
changes in Central Administration. Site owners can view the schema, but
they are not allowed to make changes. A PowerShell sketch for inspecting
a site collection's schema follows the note below.
- It is possible to have multiple search schemas.
NOTE You can create more than one search schema by creating additional Search Service applications, each of which has its own schema.
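As mentioned in the list above, schemas can now also be scoped to a site collection. The sketch below, assuming the metadata cmdlets' -SiteCollection parameter and a hypothetical URL, lists the managed properties in effect for one site collection:

```powershell
$ssa  = Get-SPEnterpriseSearchServiceApplication
$site = Get-SPSite "http://intranet.contoso.com/sites/sales"

# List the managed properties visible to this site collection's schema
Get-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa `
    -SiteCollection $site.ID |
    Select-Object Name, Retrievable, Refinable, Sortable
```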