SharePoint 2013 search has been
re-architected, and the goal of achieving a single enterprise search
platform has introduced a number of changes. You can consider SharePoint
2013 search to be a combination of SharePoint 2010 search, FAST Search
Server 2010 for SharePoint, and core FAST technology.
Microsoft acquired Fast Search & Transfer in early 2008 and
introduced FAST Search Server 2010 for SharePoint alongside
SharePoint Server 2010 when the 2010 products were released. The
original FAST ESP product was also available. For SharePoint 2013, the
goal was to integrate the best of the current products, along with new
components not yet introduced, into a single enterprise search
architecture. The result is a search platform that combines the crawler
and connector framework from SharePoint Search, updated content
processing and query processing from FAST technology, and a search core
based on FAST Search. This architecture also includes the new analytics
engine, which is used for ranking and recommendations. The FAST Search
product and the FAST brand name are gone; SharePoint 2013 search is
their current incarnation. This single architecture is most obvious
during the installation process, where you will notice the single Search
Service application.
NOTE Along with the elimination of the separate FAST products
(FAST Search Server 2010 for SharePoint, FAST ESP, and FAST Search
Server 2010 for Internet Sites), the standalone Microsoft Search Server
product does not have a 2013 version.
The search topology has several key improvements:
- Separate crawl and indexing processes.
- A new analytics process that provides search and usage analyses, including link analysis and recommendations.
- The entire index is stored locally on disk, and it no longer uses the property database.
- Search is scalable in two dimensions: content and query load.
- The administration component can be made fault tolerant.
- Native support for repartitioning the index as part of scaling out the topology.
Topology
The topology can be broken down into
search components and databases that work together to provide search
capability, as shown in Figure 1.
In a multi-server farm, these components reside on application servers,
and the databases exist on SQL Server database servers. When designing
the search topology to support your requirements, you should take into
account whether you are providing search for a public website or an
internal intranet. Additionally, you should consider high availability
and fault tolerance requirements, the amount of content, and the
estimated page views and queries per second. The search components can
be categorized into five groups or processes:
- Crawl and content — Includes the crawl and content processing components and the crawl database
- Analytics — Includes the analytics processing component, and the links and analytics reporting databases
- Index — Includes the index component, index partition, and index replica
- Query — Includes the query processing component
- Administration — Includes the administration component and the administration database
You must define and scale the topology to
accommodate your requirements. Central Administration in SharePoint
2013 shows the current status of the search topology, but unlike
SharePoint 2010, where you changed and scaled the topology in Central
Administration, the SharePoint 2013 search topology is created and
managed using PowerShell. This reflects the more complex and flexible
topology, which PowerShell lets you manage far more efficiently. You’ll
learn how to do this in the section, “Configuring Enterprise Search.”
The following sections take a detailed look at these five main search
components.
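As a preview of what that section covers, the following is a minimal sketch of the clone-modify-activate pattern used for all topology changes. The server name APP02 is a hypothetical application server; substitute one from your own farm.

```powershell
# Not needed inside the SharePoint 2013 Management Shell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Assumes a single Search service application in the farm
$ssa    = Get-SPEnterpriseSearchServiceApplication
$active = Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Active
$clone  = New-SPEnterpriseSearchTopology -SearchApplication $ssa -Clone -SearchTopology $active

# Add a second query processing component on another server (hypothetical name)
$ssi = Get-SPEnterpriseSearchServiceInstance -Identity "APP02"
Start-SPEnterpriseSearchServiceInstance -Identity $ssi
New-SPEnterpriseSearchQueryProcessingComponent -SearchTopology $clone -SearchServiceInstance $ssi

# Activate the modified clone, then verify component health
Set-SPEnterpriseSearchTopology -Identity $clone
Get-SPEnterpriseSearchStatus -SearchApplication $ssa -Text
```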
Managing the Crawl Process and Crawled Properties
Search effectiveness requires that the
necessary content be indexed and accessible to end users. The whole
process begins with the crawl component, which is also referred to as
the crawler. This component crawls content sources,
and it delivers the crawled content and associated metadata to the
content processing component. The crawl process interacts with content
repositories and retrieves data using connectors and protocol handlers.
SharePoint 2013 includes
more out-of-the-box connectors than SharePoint 2010, as well as Business
Connectivity Services (BCS) extensibility.
Content sources that you create in the Search
service application specify the repositories to be crawled. The content
source represents a group of crawl settings, which includes the host to
crawl, the type of content, the crawl schedule, and how deep to crawl.
By default, you have the Local SharePoint Sites content source upon
installation, but you can create new sources, similar to how you did
with SharePoint 2010.
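Content sources can also be created with PowerShell rather than through Central Administration. A minimal sketch, where the name and start address are hypothetical:

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication

# Create a SharePoint content source (name and URL are placeholders)
New-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
    -Name "Team Sites" -Type SharePoint `
    -StartAddresses "http://intranet.contoso.com"
```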
To manage crawl volume and performance, you can
simultaneously crawl content using multiple crawl components. As the
crawler processes data, it caches the content locally in preparation for
sending content to the content processing component. The crawl
component also uses one or more crawl databases to temporarily store
information about crawled items and to track crawl history. There is no
longer a one-to-one mapping of the crawl database to crawler as in
SharePoint 2010; each crawl database can be associated with one or more
crawlers, so you can scale them independently.
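Continuing the earlier topology sketch (reusing $ssa, $clone, and $ssi), crawl databases and crawl components are added independently of one another; the database name below is hypothetical.

```powershell
# Add a second crawl database; it can serve one or more crawl components
New-SPEnterpriseSearchCrawlDatabase -SearchApplication $ssa `
    -DatabaseName "SP2013_Search_CrawlDB2"

# Add a second crawl component on another server, then activate the topology
New-SPEnterpriseSearchCrawlComponent -SearchTopology $clone -SearchServiceInstance $ssi
Set-SPEnterpriseSearchTopology -Identity $clone
```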
To support the need for a “fresher” index,
SharePoint 2013 includes a new crawl type, the continuous crawl. The
continuous crawl is applicable only to SharePoint content sources, and
is a new option you can choose when you create a new content source. You
can think of the continuous crawl as being similar to the incremental
crawl but without the need to be scheduled. With continuous crawl,
changed content is crawled every 15 minutes by default, but the
frequency is configurable. If a full crawl has started, the new system
allows the latest changes to appear in results before the full crawl
completes. As in SharePoint 2010, all crawler configurations are stored
in the administration database.
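A sketch of enabling a continuous crawl on the hypothetical "Team Sites" source created earlier; note that the crawl interval is a property of the Search service application, not of an individual content source.

```powershell
# Enable continuous crawls on an existing SharePoint content source
Set-SPEnterpriseSearchCrawlContentSource -Identity "Team Sites" `
    -SearchApplication $ssa -EnableContinuousCrawls:$true

# Change the continuous crawl interval from the default of 15 minutes
$ssa.SetProperty("ContinuousCrawlInterval", 5)
```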
The content and metadata that have been crawled and extracted from a document or URL are represented as crawled properties.
They are grouped into categories based on the iFilter or protocol
handler used to retrieve the property; Author and Title are typical
examples. New crawled properties are created during each crawl as new
kinds of content are added to the enterprise. Crawled properties are
passed to the content-processing component for further analysis.
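You can inspect crawled properties and their categories from PowerShell as well; ows_Author is one of the crawled properties typically created for SharePoint content.

```powershell
# List the crawled-property categories (SharePoint, Web, Office, and so on)
Get-SPEnterpriseSearchMetadataCategory -SearchApplication $ssa

# Look up a specific crawled property by name
Get-SPEnterpriseSearchMetadataCrawledProperty -SearchApplication $ssa -Name "ows_Author"
```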
Content Processing
The content-processing component is a specialized part of the
search architecture whose purpose is to analyze and process the data
and metadata that will be included in the index. It
transforms the crawled items and crawled properties using language
detection, document parsing, dictionaries, property mapping, and entity
extraction. This component is also responsible for mapping crawled
properties to managed properties.
Content processing also extracts the links and
anchor text from web pages and documents, because this type of
information helps influence the relevancy of the document. This raw data
is stored in the link database. When a user performs a search and
clicks a result, the click-through information is also stored
unprocessed in the link database. All this raw data is subsequently
analyzed by the analytics processing component, which updates the index
with relevancy information.
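Like the crawl database, the link database can be scaled out as link and click-through volume grows. A one-line sketch with a hypothetical database name:

```powershell
# Add a second links database to spread link and click-through storage
New-SPEnterpriseSearchLinksDatabase -SearchApplication $ssa `
    -DatabaseName "SP2013_Search_LinksDB2"
```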
Once completed, the transformed data is then sent
to the index component. Content processing configurations are stored in
the search administration database. This includes new crawled
properties, so administrators can manually create a mapping of crawled
properties to managed properties. The content-processing component is
also highly extensible: it can call out to an external web service that
examines and modifies an item's properties before they are written to
the index.
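This extensibility point is configured on the Search service application. The sketch below registers a callout endpoint; the URL and property names are hypothetical.

```powershell
$ssa    = Get-SPEnterpriseSearchServiceApplication
$config = New-Object Microsoft.Office.Server.Search.Administration.ContentEnrichmentConfiguration

# The web service receives these managed properties for each item...
$config.InputProperties  = "Author", "Title"
# ...and may return modified values for these
$config.OutputProperties = "Author"
$config.Endpoint         = "http://enrich01:818/ContentEnrichmentService.svc"

$ssa.SetContentEnrichmentConfiguration($config)
```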
Managed Properties
Crawled properties are mapped to managed
properties to include the content and metadata in the search index.
Only managed properties are included in the index; therefore, users can
search only on managed properties. Managed properties have attributes,
which determine how the contents are shown in search results. For an
extensive list of the default SharePoint 2013 managed properties and
the associated mapped crawled properties, see the Managed Properties
Overview table in the product documentation.
As the Managed Properties Overview table displayed in the reference
indicates, managed properties also have associated attributes, also
referred to as properties; yes, the managed properties have properties.
The list of default managed properties, also referred to as the search schema or index schema,
contains the managed properties, their associated properties, and the
mapping between crawled properties and managed properties. You can edit
the search schema yourself, by manually mapping crawled properties to
managed properties, and configuring property settings. The
content-processing component utilizes this schema to perform any
necessary mapping.
A single managed property can be mapped to more
than one crawled property, as shown in the preceding reference. In this
case, the order in which the crawled properties are mapped determines
the content of the managed property that is stored in the index when a
crawled document contains values for more than one of the mapped crawled
properties. You can also map a single crawled property to multiple
managed properties.
Site columns in SharePoint libraries automatically
generate a new managed property after crawling, along with a mapping
between the new crawled property and the new managed property. By editing the
search schema, you can change the default mapping, create new mappings,
or create new managed properties. A full crawl must be completed after
the creation of a new managed property to ensure that its value is
included in the search index.
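A minimal sketch of that workflow, assuming a site column surfaced a crawled property named ows_ProjectCode (hypothetical):

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication

# Create a new text managed property (Type 1 = Text) and map a crawled property to it
$mp = New-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa `
          -Name "ProjectCode" -Type 1
$cp = Get-SPEnterpriseSearchMetadataCrawledProperty -SearchApplication $ssa `
          -Name "ows_ProjectCode"
New-SPEnterpriseSearchMetadataMapping -SearchApplication $ssa `
    -ManagedProperty $mp -CrawledProperty $cp

# A full crawl is required before the new property's values appear in the index
$cs = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
          -Identity "Local SharePoint sites"
$cs.StartFullCrawl()
```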
Search Schema
The search schema is stored in the
search administration database, and the schema web page, which is called
Search Service Application: Managed Properties, is shown in Figure 2.
This page is available from the Search Service Application: Search
Administration page in Central Administration, using the Search Schema
link in the Queries and Results section. Note that this page is similar
to the page referenced in the preceding section, except that this page
reflects the current state rather than the default state. This is the
page you use when making changes to the schema. Key characteristics of
the search schema include the following:
- It contains the mapping between crawled properties and managed
properties, including the order of mapping for those cases that have
mapped multiple crawled properties.
- It maintains the settings that determine whether and how each managed property is stored in the index.
- It contains the settings or properties for each of the different managed properties.
- Site collection administrators can change the search schema for a
particular site collection using the Site Settings page, and customize
the search experience for that specific site collection. This is a new
capability in SharePoint 2013; SharePoint 2010 only allowed schema
changes in Central Administration. Site owners can view the schema, but
they are not allowed to make changes. A PowerShell sketch for inspecting
a site collection's schema follows the note below.
- It is possible to have multiple search schemas.
NOTE You can create more than one search schema by creating additional Search Service applications, each of which has its own schema.
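As mentioned in the list above, schemas can now also be scoped to a site collection. The sketch below, assuming the metadata cmdlets' -SiteCollection parameter and a hypothetical URL, lists the managed properties in effect for one site collection:

```powershell
$ssa  = Get-SPEnterpriseSearchServiceApplication
$site = Get-SPSite "http://intranet.contoso.com/sites/sales"

# List the managed properties visible to this site collection's schema
Get-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa `
    -SiteCollection $site.ID |
    Select-Object Name, Retrievable, Refinable, Sortable
```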