SharePoint 2013 : Configuring and Managing Enterprise Search - SEARCH ARCHITECTURE (part 2)

8/9/2013 9:11:05 AM

Analytics Processing

This component is new to the search architecture. It analyzes both the content and how users interact with that content in order to improve search relevance, create search reports and recommendations, and create deep links. The analytics component analyzes two different types of information: information from crawled items that is stored in the index (search analytics), and information about how users interact with the search results, such as how many times an item is viewed (usage analytics). The results of these analyses, which draw on the link database and on user actions (usage events) such as page views and liking a document, are used to update the search index with relevancy information. This helps to ensure that search relevance improves automatically over time. The same usage events are what drive recommendations.

NOTE : The Web Analytics capability in SharePoint 2010 has been discontinued, replaced by the analytics processing component in SharePoint 2013. This change was necessary to increase performance and scalability. The analytics component provides additional capabilities such as a report of top items, recommendations, and dynamic improvement of search result relevancy.

Search Analytics

The search analytics component involves several different types of analyses, which are summarized in Table 1. Administrators should review each of the analyses to ensure an understanding of how the index is updated and made more relevant.

TABLE 1: Search Analytics Analysis

TYPE OF ANALYSIS	DESCRIPTION
Link and Anchor text	This analysis determines how items in the index are associated with each other. The results improve relevancy by adding ranking points to the items in the search index.
Search Clicks	Search result relevancy is increased (boosted) or decreased (demoted) based on which items users click in the search results. This dynamically alters the ranking of index content in the search results.
Deep Links	This analysis also uses search click results to calculate the most important site pages. These pages are displayed in the search results to provide easy access during the user’s search requests.
Click Distance	This calculates the number of clicks between an important site or page, called an authoritative page, and the items in the search index. The index is updated to ensure that authoritative pages are more relevant. An authoritative page is defined by an administrator in Central Administration.
Social Distance	This analysis is based on the assumption that information from people you follow is more relevant to you, a metric called social distance. Social distance is used to sort people search results: information from people whom you follow is most relevant, information from people followed by those whom you follow is the next most relevant, and so on.
Social Tags	This analysis uses words or phrases supplied by users to categorize information. By default, this information is not used in relevancy determination, but it can be applied to custom search experiences like query rules.
Search Reports	The Search Service application stores search reports based on the aggregation of data in the analytics reporting database, which originated from the analytics component analyses. These reports include Number of queries, Top queries, Abandoned queries, No result queries, and Query rule usage. The search reports are viewable from the View Usage Results page in Search Administration, shown in Figure 3.

FIGURE 3
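
The Click Distance analysis in Table 1 can be read as a shortest-path calculation over the link graph. The following is a minimal Python sketch of that idea, assuming a breadth-first search from the authoritative pages. The link graph, the page URLs, and the function name click_distances are hypothetical illustrations, not SharePoint's actual implementation or API.

```python
from collections import deque


def click_distances(links, authoritative_pages):
    """Return the fewest clicks needed to reach each page from any authoritative page."""
    distances = {page: 0 for page in authoritative_pages}
    queue = deque(authoritative_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in distances:  # first visit is the shortest path in BFS
                distances[target] = distances[page] + 1
                queue.append(target)
    return distances


# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "/": ["/hr", "/it"],
    "/hr": ["/hr/policies"],
    "/it": ["/it/help", "/hr/policies"],
    "/hr/policies": ["/hr/policies/leave.docx"],
}

distances = click_distances(links, ["/"])
for page, clicks in sorted(distances.items(), key=lambda pair: pair[1]):
    print(clicks, page)
```

Items with a smaller click distance sit closer to an authoritative page and would be treated as more relevant. The same traversal, run over a graph of who follows whom, gives the levels described for Social Distance.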

Usage Analytics

Usage analytics involves analyzing user actions, such as clicks or viewed items. This analysis combines the data from user actions, also called usage events, with crawled content information. Once the analysis is complete, recommendations and usage event data are added to the search index, which dynamically improves relevancy, and statistical data is written to the analytics reporting database. The default usage events are described in Table 2, but SharePoint also allows up to 12 custom events, based on the following criteria:

Viewed or clicked items
Recommendations displayed and clicked

For example, you can add a custom event that tracks how often an item is liked, and then use this information to customize a recommendation. This information is used to calculate two usage reports: Popularity Trends and Most Popular Items.

TABLE 2: Usage Analytics Analysis

TYPE OF ANALYSIS	DESCRIPTION
Usage Event Counts	This analysis counts how many times an item is opened or clicked, which includes search clicks and when a document is opened. The data is aggregated at the site and site collection levels. Usage events are temporarily stored on the WFE for processing, and once processed the results are stored in the Search Service application. Events are defined as recent and all time, with the former being configurable between 1 and 14 days (the default). This enables sorting the Most Popular Items report by Recent or Ever.
Recommendations	A recommendation between items is created based on analyzing the usage patterns contained in the Usage Event Counts analysis. This pattern analysis creates a graph that describes the relationships between items; the graph is stored in the analytics reporting database and added to the index to be used for user personalization. For example, you could create a recommendation that says “People who viewed this also viewed.”
Activity Ranking	This analysis is used to enhance search relevance by tracking rates and trends in usage events. It considers both recent and longer term activity to define the appropriate ranking.
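
The Usage Event Counts and Recommendations analyses in Table 2 can be illustrated with a small Python sketch. It assumes usage events arrive as (user, item, timestamp) tuples, counts them over a recent window and over all time, and builds a simple "also viewed" graph from items the same user opened. The event data, the function names, and the co-occurrence approach are assumptions made for illustration; the source does not describe how SharePoint computes these internally.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from itertools import combinations


def count_usage(events, now, recent_days=14):
    """Count events per item for the recent window and for all time."""
    cutoff = now - timedelta(days=recent_days)
    recent, all_time = Counter(), Counter()
    for _user, item, when in events:
        all_time[item] += 1
        if when >= cutoff:
            recent[item] += 1
    return recent, all_time


def also_viewed(events):
    """Map each item to a Counter of other items opened by the same users."""
    items_by_user = defaultdict(set)
    for user, item, _when in events:
        items_by_user[user].add(item)
    graph = defaultdict(Counter)
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


# Hypothetical usage events: (user, item, timestamp).
now = datetime(2013, 8, 9)
events = [
    ("ann", "plan.docx", now - timedelta(days=1)),
    ("ann", "budget.xlsx", now - timedelta(days=2)),
    ("bob", "plan.docx", now - timedelta(days=20)),
    ("bob", "budget.xlsx", now - timedelta(days=3)),
    ("cat", "plan.docx", now - timedelta(days=30)),
    ("cat", "faq.aspx", now - timedelta(days=5)),
]

recent, all_time = count_usage(events, now)
graph = also_viewed(events)
print(recent.most_common())               # ranking for the Recent sort
print(all_time.most_common())             # ranking for the Ever sort
print(graph["plan.docx"].most_common(1))  # strongest "also viewed" item
```

With this data, plan.docx leads the all-time ranking while budget.xlsx leads the recent one, which is the difference between sorting Most Popular Items by Ever and by Recent.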

Index Processing

The index is the key to providing the best search experience, as its content determines what users find when executing search queries. SharePoint 2013 Search, however, is more than just users typing into the search box and getting results. SharePoint 2013 Search is a data access technology, because it provides access to information beyond just the search box query. The index component receives crawled and processed content and this information is added to the search index. This component also handles incoming queries, retrieves information from the search index, and sends back the result set to the query processing component. The index processing architecture can be divided into the index partition, the index replica, and the index component. Unlike SharePoint 2010, which stored part of the index information on disk and part in the property database, SharePoint 2013 stores all of the index on disk. Search capability is scaled using index partitions and index replicas; the “rows and columns” terminology from SharePoint 2010 is gone.

Index Partition

The index can be partitioned or divided into discrete portions called index partitions, with each partition containing a separate part of the index. The search index, which is stored in a set of files on disk, is an aggregation of all the index partitions. This enables scaling of the index in two ways: to handle crawl volume and to handle query volume. First, index partitions are added to handle the crawl load associated with greater content volume. Within each partition, the primary replica receives the processed information from content processing and sends it to the partition's other replicas via journal shipping. Second, the index can be scaled for query volume using index replicas.

Index Replica

Each index partition contains one or more index replicas, with each replica containing the same index information. You add the necessary number of replicas based on your query volume and fault tolerance requirements. Search queries are sent to the index replicas by the query processing component. SharePoint automatically load balances the incoming queries to the index replicas. Fault tolerance and redundancy are achieved by creating additional replicas for each index partition, and distributing the index replicas over multiple application servers. You should maintain the same number of replicas for each partition created.
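
The relationship between partitions and replicas can be modeled in a few lines of Python. This is a conceptual sketch only. It assumes items are assigned to a partition by hashing their ID and that queries rotate through replicas in round-robin order, neither of which the source specifies, and it simplifies replication to writing each item to every replica of its partition. The class names and replica names are hypothetical.

```python
import zlib
from itertools import cycle


class Partition:
    """One slice of the index, held identically by every replica."""

    def __init__(self, replica_names):
        self.replicas = {name: {} for name in replica_names}
        self._rotation = cycle(replica_names)  # assumed round-robin balancing

    def add(self, item_id, text):
        # Simplification: write to every replica directly.
        for store in self.replicas.values():
            store[item_id] = text

    def search(self, term):
        name = next(self._rotation)  # pick the next replica for this query
        store = self.replicas[name]
        hits = [item_id for item_id, text in store.items()
                if term.lower() in text.lower().split()]
        return name, hits


class SearchIndex:
    """The full index: the aggregation of all partitions."""

    def __init__(self, partitions):
        self.partitions = partitions

    def add(self, item_id, text):
        # Assumed placement rule: hash the item ID to choose one partition.
        slot = zlib.crc32(item_id.encode("utf-8")) % len(self.partitions)
        self.partitions[slot].add(item_id, text)

    def search(self, term):
        # A query must reach one replica of every partition, then merge results.
        served_by, results = [], []
        for partition in self.partitions:
            name, hits = partition.search(term)
            served_by.append(name)
            results.extend(hits)
        return served_by, sorted(results)


index = SearchIndex([
    Partition(["p0-replica-a", "p0-replica-b"]),
    Partition(["p1-replica-a", "p1-replica-b"]),
])
for item_id, text in [
    ("doc1", "Budget plan 2013"),
    ("doc2", "Search topology notes"),
    ("doc3", "Budget review"),
    ("doc4", "Holiday schedule"),
]:
    index.add(item_id, text)

first_servers, first_hits = index.search("budget")
second_servers, second_hits = index.search("budget")
print(first_servers, first_hits)
print(second_servers, second_hits)
```

Both queries return the same items, but they are served by different replicas. Adding partitions spreads the content, while adding replicas spreads the query load and provides redundancy.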

Index Component

You need to provision one index component for each index replica. The index component does the work in the indexing process, and during the query process. This component receives processed items from the content processing component, writing those items to an index file. The index component also receives incoming queries from the query processing component, retrieves information from the search index, and returns the query results to the query processing component.

NOTE : You can choose the location that will be used to store the index files. The first option is to set it during the SharePoint 2013 installation. The second option is to set it when creating the index component, using the -RootDirectory parameter of the New-SPEnterpriseSearchIndexComponent cmdlet. This parameter specifies the root directory that will be used for the index associated with the new index component. Specifying the root directory can be helpful if you wish to isolate the index on dedicated disks, or on disks separate from the OS. The root directory can be configured for each index component. In general, you should keep the index off the disk that contains the ULS logs.

Query Processing, Query Rules, and Result Sources

The query processing component analyzes incoming queries before they are sent to the index component. It performs linguistic analysis of the query, including word-breaking, which determines the boundaries of the words in the query (these vary by language), and stemming, which reduces the words in the query to their base or root form. Once the query is processed, it is submitted to the index component, which returns results from the index. The results are returned to the query processing component, where they are further processed before being returned to the search front end.
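
To make word-breaking and stemming concrete, here is a deliberately naive Python sketch. It assumes English text, breaks words on non-word characters, and strips a handful of common suffixes. Real word breakers and stemmers are language-specific and far more sophisticated; the suffix list and function names here are hypothetical and do not reflect SharePoint's linguistic components.

```python
import re

# Assumed toy suffix list, checked in order; real stemmers use full language rules.
SUFFIXES = ("ing", "ed", "es", "s")


def break_words(query):
    """Split a query into lowercase words on any non-word character."""
    return re.findall(r"\w+", query.lower())


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(query):
    """Word-break the query, then reduce each word to its stem."""
    return [stem(word) for word in break_words(query)]


terms = analyze("Crawling indexed documents")
print(terms)  # ['crawl', 'index', 'document']
```

Because the same reduction is applied to a query for "crawl" and a query for "crawling", both can match the same entries in the index.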

Query rules and result sources are new features in SharePoint 2013. Query rules can be used to conditionally promote certain results, display the results in blocks, and tune relevancy. Result sources are used to scope the search results. SharePoint 2010 search scopes have been deprecated, replaced by result sources.

Administration

This component is responsible for running processes that are essential to search, including new component provisioning. The search administration database stores search configuration data, such as the topology, crawl rules, query rules, and the mappings between crawled and managed properties. Each Search Service application can have only one search administration component. The current search configuration is accessible through Central Administration, but modifying the search topology requires PowerShell.

This completes the architecture overview. As you have seen, several enhancements have been made to the search architecture, and these changes have resulted in a very powerful search capability. In the next section, you will learn how to configure and manage this capability.
