Analytics Processing
This is a brand-new component in the search architecture. Its purpose is
to analyze content, and how users interact with that content, in order
to improve search relevance, create search reports and recommendations,
and create deep links. The analytics component analyzes two different
types of information: information from crawled items that is stored in
the index (search analytics), and information about how users interact
with the search results, such as how many times an item is viewed (usage
analytics). The results of the search analytics analyses, the link
database, and user actions (usage events) such as page views and liking
a document are used to update the search index with relevancy
information. This helps ensure that search relevance improves
automatically over time. These usage events also drive recommendations.
NOTE: The Web Analytics capability in SharePoint 2010 has been
discontinued and replaced by the analytics processing component in
SharePoint 2013. This change was made to increase performance and
scalability. The analytics component also provides additional
capabilities, such as a report of top items, recommendations, and
dynamic improvement of search result relevancy.
Search Analytics
The search analytics component performs several different types of analyses, which are summarized in Table 1.
Administrators should review each of these analyses to understand how
the index is updated and made more relevant.
TABLE 1: Search Analytics Analysis
TYPE OF ANALYSIS | DESCRIPTION |
Link and Anchor Text | This analysis determines how items in the index are associated with each other. The results improve relevancy by adding ranking points to the items in the search index. |
Search Clicks | Search result relevancy is increased (boosted) or decreased (demoted) based on which items users click in the search results. This dynamically alters the ranking of index content in the search results. |
Deep Links | This analysis also uses search click results to calculate the most important pages of a site. These pages are displayed in the search results as deep links, giving users direct access to them. |
Click Distance | This analysis calculates the number of clicks between an important site or page, called an authoritative page, and each item in the search index. The index is updated so that items fewer clicks away from an authoritative page are treated as more relevant. Authoritative pages are defined by an administrator in Central Administration. |
Social Distance | This analysis calculates social distance, based on the assumption that information from people you follow is more relevant to you. Social distance is used to sort people search results: information from people whom you follow is most relevant, information from people followed by those whom you follow is the next most relevant, and so on. |
Social Tags | This analysis uses words or phrases supplied by users to categorize information. By default, this information is not used in relevancy determination, but it can be applied to custom search experiences such as query rules. |
Search Reports | The Search Service application stores search reports based on the aggregation of data in the analytics reporting database, which originates from the analytics component analyses. These reports include Number of queries, Top queries, Abandoned queries, No result queries, and Query rule usage. The search reports are viewable from the View Usage Results page in Search Administration, shown in Figure 3. |
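Click distance and social distance are both "fewest hops" measures over a graph: links between pages in one case, follow relationships between people in the other. The sketch below shows that idea with a breadth-first search. It is a minimal illustration under stated assumptions, not SharePoint's ranking model, which combines these signals with many others; the function names, page URLs, and people are hypothetical.

```python
from collections import deque


def hop_distances(graph, sources):
    """Breadth-first search: the fewest hops from any source node to each reachable node."""
    dist = {source: 0 for source in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:  # in BFS the first visit is the shortest path
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rank_by_distance(items, dist):
    """Order items nearest-first; items no source can reach sort last."""
    return sorted(items, key=lambda item: dist.get(item, float("inf")))


# Click distance: hypothetical links between pages, with /hr marked authoritative.
links = {
    "/hr": ["/hr/benefits", "/hr/policies"],
    "/hr/benefits": ["/hr/benefits/dental"],
}
print(hop_distances(links, ["/hr"]))
# {'/hr': 0, '/hr/benefits': 1, '/hr/policies': 1, '/hr/benefits/dental': 2}

# Social distance: hypothetical "follows" graph, measured from the searching user.
follows = {"me": ["alice"], "alice": ["bob"], "bob": ["carol"]}
print(rank_by_distance(["carol", "dave", "alice", "bob"], hop_distances(follows, ["me"])))
# ['alice', 'bob', 'carol', 'dave']
```

Using one generic search for both analyses reflects that only the graph differs; someone you do not follow at any depth (dave) simply sorts to the end.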
Usage Analytics
Usage analytics involves analyzing user actions, such as clicks or
viewed items. This analysis combines the data from user actions, also
called usage events, with crawled content information. Once the analysis
is complete, recommendations and usage event data are added to the
search index, which dynamically improves relevancy, and statistical data
is written to the analytics reporting database. The analyses performed
on usage events are described in Table 2. In addition to the default
usage events, SharePoint allows up to 12 custom usage events.
TABLE 2: Usage Analytics Analysis
TYPE OF ANALYSIS | DESCRIPTION |
Usage Event Counts | This analysis counts how many times an item is opened or clicked, including search clicks and document opens. The data is aggregated at the site and site collection levels. Usage events are temporarily stored on the web front-end (WFE) servers for processing; once processed, the results are stored in the Search Service application. Counts are kept for two periods, recent and all time; the recent period is configurable from 1 to 14 days, with 14 days as the default. This enables sorting the Most Popular Items report by Recent or Ever. |
Recommendations | Recommendations between items are created by analyzing the usage patterns contained in the Usage Event Counts analysis. This pattern analysis creates a graph that describes the relationships between items; the graph is stored in the analytics reporting database and added to the index, where it is used for user personalization. For example, you could create a recommendation that says “People who viewed this also viewed.” |
Activity Ranking | This analysis enhances search relevance by tracking rates and trends in usage events. It considers both recent and longer-term activity to determine the appropriate ranking. |
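The Recommendations row describes a graph of relationships between items built from usage patterns. How SharePoint weights that graph is not covered here, so the sketch below swaps in plain co-occurrence counting, a common technique for "People who viewed this also viewed." Edge weights are the number of users who viewed both items. The function names, user IDs, and file names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_covisit_graph(views_by_user):
    """Weighted graph: edge weight = number of users who viewed both items."""
    graph = defaultdict(Counter)
    for items in views_by_user.values():
        # set() ignores repeat views by the same user; sorted() makes pair order stable.
        for a, b in combinations(sorted(set(items)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def also_viewed(graph, item, top_n=3):
    """'People who viewed this also viewed': strongest co-viewed items first."""
    return [other for other, _ in graph.get(item, Counter()).most_common(top_n)]


views = {
    "user1": ["specA.docx", "specB.docx", "budget.xlsx"],
    "user2": ["specA.docx", "specB.docx", "specA.docx"],
    "user3": ["specA.docx", "roadmap.pptx"],
}
graph = build_covisit_graph(views)
print(also_viewed(graph, "specA.docx", top_n=1))  # ['specB.docx']
```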
Custom usage events can be defined based on the following criteria:
- Viewed or clicked items
- Recommendations displayed and clicked
For example, you can add a custom event that
tracks how often an item is liked, and then use this information to
customize a recommendation. Usage event data is also used to calculate
two usage reports: Popularity Trends and Most Popular Items.
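The Recent and Ever views of the Most Popular Items report differ only in which usage events are counted. The sketch below assumes a simple (item, date) event format, which is hypothetical, and uses the 1-to-14-day recent window from Table 2; it models only the ranking, not where SharePoint stores the counts.

```python
from collections import Counter
from datetime import date, timedelta


def most_popular(events, today, recent=True, recent_days=14):
    """Rank items by usage-event count, over the recent window or over all time ("Ever").

    events: iterable of (item, event_date) pairs, a hypothetical event format.
    recent_days: length of the recent window; Table 2 gives 1 to 14 days, default 14.
    """
    if not 1 <= recent_days <= 14:
        raise ValueError("recent_days must be between 1 and 14")
    # The window covers today and the previous recent_days - 1 days.
    cutoff = today - timedelta(days=recent_days)
    counts = Counter(item for item, day in events if not recent or day > cutoff)
    return counts.most_common()


today = date(2013, 6, 30)
events = [
    ("a.docx", date(2013, 6, 29)),
    ("a.docx", date(2013, 6, 28)),
    ("b.docx", date(2013, 5, 1)),
    ("b.docx", date(2013, 5, 2)),
    ("b.docx", date(2013, 5, 3)),
]
print(most_popular(events, today))                # [('a.docx', 2)]
print(most_popular(events, today, recent=False))  # [('b.docx', 3), ('a.docx', 2)]
```

The example shows why both views are useful: an item that was busy weeks ago leads the Ever ranking but drops out of the Recent one.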
Index Processing
The index is the key to providing the best search experience, because
its content determines what users find when they execute search queries.
However, SharePoint 2013 Search is more than users typing into a search
box and getting results: it is a data access technology that provides
access to information beyond the search box query. The index component
receives crawled and processed content and adds it to the search index.
This component also handles incoming queries, retrieves information from
the search index, and sends the result set back to the query processing
component. The index processing architecture can be divided into three
parts: the index partition, the index replica, and the index component.
Unlike SharePoint 2010, which stored part of the index information on
disk and part in the property database, SharePoint 2013 stores all of
the index on disk. Search capability is scaled using index partitions
and index replicas; the “rows and columns” terminology from SharePoint
2010 is gone.
Index Partition
The index can be divided into discrete portions called index partitions,
with each partition containing a separate part of the index. The search
index, which is stored in a set of files on disk, is the aggregation of
all the index partitions. This enables the index to scale in two ways:
to handle crawl volume and to handle query volume. First, index
partitions are added to handle the crawl load associated with greater
content volume. Second, the index can be scaled for query volume using
index replicas. Within each partition, the primary index replica
receives the processed items from the content processing component and
copies them to the partition's other replicas via journal shipping.
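Because each partition holds a separate part of the index, every item lands in exactly one partition, and a query must reach all partitions before the partial results are merged. The sketch below shows that split with a toy inverted index per partition. The hash-based placement, the function names, and the sample documents are assumptions for illustration; SharePoint's own item placement is not described here.

```python
import hashlib


def partition_for(doc_id, num_partitions):
    """Map a document to one partition with a stable hash (a hypothetical placement scheme)."""
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def build_partitions(docs, num_partitions):
    """docs maps doc_id -> text; returns one tiny inverted index (term -> doc_ids) per partition."""
    partitions = [{} for _ in range(num_partitions)]
    for doc_id, text in docs.items():
        index = partitions[partition_for(doc_id, num_partitions)]
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return partitions


def query_all(partitions, term):
    """Send the query to every partition and merge the partial results."""
    hits = set()
    for index in partitions:
        hits |= index.get(term.lower(), set())
    return sorted(hits)


docs = {f"doc{i}": ("budget report" if i % 3 == 0 else "team notes") for i in range(12)}
partitions = build_partitions(docs, num_partitions=4)
print(query_all(partitions, "Budget"))  # ['doc0', 'doc3', 'doc6', 'doc9']
```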
Index Replica
Each index partition contains one or
more index replicas, with each replica containing the same index
information. You add the necessary number of replicas based on your
query volume and fault tolerance requirements. Search queries are sent
to the index replicas by the query processing component.
SharePoint automatically load balances the incoming queries to the
index replicas. Fault tolerance and redundancy are achieved by creating
additional replicas for each index partition, and distributing the index
replicas over multiple application servers. You should maintain the
same number of replicas for each partition created.
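Since every replica of a partition holds the same data, any healthy replica can answer a query for that partition. The sketch below models load balancing as round-robin with failover to the remaining replicas. Round-robin is an assumption chosen for clarity; the text says only that SharePoint balances queries automatically. The class and server names are hypothetical.

```python
import itertools


class PartitionReplicas:
    """Replicas of one index partition; each holds identical data (hypothetical model)."""

    def __init__(self, name, servers):
        self.name = name
        self.servers = list(servers)  # servers hosting a replica of this partition
        self.healthy = set(self.servers)
        self._rotation = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def pick_replica(self):
        """Round-robin over replicas, skipping any that are marked down."""
        for _ in range(len(self.servers)):
            server = next(self._rotation)
            if server in self.healthy:
                return server
        raise RuntimeError(f"no healthy replica left for partition {self.name}")


p0 = PartitionReplicas("P0", ["APP1", "APP2"])
print([p0.pick_replica() for _ in range(4)])  # ['APP1', 'APP2', 'APP1', 'APP2']
p0.mark_down("APP1")
print([p0.pick_replica() for _ in range(3)])  # ['APP2', 'APP2', 'APP2']
```

With a single replica, losing its server leaves the partition unreachable, which is why fault tolerance depends on adding replicas across multiple servers.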
Index Component
You need to provision one index component for each index replica. The
index component does the work during both indexing and query processing.
It receives processed items from the content processing component and
writes them to index files. It also receives incoming queries from the
query processing component, retrieves information from the search index,
and returns the query results to the query processing component.
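The rules in this section (one index component per replica, the same number of replicas for every partition, replicas spread over multiple servers) can be checked with a little arithmetic before touching a farm. The sketch below is a hypothetical planning helper, not a SharePoint API; it only produces a layout that could then guide creating the components with New-SPEnterpriseSearchIndexComponent.

```python
def plan_index_components(num_partitions, replicas_per_partition, servers):
    """Plan one index component per replica, as (partition, replica, server) tuples.

    A single replicas_per_partition value keeps the replica count equal across
    partitions. Replicas of the same partition are rotated onto different
    servers, so with two or more replicas, losing one server never removes
    every copy of a partition.
    """
    if replicas_per_partition < 1:
        raise ValueError("each partition needs at least one replica")
    if replicas_per_partition > len(servers):
        raise ValueError("need at least as many servers as replicas per partition")
    plan = []
    for partition in range(num_partitions):
        for replica in range(replicas_per_partition):
            server = servers[(partition + replica) % len(servers)]
            plan.append((partition, replica, server))
    return plan


for row in plan_index_components(2, 2, ["APP1", "APP2", "APP3"]):
    print(row)
# (0, 0, 'APP1')
# (0, 1, 'APP2')
# (1, 0, 'APP2')
# (1, 1, 'APP3')
```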
NOTE: You can choose the location used to store the index files. The
first opportunity is during the SharePoint 2013 installation. The second
is when you create the index component, using the -RootDirectory
parameter of the New-SPEnterpriseSearchIndexComponent cmdlet. This
parameter specifies the root directory used for the index associated
with the new index component. Specifying the root directory is helpful
if you want to place the index on dedicated disks, or on disks separate
from the operating system. The root directory can be configured for each
index component. In general, you should keep the index on a different
disk from the one that contains the ULS logs.
Query Processing, Query Rules, and Result Sources
The query processing component analyzes incoming queries before they are
sent to the index component. This component performs linguistic analysis
of the query, including word breaking, which determines the boundaries
of the words in the query (these rules vary by language), and stemming,
which reduces the words in the query to their base or root form. Once
the query is processed, it is submitted to the index component, which
returns results from the index. The results are returned to the query
processing component, where they are further processed before being
returned to the search front end.
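The two linguistic steps can be illustrated with a toy English pipeline: break the query into words, then reduce each word to a root form. This is a deliberately naive sketch; SharePoint uses language-specific word breakers and stemmers, and languages written without spaces need dictionary-based breaking that this code does not attempt. The suffix list and function names are hypothetical.

```python
import re

# Toy English suffix list; real stemmers are language-specific and far more careful.
SUFFIXES = ("ing", "ed", "es", "s")


def word_break(query):
    """Split on runs of non-word characters; works for space-delimited languages only."""
    return [word for word in re.split(r"\W+", query.lower()) if word]


def stem(word):
    """Strip the first matching suffix, keeping at least three characters of the root."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def process_query(query):
    return [stem(word) for word in word_break(query)]


print(process_query("Reviewed budgets, viewing reports!"))
# ['review', 'budget', 'view', 'report']
```

Stemming is what lets a query for "budgets" match an item that only contains "budget".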
Query rules and result sources are new features in
SharePoint 2013. Query rules can be used to conditionally promote
certain results, display the results in blocks, and tune relevancy.
Result sources are used to scope the search results. SharePoint 2010
search scopes have been deprecated, replaced by result sources.
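The division of labor between the two features can be sketched in a few lines: a result source narrows which results are eligible, and a query rule fires on matching queries to promote a result above the ranked list. This is a conceptual model only. In SharePoint, result sources are defined with query transforms and query rules are configured with conditions and actions; the classes, trigger terms, and URLs below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    title: str
    url: str
    content_type: str


@dataclass
class QueryRule:
    """Hypothetical rule: when the query contains any trigger term, promote one result."""
    trigger_terms: frozenset
    promoted: Result

    def matches(self, query):
        return not self.trigger_terms.isdisjoint(query.lower().split())


def result_source(results, content_type):
    """Scope results to one content type, the way a result source narrows what is searched."""
    return [r for r in results if r.content_type == content_type]


def apply_rules(query, results, rules):
    """Place results promoted by matching rules above the ranked results, without duplicates."""
    promoted = []
    for rule in rules:
        if rule.matches(query) and rule.promoted not in promoted:
            promoted.append(rule.promoted)
    return promoted + [r for r in results if r not in promoted]


travel = Result("Travel policy", "/hr/travel.docx", "Document")
results = [
    Result("Q3 budget", "/finance/q3.xlsx", "Document"),
    Result("Finance site", "/sites/finance", "Site"),
    travel,
]
rule = QueryRule(frozenset({"travel", "expenses"}), travel)
scoped = result_source(results, "Document")
print([r.title for r in apply_rules("Travel expenses", scoped, [rule])])
# ['Travel policy', 'Q3 budget']
```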
Administration
This component is responsible for
running processes that are essential to search, including new component
provisioning. The search administration database stores search
configuration data, such as the topology, crawl rules, query rules, and
the mappings between crawled and managed properties. Each Search Service
application can have only one search administration component. The
current search configuration is accessible through Central
Administration, but modifying the search topology requires PowerShell.
This completes the architecture overview. As you
have seen, several enhancements have been made to the search
architecture, and these changes have resulted in a very powerful search
capability. In the next section, you will learn how to configure and
manage this capability.