Microsoft SharePoint 2010 : Scaling out Search — adding a crawl database

9/4/2013 9:31:35 PM

The final component that can be scaled out with regards to search is crawler. In the topology search application page, it is broken into two items:

Crawl Database
Crawl Component

The Crawl Server comprises of the crawl component that is associated with a crawl database. Each crawl database has one or more crawl components (crawlers) to index its content. It is typically scaled out for redundancy purposes or to increase the speed of the crawl.

The crawler crawls the content sources and creates the index partition(s). Then it sends the created partition(s) to the query server(s). It does not hold the index partition. It writes details about the crawl into the crawl database. It is stateless in that it does not retain information itself.

Getting ready

You must have farm-level administrative permissions to the Central Administration site.

How to do it...

Open the Central Administration site and click Application Management.
The third section is Service Applications. Under this section, click Manage service applications.
Find the Search Service Application option and click on it (this is the name SharePoint assigns by default if not modified when creating the Search Service). The ribbon will light up. Click Manage.
Under the Search Application Topology section, there is a button called Modify; click this button.
The Search Service Topology page appears. Click the drop-down list named New and select Crawl Database.
The pop-up screen that appears is almost exactly like the property database pop up seen in the preceding recipe.
Fill in the following elements:
Enter the name of a database server.
Enter a name for the crawl database.
Select the Windows authentication option under Database authentication section.
If you have the name of a Failover Server, enter it in the appropriate textbox.
There is one more item in the pop-up that needs to be addressed. As shown in the next screen, do not check the Dedicate this crawl... box. It will be addressed in the Adding a host distribution rule recipe.

Click OK.
The Search Service Topology Application page will be shown again. To commit the changes you have done in the previous screen, click the Apply Topology Changes button.
Click the drop-down list named New and select Crawl Component.
The following pop-up screen appears:

Perform the following steps:
Choose a server from the drop-down list, on which to host the crawl component.
Choose the associated crawl database from the drop-down list.
Specify a location for the index by entering the value in the Temporary Location of Index textbox.
Click OK.
The Search Service Topology Application page will again be shown. To commit the changes you have done on the screen, click the Apply Topology Changes button.

How it works...

The creation of the crawl database is similar to the property database. A new database is added to the farm and can be seen physically via SQL Management Studio.

The only additional item is using host distribution rules, which will be covered in the Adding a host distribution rule recipe.

The crawl component, which is created after the database, creates a temporary index that gets stored on the Crawl Server. This index is eventually propagated and overwritten.

The crawl content and the data such as the location of the content source are saved in the crawl database. It is the crawler's job to propagate partition-indexed data to the query servers.

The crawler is truly a stateless object.

There's more...

Everything done using the UI can be scripted via PowerShell in order to automate the procedure.

Use the following command to create a new crawl database:

New-SPEnterpriseSearchCrawlDatabase

Use the following command to create a new crawl component:

New-SPEnterpriseSearchCrawlComponent

To see examples of these commands, type get-help <command> -examples in the PowerShell management shell.

More info

A crawl topology can contain multiple crawl components. By mapping multiple crawl components to a single crawl database, performance will increase and fault tolerance is achieved.

If one of the crawl components crashes, the remaining crawl components step in and perform the indexing.

If the crawler database server fails and you have specified a failover database server, SharePoint will automatically switch over to use it. However, this requires your SQL Servers to use synchronous mirroring. Asynchronous mirroring is not supported in this case.

Others

- Active Directory 2008 : Configuring Read-Only Domain Controllers (part 2) - Password Replication Policy , Administering RODC Credentials Caching, Administrative Role Separation

- Active Directory 2008 : Configuring Read-Only Domain Controllers (part 1) - Read-Only Domain Controllers, Deploying an RODC

- Active Directory 2008 : Improving the Security of Authentication in an AD DS Domain - Auditing Authentication

- Microsoft Lync Server 2010 : Enterprise Voice - Remote Site Survivability

- Microsoft Lync Server 2010 : Enterprise Voice - Enhanced 911

- Microsoft Lync Server 2010 : Enterprise Voice - Media Bypass

- Exchange Server 2010 : Object-Oriented Use of PowerShell (part 2) - Formatting Output, Directing Output to Other Cmdlets

- Exchange Server 2010 : Object-Oriented Use of PowerShell (part 1) - Filtering Output

- Exchange Server 2010 : Introduction to PowerShell and the Exchange Management Shell - Understanding the Command Syntax

- Windows 8 : Controlling Access to Files and Folders with NTFS Permissions (part 5) - Determining the Effective Permissions and Troubleshooting