When preparing to deploy SharePoint 2010
Search, several areas need to be addressed. How many servers will be
used, which roles those servers take, and how services are spread
across them all depend on how much content there is to index and what
the performance expectations are.
Another consideration, which often becomes the most critical, is how
much of a budget the organization has to meet those requirements.
This section outlines the
factors to consider when planning a SharePoint Search deployment. Many
administrators have few choices when it comes to infrastructure, so
they must plan the best-performing solution they can with the hardware
they have.
The key considerations for planning a deployment are as follows:
- Performance: Search performance has two main factors: crawl
performance and query performance. Crawl
performance refers to how fast the search crawling components can
collect text and metadata from documents and store them in the
databases. Query performance refers to the speed at which results can
be returned to end users performing searches and how that performance
may be affected by query complexity and volume. SharePoint has several
areas where performance can be improved by adjusting or adding search
or crawl components.
- Scalability: Organizations grow and shrink, as do
their knowledge management requirements. Most often, we envision growth
and prosperity, which corresponds with increasing content in
SharePoint and an increasing load on the services it provides. Search
is a service that generally increases in popularity and adoption, so
scaling up or out to handle demand is usually necessary. However, the
opposite may sometimes also be a consideration.
Scaling can be required to improve performance by adding
hardware and/or software components, as well as to improve availability
by providing redundant services across hardware. Any environment should
be planned so that it can scale to improve these factors.
- Security: One of the key concerns
of organizations is the protection of data. Security is a broad topic
and worthy of careful consideration.
It can mean protecting servers from outside intruders,
but it can also mean controlling which authenticated users are allowed
to see precisely what content.
- Availability: Critical business systems need to be available
for use. Downtime of a key SharePoint site or its related services can
result in hundreds or thousands of employees being unable to perform
their jobs. This kind of downtime can quickly cost millions of dollars
in lost productivity and undelivered goods or services. Making servers
redundant and having failover strategies can help mitigate hardware and
software problems that could cause downtime.
- Budget: Budget is always a key consideration. Organizations
need to make careful calculations about what risks they are willing to
take to reduce costs. Some risks are reasonable while others are not.
For example, saving $10,000 by not making crawl servers redundant could
be a reasonable saving if the business is not adversely affected by
an out-of-date index for several days should the crawl
servers fail. However, the cost of having 10,000 employees unable to
access information for even a day can easily outweigh the savings.
These considerations will be discussed in more
detail in the following sections. First, it will be useful to get an
idea of the minimum hardware and software requirements that Microsoft
sets forth, to calculate the required disk size for the databases,
and to understand the initial deployment options.
1. Hardware and Software Requirements
SharePoint 2010's search components take their
requirements from the base SharePoint 2010 server requirements, with
the exception that query servers should have enough RAM to hold one
third of the active index partition in memory at any given time.
Therefore, care should be taken when planning query servers and the
spread of index partitions to ensure there is sufficient RAM for the
index.
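As a rough illustration of the one-third rule described above, the following Python sketch estimates the RAM a query server needs for its index partition. The function name and the sample partition size are assumptions made for this example only; any operating system and SharePoint base requirements come on top of this figure.

```python
def min_index_ram_gb(index_partition_gb: float) -> float:
    """Estimate RAM needed to hold one third of an active index partition.

    Hypothetical helper for illustration: it applies only the one-third
    guideline and ignores the base server RAM requirements.
    """
    if index_partition_gb < 0:
        raise ValueError("index partition size cannot be negative")
    return index_partition_gb / 3


if __name__ == "__main__":
    partition_gb = 30.0  # assumed partition size, for illustration only
    print(f"RAM for index: {min_index_ram_gb(partition_gb):.1f} GB")
```

If a query server holds more than one active partition, the estimate would presumably be applied to each partition and summed.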
Hardware Requirements
The core recommendations for hardware hosting SharePoint search are as follows:
- All development and testing servers:
  - 4 core CPU
  - 4GB RAM
  - 80GB system drive
- All application servers:
  - 4 core CPU
  - 8GB RAM
  - 80GB system drive
- Database servers:
  - Small production deployments (less than 10 million documents)
    - 4 core CPU
    - 8GB RAM
    - 80GB system drive
    - Sufficient storage space for search databases
  - Medium to large deployments (more than 10 million documents)
    - 8 core CPU
    - 16GB RAM
    - 80GB system drive
    - Sufficient storage space for search databases
Note
Microsoft Office SharePoint Server 2007 could be run on 32-bit servers.
SharePoint 2010 requires 64-bit servers. Be careful that all the
servers are 64-bit if upgrading from a previous version of SharePoint
and all associated software (e.g., third-party add-ins) is also 64-bit
compatible.
Software Requirements
Microsoft has made major advancements in the
install process of SharePoint. SharePoint 2010 has a surprisingly
friendly installer that can check the system for prerequisites and
install any missing required components. This makes installing
SharePoint 2010 for search extremely easy.
There are some important things to note,
however. SharePoint 2010 is available only for 64-bit systems, which
means that the operating system and the hardware it runs on must be
64-bit.
SharePoint 2010 search application servers require one of the following Windows operating systems:
- 64-bit Windows Server 2008 R2 (Standard, Enterprise, Datacenter, or Web Server version)
- 64-bit edition of Windows Server 2008 with Service Pack 2 (Standard, Enterprise, Datacenter, or Web Server version)
If Service Pack 2 is not installed, SharePoint 2010's installer will install it (cool!).
SharePoint 2010 search database servers (non-stand-alone) require one of the following versions of SQL Server:
- 64-bit edition of SQL Server 2008 R2
- 64-bit edition of SQL Server 2008 with Service Pack 1 and Cumulative Update 2
- 64-bit edition of SQL Server 2005 with Service Pack 3
Whenever possible, it is recommended to use the R2 releases.
There are a number of other required software
packages that the SharePoint 2010 installer's preparation tool will
install as well:
- Web server (IIS) role
- Application server role
- Microsoft .NET Framework version 3.5 SP1
- SQL Server 2008 Express with SP1
- Microsoft Sync Framework Runtime v1.0 (x64)
- Microsoft Filter Pack 2.0
- Microsoft Chart Controls for the Microsoft .NET Framework 3.5
- Windows PowerShell 2.0
- SQL Server 2008 Native Client
- Microsoft SQL Server 2008 Analysis Services ADOMD.NET
- ADO.NET Data Services Update for .NET Framework 3.5 SP1
- A hotfix for the .NET Framework 3.5 SP1 that provides a method to
support token authentication without transport security or message
encryption in WCF
- Windows Identity Foundation (WIF)
2. Database Considerations: Determining Database Size
When determining how much database space to allot for
search, it is important to consider each database and its purpose
separately. Most search engine vendors' databases take between 15% and
20% of the total repository size for all search databases. Although a
safe guideline is to always allow 20% of the content size for search
databases, SharePoint's architecture is more complex and requires a
little closer consideration. Microsoft gives some formulae to calculate
the search database size. Although real-world results will probably not
match these calculations exactly, they are a good place to start.
Also, remember that index partitions do not
reside in SQL on the database server. They reside on the file system
on or relative to the query servers. Their location can be set in
Central Administration under Manage Service Applications => Search
Service Application => Search Administration => Search Application
Topology => Modify. The index partitions could reasonably be placed on
a high-performance disk array or storage area network. See Figure 1.
Figure 1. The Edit Query Component page with index partition path
There are three database types on the database
server held in SQL: an Administration database, crawl databases, and
property databases. The server may contain many crawl and property
databases depending on the size and complexity of the deployment. It is
essential to account for each one. Microsoft gives the following
calculations to determine their sizes.
The Search Administration database stores only
security information and search setting information and does not need
more than 10GB of storage space. It will likely not take more than 1GB
in any scenario, but it is allocated extra for good measure.
Crawl database size is relative to the size of
the content database it is crawling. Content database size, if one is
not already available to check, can be determined with the following
calculation from Microsoft:
- Database size = ((Number of documents × Number of non-current
versions) × Average size of documents) + (10KB × (List items + (Number
of non-current versions × Number of documents)))
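The formula above can be sketched as a short Python function. This is only an illustration: the function name, parameter names, and the sample input values are assumptions chosen for the example, and real figures from the farm should replace them.

```python
def content_db_size_kb(documents: int,
                       non_current_versions: int,
                       avg_doc_size_kb: float,
                       list_items: int) -> float:
    """Estimate content database size in KB using the formula above."""
    # File data: documents x non-current versions x average document size
    file_data_kb = documents * non_current_versions * avg_doc_size_kb
    # Metadata: 10KB per list item and per document version
    metadata_kb = 10 * (list_items + non_current_versions * documents)
    return file_data_kb + metadata_kb


if __name__ == "__main__":
    # Illustrative inputs only, not measurements from a real farm
    size_kb = content_db_size_kb(documents=200_000,
                                 non_current_versions=2,
                                 avg_doc_size_kb=250,
                                 list_items=600_000)
    print(f"Estimated content database size: {size_kb / 1024 ** 2:.1f} GB")
```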
The total size of the crawl databases is then determined by multiplying the size of the content database by 4.6%.
The total size of the property databases is determined by multiplying the size of the content database by 1.5%.
Total database server size requirements for search are therefore as follows:
- Admin database = 10GB
- Crawl databases = Content database size × .046
- Property databases = Content database size × .015
A simple example of a SharePoint farm with a 100GB content database would require the following:
- Admin database = 10GB
- Crawl databases = 100 × .046 = 4.6GB
- Property databases = 100 × .015 = 1.5GB
- Total space required = 16.1GB
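The same arithmetic can be written as a small Python sketch so that it can be rerun for other content database sizes. The constant and function names are assumptions introduced for this example; the ratios and the 10GB administration allowance are the figures quoted above.

```python
ADMIN_DB_GB = 10.0      # fixed allowance for the Search Administration database
CRAWL_RATIO = 0.046     # crawl databases as a fraction of content size
PROPERTY_RATIO = 0.015  # property databases as a fraction of content size


def search_db_sizes_gb(content_db_gb: float) -> dict:
    """Return estimated search database sizes (GB) for a given content size."""
    crawl = content_db_gb * CRAWL_RATIO
    prop = content_db_gb * PROPERTY_RATIO
    return {
        "admin": ADMIN_DB_GB,
        "crawl": crawl,
        "property": prop,
        "total": ADMIN_DB_GB + crawl + prop,
    }


if __name__ == "__main__":
    # The 100GB farm used in the example above
    for name, size in search_db_sizes_gb(100).items():
        print(f"{name:>8}: {size:.1f} GB")
```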
Additionally, the input/output operations per second
(IOPS) requirements on SQL for search are important. Search is
extremely dependent on good seek performance.
- For the crawl database, search requires from 3,500 to 7,000 IOPS.
- For the property database, search requires 2,000 IOPS.
Query Server Space Calculations
Index partitions are held on the query servers
and not on the database server (unless the server with the database
server role also has the query server role). The crawl databases, Search
Administration database, and property databases are held in SQL. Index
partitions are held on the file system. Microsoft suggests
calculating space for the index partitions at 3.5% of the content
databases. This space needs to be on the drive on which the query
server stores the index partitions. Space should also be allocated
for the active search index and the data coming in during a crawl, as
well as the total space required during master merge. Therefore, the
query servers should provide at least three times the necessary space
for the index.
Note
When considering redundancy or if there is more than one index
partition, additional space for each additional partition will be
needed.
So, for example, if there is 100GB of content
in the SharePoint content database, it can be expected that a single
index partition will require 3.5GB of space. If there are two query
servers, each holding a single active index partition and one index
partition mirror, one should expect 7GB (3.5GB × 2) of space, or 3.5GB
per server, to hold the index partitions.
- Content database size = 100GB
- Index partition = 100 × .035 = 3.5GB
- Index partition mirror = 100 × .035 = 3.5GB
- Space for master merge = All index partitions × 3 = (3.5 + 3.5) × 3
- Total = 21GB
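A minimal Python sketch of this calculation follows. The function and constant names are assumptions made for illustration; the 3.5% ratio and the factor of three for master merge are the figures given above.

```python
INDEX_RATIO = 0.035  # index partition size as a fraction of content size
MERGE_FACTOR = 3     # headroom for the active index, crawl data, and master merge


def query_server_space_gb(content_db_gb: float, partition_copies: int = 1) -> float:
    """Estimate total disk space (GB) for index partitions and their mirrors.

    partition_copies counts every copy of the index that must be stored,
    for example 2 for one partition plus one mirror.
    """
    per_copy_gb = content_db_gb * INDEX_RATIO
    return per_copy_gb * partition_copies * MERGE_FACTOR


if __name__ == "__main__":
    # 100GB of content, one index partition plus one mirror
    total = query_server_space_gb(100, partition_copies=2)
    print(f"Total query server space: {total:.1f} GB")
```

The result is the total across all query servers; how it divides per server depends on where the partitions and mirrors are placed.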