1. OpsMgr Component Requirements
Each OpsMgr component has specific design
requirements, and a firm knowledge of these factors is required before
beginning the design of OpsMgr. Hardware and software requirements must
be taken into account, as well as factors involving specific OpsMgr
components, such as the Root Management Server, gateway servers, service
accounts, mutual authentication, and backup requirements.
Exploring Hardware Requirements
Having
the proper hardware for OpsMgr to operate is a critical component of
OpsMgr functionality, reliability, and overall performance. Nothing is
worse than overloading a brand-new server only a few short months after
its implementation.
The industry standard generally holds that
any production servers deployed should remain relevant for three to four
years following deployment. Stretching beyond this timeframe might be
possible, but the ugly truth is that hardware investments are typically
short term and need to be replaced often to ensure relevance. Buying a
less-expensive server might save money in the short term, but could
potentially increase costs associated with downtime, troubleshooting,
and administration.
That said, Microsoft publishes recommended hardware minimums for any server running an OpsMgr 2007 server component. These recommendations apply only to the smallest OpsMgr deployments and should be seen as the absolute minimum levels for OpsMgr hardware; realistic deployments call for considerably more processor, memory, and disk capacity.
Caution
Operations Manager 2007 R2 is one of Microsoft’s most
resource-intensive applications, so generous processor, disk, and
memory allocations are important for optimal performance. Future expansion
and the continued relevance of hardware should be taken into account when
sizing servers for an OpsMgr deployment to ensure that the system has room
to grow as agents are added and the databases grow.
Determining Software Requirements
OpsMgr components can be installed on either 32-bit
or 64-bit versions of Windows Server 2008. The OpsMgr databases must
run on Microsoft SQL Server 2005 or Microsoft SQL Server 2008. The
database can be installed on the same server as OpsMgr or on a separate
server, as discussed in more detail in the following sections.
Tip
OpsMgr
itself must be installed on a member server in a Windows Active
Directory domain. It is commonly recommended to keep OpsMgr on a
dedicated member server or set of member servers that do not run any
other applications that could interfere with the monitoring and
alerting process.
A few other factors critical to the success of an OpsMgr implementation are as follows; a quick PowerShell spot-check for some of them appears after this list:
Microsoft .NET Framework 2.0 and 3.0 must be installed on the management server and the reporting server.
Windows PowerShell must be installed.
Microsoft Core XML Services (MSXML) 6.0 must be installed.
WS-MAN v1.1 must be installed to monitor UNIX/Linux clients.
Certificates
must be installed to enable mutual authentication between nondomain
members and management servers.
SQL Server
Reporting Services must be installed for an organization to be able to
view and produce custom reports using OpsMgr’s reporting feature.
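A minimal spot-check for some of these prerequisites can be scripted in Windows PowerShell. This is only a sketch, not the official prerequisite checker; the registry keys are the commonly documented .NET Framework detection values.

# Spot-check two OpsMgr prerequisites; a sketch, not the official checker.
$ndp = "HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP"
$net20 = (Get-ItemProperty "$ndp\v2.0.50727" -ErrorAction SilentlyContinue).Install -eq 1
$net30 = (Get-ItemProperty "$ndp\v3.0\Setup" -ErrorAction SilentlyContinue).InstallSuccess -eq 1
Write-Host ".NET Framework 2.0 installed: $net20"
Write-Host ".NET Framework 3.0 installed: $net30"
# PowerShell itself is present if this script runs; record its version.
Write-Host "Windows PowerShell version: $($PSVersionTable.PSVersion)"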
OpsMgr Backup Considerations
The most critical piece of OpsMgr, the SQL
databases, should be regularly backed up using standard backup software
that can effectively perform online backups of SQL databases. If
integrating such specialized backup utilities into an OpsMgr deployment
is not possible, use the built-in backup functionality in SQL Server,
as sketched below.
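The following is a minimal sketch of such a SQL-native backup, driven from Windows PowerShell. It assumes the SQL Server PowerShell tools (Invoke-Sqlcmd) are available; the instance name and backup path are illustrative, and the database names are the OpsMgr 2007 defaults.

# Full backups of the OpsMgr databases via SQL Server's native BACKUP DATABASE.
$instance  = "SQLSERVER01"                          # illustrative instance name
$backupDir = "D:\Backups"                           # illustrative backup path
$databases = "OperationsManager", "OperationsManagerDW"
foreach ($db in $databases) {
    $stamp = Get-Date -Format "yyyyMMdd_HHmmss"
    $query = "BACKUP DATABASE [$db] TO DISK = N'$backupDir\${db}_$stamp.bak' WITH INIT, CHECKSUM"
    Invoke-Sqlcmd -ServerInstance $instance -Query $query
}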
2. Advanced OpsMgr Concepts
OpsMgr’s simple installation and relative ease of use
often disguise the potential complexity of its underlying components.
This complexity can be managed with the right amount of knowledge of
some of the advanced concepts of OpsMgr design and implementation.
Understanding OpsMgr Deployment Scenarios
As previously mentioned, OpsMgr components can be
divided across multiple servers to distribute load and ensure balanced
functionality. This separation enables OpsMgr servers to come in four
potential flavors, depending on the OpsMgr components held by these
servers. The four OpsMgr server types are as follows:
Operations Database Server—
An Operations Database Server is simply a member server running SQL
Server 2005 or 2008 that hosts the OpsMgr operations database. No other
OpsMgr components are installed on this server. The SQL Server component
can be installed with default options and with the system account used
for authentication. Data in this database is kept for seven days by
default.
Reporting Database Server— A
Reporting Database Server is simply a member server running SQL Server
2005 or 2008 with SQL Server Reporting Services installed. This database
stores data collected through the monitoring rules for a much longer
period than the operations database and is used for reporting and trend
analysis. It therefore requires significantly more drive space than the
operations database. Data in this database is kept for 13 months (400
days) by default.
Management Server—
A Management Server is the communication point for both management
consoles and agents. A dedicated management server does not host a
database and is often used in large OpsMgr implementations that have a
dedicated database server. In these configurations, multiple management
servers are often deployed in a single management group to provide
scalability and to address a large number of managed nodes.
All-in-one server—
An all-in-one server is effectively an OpsMgr server that holds all
OpsMgr roles, including the databases. Single-server OpsMgr
configurations consequently use one server for all OpsMgr operations.
Multiple Management Groups
As previously defined, an OpsMgr management group is a
logical grouping of monitored servers that are managed by a single
OpsMgr SQL database, one or more management servers, and a unique
management group name. Each management group established operates
separately from other management groups, although they can be configured
in a hierarchical structure with a top-level management group able to
see connected lower-level management groups.
The concept of connected management groups enables
OpsMgr to scale beyond artificial boundaries and gives a great deal of
flexibility when combining OpsMgr environments. However, certain caveats
must be taken into account. Because each management group is an island,
each must be configured manually with its own settings. In environments
with a large number of customized rules, for example, this manual
configuration creates a great deal of redundant work in the creation,
administration, and troubleshooting of multiple management groups.
Deploying Geographic-Based Management Groups
Based on the factors outlined in the preceding
section, it is preferable to deploy OpsMgr in a single management group.
However, in some situations an organization needs to divide its OpsMgr
environment into multiple management groups. The most common reason for
division of OpsMgr management groups is division along geographic lines.
In situations in which wide area network (WAN) links are saturated or
unreliable, it might be wise to separate large islands of WAN
connectivity into separate management groups.
Being separated across slow WAN links is not by itself
reason enough to warrant a separate management group, however. For
example, small sites with few servers do not justify the creation of a
separate OpsMgr management group, with its associated hardware,
software, and administrative costs. However, if many servers exist in a
distributed but generally well-connected geographic area, that might be
a case for creating a management group.
For example, an organization might be spread across several sites in
the United States but decide to divide its OpsMgr environment into
separate East Coast and West Coast management groups to roughly
approximate its WAN infrastructure.
Smaller sites that are not well connected but are not
large enough to warrant their own management group should have their
event monitoring throttled so that monitoring traffic is not sent across
the WAN during peak usage times. The downside to this approach, however,
is that reaction time to critical events increases.
Deploying Political or Security-Based Management Groups
The less common method of dividing OpsMgr management
groups is along political or security lines. For example, it might
become necessary to place financial servers in a separate management
group to maintain the security of the finance environment and allow for
a separate set of administrators.
Politically, if administration is not centralized
within an organization, management groups can be established to separate
OpsMgr management into separate spheres of control. This keeps each
OpsMgr management zone under separate security models.
As previously mentioned, a single management group is
the most efficient OpsMgr environment and provides for the least amount
of redundant setup, administration, and troubleshooting work.
Consequently, avoid artificial OpsMgr division along political or
security lines, if possible.
Sizing the OpsMgr Database
The size of the OpsMgr database grows or shrinks
depending on several factors, such as the type of data collected, the
length of time that collected data is kept, and the amount of database
grooming that is scheduled.
Tip
It is important to monitor the size of the database
to ensure that it does not grow beyond acceptable bounds. OpsMgr can be
configured to monitor itself, supplying advance notice of database
problems and capacity thresholds. This type of strategy is highly
recommended because OpsMgr can easily collect event information faster
than it can groom it out.
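For a quick manual check, the current allocated size of the operations database can be read from SQL Server and compared against a ceiling. The following PowerShell sketch assumes Invoke-Sqlcmd is available; the instance name and the 50 GB threshold (the guideline discussed later in this section) are illustrative, and the figure includes log files.

# Compare the operations database's allocated size against a ceiling.
$instance = "SQLSERVER01"                            # illustrative instance name
$limitMB  = 50 * 1024                                # 50 GB ceiling, as a rough guide
$query  = "SELECT SUM(size) * 8 / 1024 AS SizeMB " +
          "FROM sys.master_files WHERE database_id = DB_ID('OperationsManager')"
$sizeMB = (Invoke-Sqlcmd -ServerInstance $instance -Query $query).SizeMB
if ($sizeMB -gt $limitMB) {
    Write-Warning "Operations database is $sizeMB MB, above the $limitMB MB ceiling."
}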
The size of the operations database can be estimated through the following formula:
(Number of agents × 5 MB × retention days) + 1,024 MB overhead = estimated database size in MB
For example, an OpsMgr environment monitoring 1,000
servers with the default seven-day retention period has an estimated 35
GB operations database:
(1,000 × 5 × 7) + 1,024 = 36,024 MB
The size of the reporting database can be estimated through the following formula:
(Number of agents × 3 MB × retention days) + 1,024 MB overhead = estimated database size in MB
The same environment monitoring 1,000 servers with
the default 400-day retention period has an estimated 1.1 TB reporting
database:
(1,000 × 3 × 400) + 1,024 = 1,201,024 MB
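Both formulas are simple enough to capture in a small PowerShell function; this sketch merely restates the arithmetic above, with the per-agent daily growth passed in as a parameter.

# Estimate OpsMgr database size: (agents x MB/agent/day x days) + 1,024 MB overhead.
function Get-OpsMgrDbEstimateMB {
    param([int]$Agents, [int]$MBPerAgentPerDay, [int]$RetentionDays)
    ($Agents * $MBPerAgentPerDay * $RetentionDays) + 1024
}
Get-OpsMgrDbEstimateMB -Agents 1000 -MBPerAgentPerDay 5 -RetentionDays 7     # 36024 (operations)
Get-OpsMgrDbEstimateMB -Agents 1000 -MBPerAgentPerDay 3 -RetentionDays 400   # 1201024 (reporting)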
Caution
It is important to understand that these estimates
are rough guidelines only and can vary widely depending on the types of
servers monitored, the monitoring configuration, the degree of
customization, and other factors.
Defining Capacity Limits
As with any system, OpsMgr includes limits that
should be taken into account before deployment begins. Surpassing these
limits might be cause for the creation of new management groups and
should therefore be factored into a design plan. These limits are as
follows, with a rough sizing sketch after the list:
Operations Database—
OpsMgr operates on a principle of centralized, rather than
distributed, collection of data. All event logs, performance counters,
and alerts are sent to a single centralized database, so there can be
only one operations database per management group. A backup and
high-availability strategy for the OpsMgr database is therefore highly
recommended to protect it from outage. It is also recommended to keep
this database under 50 GB to improve efficiency and reduce alert
latency.
Management servers—
OpsMgr does not have a hard-coded limit on management servers per
management group. However, it is recommended to keep the environment to
between three and five management servers. Each management server can
support approximately 2,000 managed agents.
Gateway servers—
OpsMgr does not have a hard-coded limit of gateway servers per
management group. However, it is recommended to deploy a gateway server
for every 200 nontrusted domain members.
Agents—
Each management server can theoretically support up to 2,000 monitored
agents. In most configurations, however, it is wise to keep the number
of agents per management server well below this ceiling, although the
levels can be scaled upward with more robust hardware, if necessary.
Administrative Consoles—
OpsMgr does not enforce a hard limit on the number of Web console and
Operations console instances; however, running a large number of
concurrent consoles can introduce performance and scalability problems.
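The sizing sketch mentioned above simply turns the preceding ratios into rough server counts. The 2,000-agents-per-management-server and 200-members-per-gateway figures come from the guidelines in this list; real designs should still respect the three-to-five management server recommendation.

# Rough server counts from the capacity guidelines above.
function Get-OpsMgrServerCounts {
    param([int]$Agents, [int]$NontrustedMembers = 0)
    [pscustomobject]@{
        ManagementServers = [math]::Ceiling($Agents / 2000)
        GatewayServers    = [math]::Ceiling($NontrustedMembers / 200)
    }
}
Get-OpsMgrServerCounts -Agents 4500 -NontrustedMembers 350   # 3 management, 2 gateway servers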
Defining System Redundancy
In
addition to the scalability built in to OpsMgr, redundancy is built in
to the components of the environment. Proper knowledge of how to place
OpsMgr components and configure them for redundancy is important to
designing a resilient environment.
The main components of OpsMgr can be made redundant through the following methods:
Management Servers—
Management servers are automatically redundant, and agents fail over
and fail back automatically between them. Simply install additional
management servers for redundancy. In addition, the Root Management
Server (RMS) acts as a management server and participates in this fault
tolerance.
SQL databases—
The SQL database servers hosting the databases can be made redundant
using SQL clustering, which is based on Windows clustering. This
supports failover and failback.
Root Management Server— The RMS can be made redundant using Windows clustering. This supports failover and failback.
Having multiple management servers deployed across a
management group gives an environment a degree of redundancy. If a
single management server experiences downtime, another management
server within the management group takes over responsibility for the
monitored servers in the environment. For this reason, deploying
multiple management servers is wise wherever high uptime is a priority.
The first management server in the management group is called the Root Management Server.
Only one RMS can exist in a management group, and it hosts the SDK and
Configuration services. All OpsMgr consoles communicate with the RMS,
so its availability is critical. In large-scale environments, the RMS
should leverage Microsoft clustering technology to provide high
availability for this component.
Caution
Because there can be only a single OpsMgr database
per management group, the database is a single point of failure and
should be protected from downtime. Using Windows Server 2008 clustering
or third-party fault-tolerance solutions for SQL databases helps to
mitigate the risk involved with the OpsMgr database.
Monitoring Nondomain Member Considerations
DMZ, workgroup, and nontrusted domain agents require
special configuration, such as certificates, to establish mutual
authentication. Operations Manager 2007 requires mutual authentication;
that is, the server authenticates to the client and the client
authenticates to the server to ensure that the monitoring communications
are not compromised. Without mutual authentication, a hacker can execute
a man-in-the-middle attack and impersonate either the client or the
server. Mutual authentication is thus a security measure designed to
protect clients, servers, and sensitive Active Directory domain
information, which would otherwise be exposed to attack through the
highly privileged management infrastructure. However, OpsMgr relies on
Active Directory Kerberos for mutual authentication, which is not
available to nondomain members.
Note
Lync Edge servers are commonly placed in the DMZ and
are not domain members, so every Lync Server 2010 environment needs to
deploy certificate-based authentication for proper monitoring.
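The following sketch shows the typical shape of the certificate steps on a nondomain agent, assuming a certificate with the agent's fully qualified domain name as its subject has already been issued by a certificate authority trusted on both sides and exported with its private key. The file names, paths, password, and FQDN are all illustrative.

# Import the certificate and private key into the local computer store.
certutil -f -p "PfxPassword" -importPFX "C:\Certs\dmzserver01.companyabc.com.pfx"
# Register the certificate with the OpsMgr agent using MOMCertImport, a utility
# shipped in the SupportTools folder of the OpsMgr installation media.
& "D:\SupportTools\i386\MOMCertImport.exe" /SubjectName "dmzserver01.companyabc.com"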