Most of the monitoring
configurations for SharePoint can be done via Central Admin under the
aptly named Monitoring section, but some additional health and usage
reports are provided as part of the Search Service Application (more on
that later), and some settings are cleverly hidden only in PowerShell.
Under the Monitoring section in Central Admin, the locations of the
Health Analyzer and Timer Job sections are obvious; but for unknown
reasons, the ULS configuration and Usage and Health Analysis settings
are lumped together under the general Reporting heading. It’s like a
Monitoring scavenger hunt that’s gone horribly awry.
Unified Logging Service
The lowest-level monitoring system is
called the Unified Logging Service. This service records information
about each operation that occurs inside of SharePoint, and it provides
the most direct and detailed view into SharePoint’s operation. It gives
insight into the processing of specific events and requests, regardless
of whether or not an error was observed. The level of detail provided
in the ULS logs, while of great value, can be both a blessing and a
curse, as it can generate a large quantity of data to evaluate.
However, with the right configuration and the right tools, you can
strike a balance that provides the information you need without
inundating you with mountains of irrelevant data.
If you are familiar with the ULS system
introduced in SharePoint 2010, you will find that it is mostly
unchanged in SharePoint 2013. The ULS engine can log events into both
the Windows Event Log system and SharePoint’s own trace logs, with
independent control available for the reporting level of each, but
these are not simply two different channels for reporting the same
data. The types of events typically sent to the Windows Event Log are
critical errors (e.g., failure to connect to a database, service
account errors) or general informational events (e.g., farm topology
changes, search index replication status), whereas the entries made in
the trace logs go much deeper, tracking the actions (and sometimes
errors) of individual requests as they are processed.
Configuring ULS via Central Admin
Initially, the ULS appears to be very
straightforward to configure, but you will quickly see that it is not
quite so simple. First, you will look at the ULS configuration options
inside of Central Admin. Here are the steps:
1. Browse to the Monitoring section of Central Admin.
2. Select Configure Diagnostic Logging.
3. You will see a screen like the one in Figure 1.
The first block of settings, Event Throttling, enables you to select,
for a large number of categories and subcategories, the least critical
events to be sent to either the Event Log or the trace log.
As you expand the items in the tree, you will
find that the ULS provides a nearly overwhelming amount of granularity
for controlling the logging of various elements of SharePoint.
Unfortunately, there is no built-in explanation of what exactly is
included in any given category, and little or no documentation
elsewhere on the topic. Some are fairly obvious (e.g., SharePoint
Foundation ⇒ Alerts), while others wouldn’t necessarily mean anything
to a systems administrator and might very well be made up (e.g.,
SharePoint Server ⇒ Command Base Validators).
Beneath the list of categories is a pair of drop-down menus used to select the least
critical event to be reported to either the Event Log or the trace log.
Note that the boxes do not use exactly the same severity settings, but
rather each uses the severity nomenclature for its target log. Both
boxes list the logging options in order of decreasing severity;
therefore, the further down the list you go, the more verbose the ULS
will be in its reporting. The most verbose options can generate a very
large volume of entries. Before asking the ULS to give you more logging, be careful
what you wish for; you just might get it. It’s a good idea to move down
the list slowly, or select a few categories at a time. It is possible
to crank the logging up so high that the overhead affects performance.
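To make the "least critical event" idea concrete, here is a minimal Python sketch of how such a threshold behaves. It is only a model of the behavior described above, not SharePoint's implementation, and the level names in the two lists are assumptions about what the drop-down menus contain.

```python
# Severity names, most severe first. Treat the exact names as
# assumptions about the two drop-down menus, not a verified list.
EVENT_LEVELS = ["Critical", "Error", "Warning", "Information", "Verbose"]
TRACE_LEVELS = ["Unexpected", "Monitorable", "High", "Medium", "Verbose"]


def is_reported(levels, least_critical, severity):
    """Return True if `severity` is at least as critical as the threshold.

    A threshold of "None" means nothing is reported for that log.
    """
    if least_critical == "None":
        return False
    return levels.index(severity) <= levels.index(least_critical)


def reported_levels(levels, least_critical):
    """List every severity that would be written for a given threshold."""
    return [s for s in levels if is_reported(levels, least_critical, s)]
```

Calling `reported_levels(TRACE_LEVELS, "Medium")` returns every level from the top of the list down to and including Medium, which is why each step down the list adds more output and never replaces what was already being logged.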
The check boxes in the category list allow you to
set a specific severity level for multiple categories or subcategories
simultaneously, but you cannot set multiple different severity levels
simultaneously; every item checked is set to the values selected in the
drop-down boxes. To set more than one severity value for various
categories, you need to make your selections, click OK, and then return
to Configure Diagnostic Logging for each severity level that you wish
to set.
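Because each visit to the page applies one pair of severity values to everything that is checked, the number of visits you need equals the number of distinct severity pairs you want. The following Python sketch groups a wish list of categories into those passes. The function name, the dictionary layout, and any category names you feed it are hypothetical; nothing here talks to SharePoint.

```python
from collections import defaultdict


def plan_passes(desired):
    """Group categories by the (event level, trace level) pair they need.

    `desired` maps a category name to an (event_level, trace_level) tuple.
    Each group returned corresponds to one trip through the
    Configure Diagnostic Logging page.
    """
    passes = defaultdict(list)
    for category, levels in desired.items():
        passes[levels].append(category)
    # Sort the category names so each pass is easy to read and compare.
    return [(levels, sorted(cats)) for levels, cats in passes.items()]
```

If three categories need two different severity pairs between them, `plan_passes` returns two groups, and you would check the boxes in one group, click OK, and then come back for the other.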
SharePoint 2010 introduced a great improvement
over MOSS 2007 with the capability to easily set logging categories
back to their default severity level, and that capability has been
carried over to SharePoint 2013. (In SharePoint 2007 the only way to
reset a category’s logging level was with STSADM, and there was no easy
way to determine which categories were not at their default levels.) In
SharePoint 2010 and SharePoint 2013, if any category’s logging severity
level has been changed from the default, then the new value is
displayed in bold text, making it obvious what has changed. Before
SharePoint 2010, you could only see the currently set values; there was
no indication whether that setting was the default.
The next option to look at on the page is the
Event Log Flood Protection setting. The goal, of course, is to keep the
farm running in an error-free state, but when SharePoint starts
reporting errors, it becomes very chatty indeed. This isn’t really a
problem for SharePoint’s trace logs, as we expect all errors to be
reported there. Where it becomes problematic, however, is in the
server’s event logs. These logs are used for the server’s own reporting
and for other applications, so it would be rude for SharePoint to
commandeer the logs and flood them with hundreds or thousands of
identical error messages. In fact, such a log flood could actually make
it more difficult to notice system-level error messages that might be
the root cause of an issue. The Event Log Flood Protection option is
enabled by default, which causes SharePoint to watch for repetitive
error messages; when detected, it switches to creating occasional
messages summarizing how many times the error has occurred and when it
was last seen.
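The general idea behind flood protection can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the `threshold` parameter, the wording of the summary entry, and the choice to emit summaries at the end of the run are all made up for illustration, and SharePoint's real trigger conditions and message format may differ.

```python
from collections import defaultdict


def flood_protect(events, threshold=5):
    """Collapse repeated messages into the first few copies plus a summary.

    `events` is an iterable of (timestamp, message) pairs in time order.
    The first `threshold` copies of a message pass through unchanged;
    later copies are only counted, and one summary entry per flooded
    message is appended with the time it was last seen.
    """
    seen = defaultdict(int)   # message -> total occurrences so far
    last_seen = {}            # message -> timestamp of latest occurrence
    output = []
    for timestamp, message in events:
        seen[message] += 1
        last_seen[message] = timestamp
        if seen[message] <= threshold:
            output.append((timestamp, message))
    for message, count in seen.items():
        suppressed = count - threshold
        if suppressed > 0:
            summary = f"'{message}' suppressed {suppressed} more times"
            output.append((last_seen[message], summary))
    return output
```

The point to notice is that the individual repeats never reach the log. Anything reading the log afterward sees a handful of originals and a summary, which is exactly what a monitoring tool has to be prepared for.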
If you’re using any third-party monitoring tools
that comb through your application logs looking for events, make sure
they are capable of handling flood protection. Otherwise, they may not
recognize a problem if one exists.
Moving right along, we come to the Trace Log
configuration options. These options are, for the most part,
self-explanatory, but that is not to say that some thought shouldn’t be
put into them. The first option is the path for storing the logs. By
default it is set to %CommonProgramFiles%\Microsoft Shared\Web Server Extensions\15\LOGS\,
which is typically on the C:\ drive of the SharePoint server(s). It is
recommended that you change this path to a drive other than the system
drive to avoid the possibility of the log files growing to the point of
exhausting the system drive’s free space. Windows gets very upset if
the boot drive runs out of space, and if Windows ain’t happy, nobody’s
happy.
However, two points must be considered when
relocating the trace logs. First, the path specified must exist on all
servers that are members of the farm. This is easy enough to ensure
when the farm is set up, but you must also consider whether any
additional servers may be added to the farm, as they too must have the
same path available. When you move your logs, don’t try to be overly
clever with the new path. If they’re in the same path as the 15 hive
but on a different drive, they’re very easy for other administrators to
find. Conversely, if they’re buried under E:\Logs\SharePoint Logs\ULS Logs\,
they’re tougher to find without looking up their location in Central
Admin or PowerShell.
E:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\15\Logs
might be a mouthful, but it’s a very obvious place to look if there are
no ULS logs on the C drive. Second, the writing of the trace logs can
potentially be a very disk I/O-intensive operation, so it is important
to ensure that you don’t slow down SharePoint by putting the logs on
excessively slow storage, and that you don’t inadvertently slow down
other applications that might be running off the same drive you select
for your trace logs.
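The "same path, different drive" convention is easy to express in code. The Python sketch below takes an existing log path and produces the equivalent path on another drive letter. The helper name `relocate_to_drive` is hypothetical, and the function only manipulates strings; it does not check that the path exists on any server, which you still have to verify across the farm yourself.

```python
from pathlib import PureWindowsPath


def relocate_to_drive(log_path, drive):
    """Return `log_path` with its drive swapped for `drive` (e.g. "E:").

    PureWindowsPath is used so the logic behaves the same on any
    operating system; nothing on disk is touched.
    """
    path = PureWindowsPath(log_path)
    if not path.drive:
        raise ValueError("log_path must be an absolute path with a drive letter")
    # parts[0] is the original anchor such as "C:\\"; keep everything after it.
    return str(PureWindowsPath(drive + "\\", *path.parts[1:]))
```

Passing the expanded default location on C: together with `"E:"` yields the same folder structure rooted on E:, which is the layout the paragraph above recommends because other administrators will look there first.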
Following the trace log path setting are two
options controlling retention for the logs, according to both age and
total space consumed. By default, the logs are retained for 14 days and
the option to restrict log disk space usage is disabled. However, note
that the box for specifying maximum storage size uses GB as its unit of
measure and is prefilled with a value of 1000; therefore, if you enable
the size restriction without changing the size value, you have
effectively just configured a “restriction” of
1TB! When processing log retention, SharePoint uses the more
restrictive of the two configurations to determine which files to keep
and which to delete. For instance, if you set a retention age of 10
days and a maximum storage space of 5GB, SharePoint will delete logs
older than 10 days even if the 5GB limit hasn’t been reached;
conversely, it will delete the oldest logs if the 5GB limit is
exceeded, even if those logs are less than 10 days old. Figure 2 shows the Restriction Settings page.
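The "more restrictive of the two" rule can be modeled with a short Python function. This is a sketch of the behavior described above, not SharePoint's actual cleanup code. The tuple layout, the use of whole-day numbers for timestamps, and the function name are assumptions chosen to keep the example small.

```python
def logs_to_delete(logs, now, max_age_days, max_total_gb=None):
    """Return the names of log files that the retention rules would remove.

    `logs` is a list of (name, created_day, size_gb) tuples and `now` is
    the current day number. `max_total_gb=None` models the disk space
    restriction being switched off.
    """
    # Rule 1: anything older than the retention age goes, regardless of size.
    expired = [log for log in logs if now - log[1] > max_age_days]
    kept = sorted(
        (log for log in logs if now - log[1] <= max_age_days),
        key=lambda log: log[1],  # oldest first
    )
    # Rule 2: if a size cap is set, drop the oldest survivors until under it.
    if max_total_gb is not None:
        total = sum(log[2] for log in kept)
        while kept and total > max_total_gb:
            oldest = kept.pop(0)
            total -= oldest[2]
            expired.append(oldest)
    return [log[0] for log in expired]
```

With a 10-day age limit and a 5GB cap, a 12-day-old file is removed by the age rule alone, and a 7-day-old file can still be removed if the files newer than it already add up to more than fits under 5GB alongside it.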