EAC is a very convenient interface with which to create and initiate
searches, but you can do the same through EMS by using a set of cmdlets
that are exposed only if you are a member of the Discovery Management
role group. These cmdlets are New-MailboxSearch, Get-MailboxSearch,
Set-MailboxSearch, Start-MailboxSearch, Stop-MailboxSearch, and
Remove-MailboxSearch.
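To check which of these cmdlets your own session exposes, a generic
PowerShell lookup is one option. This is a minimal sketch that assumes
the discovery cmdlets share the MailboxSearch noun (as New-MailboxSearch
and Start-MailboxSearch do) and that it is run in EMS:

```powershell
# List the mailbox search cmdlets visible to the current session.
# An empty result suggests the account is not a member of the
# Discovery Management role group.
Get-Command *-MailboxSearch |
    Sort-Object Name |
    Format-Table Name, CommandType -AutoSize
```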
For example, a new search to look for information about potential
illegal stock trading by company officers could be initiated with this
command:
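A minimal sketch of such a search follows. The search name, source
mailboxes, query terms, date range, and status recipient are all
hypothetical placeholders rather than values from a real investigation,
and 'Discovery Search Mailbox' is assumed to be the default discovery
mailbox created by Exchange setup. Date strings are interpreted
according to the regional settings of the session.

```powershell
# Hypothetical values throughout: adjust the name, mailboxes, query, and dates.
New-MailboxSearch -Name 'Illegal Stock Trading' `
    -SourceMailboxes 'CFO', 'CEO' `
    -TargetMailbox 'Discovery Search Mailbox' `
    -SearchQuery '"insider trading" OR "stock tip"' `
    -StartDate '01/01/2012' -EndDate '12/31/2012' `
    -StatusMailRecipients 'Legal Team'

# In Exchange 2013, creating the search object does not run it,
# so the search is started as a separate step.
Start-MailboxSearch -Identity 'Illegal Stock Trading'
```

Get-MailboxSearch can then be used to check the status of the search as
it progresses.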
All item types stored in an Exchange mailbox database are
discoverable, including voice messages, drafts, attached documents of
various formats, and IM conversations (if stored in mailboxes).
Exchange 2013 searches depend on Search Foundation to build and
maintain content indexes extracted from mailbox databases. Although
Search Foundation has no difficulty indexing the complete body text of
messages because they are plaintext, rich text format, or HTML, some
issues might be encountered with attachments, which can be in any
format. Before Search Foundation can include the actual content for an
attachment in its indexes rather than simply its metadata (such as the
file name or author name), it must be able to extract the content.
Search Foundation includes a large number of filters to enable it to
deal with the most common file formats, including Microsoft Word, HTML,
Microsoft PowerPoint PPTX files, and Adobe PDF. An additional set of
IFilters, including those for Microsoft Excel, OneNote, older versions
of PowerPoint, and Open Document files, is provided to Search
Foundation when Exchange 2013 is installed on a Mailbox server (other
IFilters are available from third-party vendors if you need to be able
to index a specific format). Between Search Foundation and Exchange, a
very large set of file formats can be indexed. To see the full set of
searchable file formats on a server, you can run this command:
Get-SearchDocumentFormat | Format-Table Name, Extension, FormatHandler -AutoSize
The Set-SearchDocumentFormat command is also available to change the
way Exchange processes particular formats. For example, you can use it
to disable indexing of particular types of files, a step that requires
some forethought because of its potential impact on indexing and
subsequent discovery operations.
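As a hedged illustration of disabling a format, the sketch below uses
docx purely as an arbitrary example identity. Check the output of
Get-SearchDocumentFormat on your own server for the identities that
actually exist before changing anything.

```powershell
# 'docx' is an arbitrary example; pick the format you really want to change.
$format = 'docx'

# Review the current state of the format first.
Get-SearchDocumentFormat -Identity $format |
    Format-List Name, Extension, Enabled

# Stop Search Foundation from indexing the content of files in this format.
Set-SearchDocumentFormat -Identity $format -Enabled $false
```

Running the same command with -Enabled $true reverses the change.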
Even though Search Foundation possesses out-of-the-box capabilities by
which it can index the bulk of items encountered in an Exchange
environment,
it is possible for your company to create content in a format that
Search Foundation does not know about. In this case, you must install
an IFilter that supports the specific format on all Mailbox servers.
Search Foundation then detects and uses the IFilter to include the
items in that format in its indexes. If you do not install an IFilter,
Search Foundation indexes the metadata for the items to allow searches
to proceed, but these items will be deemed unsearchable and returned as
such when you execute a search. Apart from application-specific files,
other items Exchange deems unsearchable include items encrypted with
Secure Multipurpose Internet Mail Extensions (S/MIME). However,
messages protected with Active Directory Rights Management Services (AD
RMS) remain searchable for discovery purposes.
When you decide to copy search results to a discovery mailbox, you can
include
unsearchable items. Normally, it’s a good idea to do this because the
person assigned to review the search results might be able to discover
what these unsearchable items contain, perhaps by examining the context
of where the item was discovered. (An item found in a folder called
Videos is likely to contain video content, for instance.)
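In EMS, the equivalent choice is made with the IncludeUnsearchableItems
parameter. A minimal sketch, assuming a search named 'Illegal Stock
Trading' already exists (the name is hypothetical) and is not running
at the time it is modified:

```powershell
# Hypothetical search name. The colon form works whether the parameter
# is implemented as a switch or as a Boolean.
Set-MailboxSearch -Identity 'Illegal Stock Trading' -IncludeUnsearchableItems:$true
```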
You can see a list of unsearchable items with the
Get-FailedContentIndexDocuments cmdlet. When you run the cmdlet, you
can pass it the name of a server to see all items on a server or just a
mailbox database to see the unsearchable items in the content index for
that database. For example, this command lists various issues that were
encountered in a specific database:
Get-FailedContentIndexDocuments -MailboxDatabase DB2
DocID Database Mailbox    Subject Description
----- -------- -------    ------- -----------
   77 DB2      SystemMai…         The document parser encountered a processing error.
   78 DB2      SystemMai…         The document parser encountered a processing error.
 1287 DB2      Rob Young          The document parser encountered a processing error.
You can see that a number of items in Rob Young’s mailbox have had an
issue. Exchange assigns each of the items an identifier (DocID), but
there’s no way to extract details for a specific item. Instead, you
have to run the cmdlet again, this time using the Identity parameter to
restrict the output to just details for Rob Young’s mailbox. To see
additional information, pipe the results to the Format-List cmdlet and
then redirect the output to a text file you can then interrogate at
your leisure to see what you can discover. The command might look like
this:
Get-FailedContentIndexDocuments -Identity 'Rob Young' | Format-List > C:\temp\Docs.txt
You can then search through the output text file to see whether
anything
captured there provides an indication of why an item is unsearchable.
For example, you can see in this extract that the parser used to
extract the content from an item was unable to complete for some reason
and that the item is a Word document.
DocID : 1667
Database : DB2
MailboxGuid : 4e09fc34-e61a-4eea-87b8-d19b214a92ab
Mailbox : Rob Young
SmtpAddress : [email protected]
Subject : RE: 2003/2010 coexistence
ErrorCode : 7
Description : The document parser encountered a processing error.
AdditionalInfo : 309003 Document 'exchange://localhost/Attachment/298057e9-43de-417b-a740-7ab58b6e48bb/eb22b1a1-b1c9-4972-a163-ba508f018d6b/919123003011.1/Transitioning Client Access to Exchange 2013.docx' was partially processed. The parser was not able to parse the entire document.
IsPartialIndexed : False
FailedTime : 01/4/2013 11:27:52
Should you be worried if many unsearchable items exist for your
database? It
depends. First, it depends on the percentage of unsearchable items. If
0.0002 percent of items are unsearchable, it’s probably acceptable
because any search has a very high chance of discovering information
that’s required. Second, it depends on the items that are failing to be
indexed. If they are all of the same type and a filter is available,
you can install that filter to solve the problem. However, if the items
are of a type for which a filter is not available or that is known to
be unsearchable (such as S/MIME encrypted items), you might have to
live with the situation.
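Both questions can be approached from EMS. The following is a rough
sketch rather than an exact measure: it assumes the DB2 database used
in the earlier example, and it compares failed documents (which can be
attachments) against the mailbox item counts reported by
Get-MailboxStatistics, so the percentage is only an approximation.

```powershell
# DB2 is the example database used above; substitute your own.
$failed = @(Get-FailedContentIndexDocuments -MailboxDatabase DB2)

# Total item count across the mailboxes in the database.
$total = (Get-MailboxStatistics -Database DB2 |
    Measure-Object -Property ItemCount -Sum).Sum

# Approximate percentage of items that could not be indexed.
if ($total -gt 0) {
    '{0:N4} percent of {1} items are unsearchable' -f (100 * $failed.Count / $total), $total
}

# Group the failures to see whether one kind of error dominates.
$failed |
    Group-Object Description |
    Sort-Object Count -Descending |
    Format-Table Count, Name -AutoSize
```

If one description accounts for most of the failures, the Format-List
output for a few of those items usually shows which file type is
involved.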
Normally, a relatively small number of items turn out to be
unsearchable. In
addition, remember that item metadata (sender, recipients, subject, and
so on) and message bodies are always indexed and searchable, so if a
small percentage of attachments can’t be searched, it probably won’t be
of great concern in a legal search. After all, if people are doing
something they shouldn’t, they are likely to leave some trace of their
activity in a searchable property that can be discovered. After this
happens, the next step is often for investigators to take a complete
copy of the suspect’s mailbox to conduct a detailed search to discover
what it contains, and any lurking unsearchable items can be reviewed at
that time.
Important
An in-place hold depends on the ability of Exchange to understand when
an item might satisfy the criteria stated for the hold. Unsearchable items
might not expose sufficient information to Exchange for it to assess
whether these items should be retained and so create the potential for
required items to be removed from mailboxes. If large numbers of
unsearchable items are created because of an application you use or
other reason, it’s best not to use a query-based hold. Instead, you can
create a hold on everything in mailboxes to make absolutely sure that
everything that might be required to satisfy a search is available and
can be reviewed manually if necessary. This will add a little overhead
to the way searches are performed, but it’s the best way to ensure that
nothing slips through.
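A hold on everything can be sketched in EMS by creating a search with
the hold enabled and no query. The search name below is hypothetical,
and Rob Young is simply the example mailbox used earlier:

```powershell
# Hypothetical search name. Omitting -SearchQuery means no filter is
# applied, so all content in the source mailbox is placed on hold.
New-MailboxSearch -Name 'Hold-Everything-Case012' `
    -SourceMailboxes 'Rob Young' `
    -InPlaceHoldEnabled $true
```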
Exchange 2010 uses AQS (Advanced Query Syntax) to construct
its multimailbox searches. Exchange 2013 takes a different approach and
uses KQL (Keyword Query Language). Why the change?
AQS is shared with other Windows search components such as Windows
Desktop Search,
which Outlook clients use. In fact, Exchange 2010 supports only a
subset of the full AQS capabilities. However, KQL is shared with other
Office 2013 applications, the most important of which is SharePoint
2013 because the two applications can form a single discovery domain
across the email stored in Exchange and the documents held in
SharePoint.
Giving Exchange and SharePoint a common search syntax
makes great sense and is the driving force behind making the change to
search syntax in Exchange 2013. Another advantage is gained in that KQL
can perform proximity searches. When you want to search for items that
mention the words “Azur project” and have the word “bribe” somewhere
close to those words, AQS can certainly find anything that includes
“Azur project” AND “bribe,” but it can’t find “Azur project” with
“bribe” within 30 words. (In KQL syntax, the word “bribe” is NEAR(n=30)
the other phrase.) This capability could be very useful in searches
that start by being somewhat imprecise because you’re not quite sure
about what you’re looking for. It’s true that searches like this might
produce more results than you can deal with on a practical basis, but
they could provide a hint about how searches might be refined to home
in on the critical items. KQL also supports prefix wildcard searches,
meaning that you could use a term such as cont* to force the search to
find items relating to “Contoso.” Suffix matching, such as *toso, is
not supported.
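To make the KQL operators concrete, here is a minimal sketch of how
such queries might be passed to a search. The search name and source
mailbox are hypothetical, and the estimate-only option is used so that
an exploratory query reports hit counts without copying any items.

```powershell
# Proximity: "bribe" within 30 words of the phrase "Azur project".
$proximityQuery = '"Azur project" NEAR(n=30) bribe'

# Prefix wildcard: matches words that begin with "cont", such as Contoso.
$wildcardQuery = 'cont*'

# Hypothetical search name and mailbox. -EstimateOnly returns statistics
# only, which suits a first, imprecise pass.
New-MailboxSearch -Name 'Azur Estimate' `
    -SourceMailboxes 'Rob Young' `
    -SearchQuery $proximityQuery `
    -EstimateOnly
Start-MailboxSearch -Identity 'Azur Estimate'
```

Swapping in $wildcardQuery, or combining terms with AND and OR, follows
the same pattern.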
KQL syntax is powerful. It will be interesting to see how it is used
to
frame search queries as Exchange 2013 is deployed. Even better, the
Exchange community can learn KQL tips and techniques to improve
searches from those who work with SharePoint and vice versa.