Refining a search
After you enter all the search criteria and click Save, Exchange stores the
criteria as search metadata in the Discovery folder (Figure 8) of the search arbitration mailbox (SystemMailbox{e0dc1c29-89c3-4034-b678-e6c29d823ed9}).
Exchange then generates an initial estimate of results based on the
query specified in the search. The idea is that you immediately see how
effective the search criteria are in terms of identifying information
across the set of selected mailboxes. A good search locates the right
information and only that information. Conversely, a bad search casts
its net far too broadly and finds information that is not required for
discovery. Remember that someone eventually must go through all the
items recovered by a search and that this process grows increasingly
expensive as the number of items to be reviewed grows. In addition,
although servers can handle heavy workloads, a task that retrieves tens
of thousands of items from Mailbox servers across the organization and
transfers several gigabytes of data to the target discovery mailbox is
not something to launch without thinking. It is much better to have a focused search that
delivers exactly the right information in the right quantity, which is
the goal that search estimates and reviews help you achieve.
Returning to the current search, you can fetch and examine the metadata
that describes the search parameters by running the Get-MailboxSearch
cmdlet. For example:
Get-MailboxSearch -Identity 'Patent Hold - Tailspin Project' | Format-List
Among the interesting properties that can be found in the output are:
SourceMailboxes and Source. These properties list the source mailboxes for the search.
TargetMailbox. Eventually,
when you have refined the search and are ready to copy items, this
property holds the name of the discovery mailbox in which Exchange will
store the items retrieved from user mailboxes.
SearchQuery. This is the query used for the search in KQL syntax.
Senders and Recipients. If
the search criteria include checking for items sent by or received by
specific users, their email addresses are listed in these properties.
MessageTypes. If
blank, this property means that all types of items held in the source
mailboxes should be searched. Otherwise, it contains the exact types to
be searched.
SearchDumpster. For
a search to be complete, it should search the Recoverable Items folder
to check whether any deleted items match the search criteria, so this
flag is usually set to $True. You can avoid searching the Recoverable
Items folder by setting the flag to $False, but this has to be done
through EMS because searches initiated from EAC always include
Recoverable Items.
IncludeUnsearchableItems. This
is usually set to $False, meaning that the initial search performed to
generate an estimate ignores any items that Search Foundation is unable to
index, such as S/MIME-protected messages.
IncludeKeywordStatistics. This
is usually set to $True, meaning that Exchange should return keyword
statistics for the search. The TotalKeywords property contains the
number of keywords used by the search.
ExcludeDuplicateMessages. This
is usually set to $False when generating an estimate to understand
exactly how many items might be found. When you are ready to retrieve
items, it’s possible to set this flag to $True so that Exchange
de-duplicates found items.
Status. After
a search is created and a first estimate is made, the value should be
Estimate Succeeded, indicating that Exchange ran the search
successfully and generated an estimate of the items in the source
mailboxes that meet the search criteria. You also see a numeric value
for the total number of source mailboxes (NumberMailboxesToSearch) and
another numeric value for the total number of items found
(ResultNumberEstimate). The exact number won’t be known until you
retrieve items and store them in the discovery mailbox, but the estimate
is usually accurate. In addition, you see a value returned as an
estimate of the size of the found items (ResultSizeEstimate).
PreviewResultsLink. This
property contains a URL that lets the investigator view the search
results through an interface, similar to Outlook Web App, that shows the
matching items in the source mailboxes. Reviewing items in this manner is an
excellent way for an investigator to determine whether the search is
successful or needs to be refined.
InPlaceHoldEnabled. If
set to $True, an in-place hold is present. The InPlaceHoldIdentity
property provides a link to the hold that is written into the source
mailboxes to provide a connection to the search.
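To concentrate on the properties described above rather than the complete Format-List output, a minimal sketch along the following lines might help. It assumes the search name used earlier in this section and that the property names listed above are exposed under the same names in your version of Exchange.

```powershell
# Assumption: the search name matches the one used earlier in this section.
$searchName = 'Patent Hold - Tailspin Project'

# Show only the properties discussed above instead of the complete output.
Get-MailboxSearch -Identity $searchName |
    Format-List Name, Status, SourceMailboxes, TargetMailbox, SearchQuery,
        SearchDumpster, IncludeUnsearchableItems, ExcludeDuplicateMessages,
        NumberMailboxesToSearch, ResultNumberEstimate, ResultSizeEstimate,
        PreviewResultsLink, InPlaceHoldEnabled
```

Keeping the list short makes it easier to compare the estimate figures before and after each change to the search.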
Equipped with knowledge about the initial estimate for a search, you can think
about how the search might be refined. Perhaps the criteria expressed
in the query are not quite good enough and need another keyword, or
perhaps the association between the keywords needs to be tweaked in
some way. Maybe you missed some mailboxes and need to add them to the
source list. And what about unsearchable items? Are they likely to be a
problem?
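If the answer is that some mailboxes were missed, the source list can be extended from EMS. The following is only a sketch: the mailbox address is hypothetical, and it assumes that the SourceMailboxes parameter of Set-MailboxSearch replaces the existing list rather than appending to it, which is why the current sources are read first and merged with the new entry.

```powershell
# Assumptions: the search name from this section and a hypothetical extra mailbox.
$searchName   = 'Patent Hold - Tailspin Project'
$extraMailbox = 'kim.akers@contoso.com'   # hypothetical address, not from the text

# Read the current source list so that it is preserved when the search is updated.
$search  = Get-MailboxSearch -Identity $searchName
$sources = @($search.SourceMailboxes) + $extraMailbox

# Write the merged list back to the search.
Set-MailboxSearch -Identity $searchName -SourceMailboxes $sources
```

Run a new estimate afterward, because the figures stored in the search metadata still describe the old set of mailboxes.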
In most cases, an investigator refines a search by trying different
combinations of keywords until satisfied that the search will
uncover a reasonable (or expected) volume of information. After each
change is made to the search parameters, you can use Estimate Search
Results (Figure 9)
to assess how effective the search is. When an estimate is requested,
Exchange queues the search for processing. Depending on the size of the
organization and the scope of the search, the new estimate might be
available in a matter of minutes or take a little longer. Eventually,
the search estimate completes, and its metadata is updated so EAC can
display new information about keyword statistics (visible in the
details pane shown in the bottom right of Figure 9).
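The same refine-and-estimate cycle can be driven from EMS. The sketch below is illustrative only: the refined query uses placeholder keywords that are not taken from the example search, and the loop assumes that every finished search reports a status containing Succeeded, Failed, or Stopped (for example, EstimateSucceeded).

```powershell
$searchName = 'Patent Hold - Tailspin Project'

# Hypothetical refined KQL query; the keywords are placeholders.
$refinedQuery = 'Tailspin AND (patent OR "prior art")'

# Update the query and keep the search in estimate-only mode.
Set-MailboxSearch -Identity $searchName -SearchQuery $refinedQuery -EstimateOnly:$true

# Queue the search for processing; -Force suppresses the confirmation prompt.
Start-MailboxSearch -Identity $searchName -Force

# Poll until the status indicates that the estimate has finished.
do {
    Start-Sleep -Seconds 30
    $search = Get-MailboxSearch -Identity $searchName
} while ("$($search.Status)" -notmatch 'Succeeded|Failed|Stopped')

$search | Format-List Status, ResultNumberEstimate, ResultSizeEstimate
```

A 30-second polling interval is an arbitrary choice; a large organization might warrant a longer one.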
When the investigator is satisfied that the search query is generating
results in the general area of the desired set, it is time to apply human
intelligence and validate the effectiveness of the query by selecting
Preview Search Results. From the discussion about search properties,
you know that the metadata stores a URL that generates an interface
similar to Outlook Web App to display the items the search identified.
The URL will look something like this:
https://exserver2.contoso.com/owa/default.aspx?cmd=contents&module=discovery&discoveryid=Patent%20hold%20-%20Tailspin%20project.
Clicking Preview Search Results opens a new tab in the browser, shown in Figure 10.
The intention behind previewing is simply to let an investigator
see the actual content of items identified by the search, not to
provide full Outlook Web App functionality. You cannot, for
instance, find an item of interest and email it to another user when
previewing search results. Cut and paste is also disabled in the
viewing pane, so you cannot use that route to save information and
include it in a message, although nothing stops you from taking a
screenshot. An investigator should be able to tell from the items
turned up by a search whether any further refinement is necessary.
Either too much spurious information is being uncovered or too little
information of interest has been found. In either situation, the
preview should give an extremely good hint about what needs to be done
to improve the search to a point at which the investigator can begin
extracting items from user mailboxes and copying them to a discovery
mailbox.
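When the search has reached that point, switching from estimating to copying might look like the following sketch. It assumes that the target is the default discovery mailbox under its usual display name; substitute whichever discovery mailbox your organization uses.

```powershell
$searchName = 'Patent Hold - Tailspin Project'

# Assumption: the default discovery mailbox is the target for copied items.
$discoveryMailbox = 'Discovery Search Mailbox'

# Turn off estimate-only mode, name the target, and de-duplicate found items.
$settings = @{
    Identity                 = $searchName
    TargetMailbox            = $discoveryMailbox
    EstimateOnly             = $false
    ExcludeDuplicateMessages = $true
}
Set-MailboxSearch @settings

# Start the copy; -Force suppresses the confirmation prompt.
Start-MailboxSearch -Identity $searchName -Force
```

Setting ExcludeDuplicateMessages to $true only at this stage matches the earlier advice to leave it at $False while generating estimates.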
All searches that EAC launches automatically examine deleted items held in
the Recoverable Items folder. (The SearchDumpster flag is $True.) If
you want, you can exclude deleted items from the search by updating the
flag to $False. However, Recoverable Items is included to ensure that
any items of interest are captured even if a user has attempted to
remove all traces of their existence. If an item is found in
Recoverable Items, it appears in a Recoverable Items folder under
the user’s primary mailbox or archive when you review the copied items in
the discovery mailbox.
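If you do decide to exclude deleted items, the change has to be made through EMS. The sketch below assumes that Set-MailboxSearch exposes a SearchDumpster parameter in your environment; because that might not be true for every version of Exchange, the code checks for the parameter before using it.

```powershell
$searchName = 'Patent Hold - Tailspin Project'

# Check that this environment exposes the SearchDumpster parameter at all.
$cmd = Get-Command Set-MailboxSearch
if ($cmd.Parameters.ContainsKey('SearchDumpster')) {
    # Exclude the Recoverable Items folder from the search.
    Set-MailboxSearch -Identity $searchName -SearchDumpster:$false

    # Confirm the new setting.
    Get-MailboxSearch -Identity $searchName | Format-List Name, SearchDumpster
}
else {
    Write-Warning 'Set-MailboxSearch does not expose SearchDumpster in this environment.'
}
```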