The joy of legal discovery
Legal discovery actions have been around for centuries. Over
the past two decades, the focus of discovery, or searches for
information pertinent to a legal case, has begun to shift from paper
evidence to electronic evidence. This shift reflects the different
manner in which organizations store data today. Filing cabinets are
still stuffed with paper, but much of the correspondence that companies
once conducted by letter, fax, and telex is now sent by email, so
discovery has to accommodate both paper and electronic media.
Discovery
actions for email systems first began in the mid-1980s. Messages were
recovered from backup tapes and printed for lawyers to review. The
process was expensive and time consuming. The only mitigating factor
was that it was much easier to determine who might have sent an
incriminating message because relatively few people in a company had
email, and the overall volume of email was low. Messages were text only
and tended to be short. It was therefore possible to satisfy a judge’s
order to retrieve all messages for 10 specific users over a month
without running up an extraordinarily high bill.
Today’s
environment is different. Many more users are typically hosted on each
server, they send and receive an ever-increasing volume of messages,
and those messages contain many types of attachments, including video
and audio files. The result of living in the age of electronic
communication is that the cost of legal discovery is higher because
there is more information to process. In March 2009, Fortune
magazine reported that the court-appointed trustee of bankrupt Lehman
Brothers Inc. had captured 3.2 billion email and instant messages,
occupying 3 petabytes (PB). This isn’t an unusual amount; the FBI
investigation of Enron in 2001 reviewed 31 terabytes (TB) of data and used 4 TB as
evidence. Email is a critical means of business communication that has
replaced telexes, faxes, and written letters in many respects, so legal
discovery of email has moved from an out-of-the-ordinary event to an
extremely common one, whether it is to satisfy a legal or
regulatory requirement, respond to a subpoena, or deal with an internal
matter concerning employee ethics, harassment, or discipline.
The
first generation of Exchange offered no way to store mail after it was
deleted, so you had to restore a database from a backup if you wanted
to recover a message, whether it was needed to satisfy a legal order or
because a user had deleted it in error. Gradually, Microsoft began to
add new features to Exchange to help. The original version of the
dumpster (the official term now used is the “Recoverable Items”
structure), as implemented in Exchange 2000 through Exchange 2007,
provides a two-phase delete process by which messages are marked as
deleted but kept in the database until their retention period expires,
at which time they are removed.
Journaling appeared in Exchange 2003 and was upgraded in
Exchange 2007. However, the journaling functionality Exchange offered
was basic, and most companies that invested in products to capture
copies of messages preferred purpose-designed products such as
Symantec’s Enterprise Vault or Iron Mountain’s NearPoint. As mentioned
earlier, Microsoft added managed folders in Exchange 2007 with the idea
that administrators could create folders that are distributed to
mailboxes for users to store important items. However, the reality is
that most organizations ignored managed folders.
The compliance
features in Exchange 2007 were a start. However, the overall experience
was not compelling enough to generate widespread usage, which then led
Microsoft to create a new set of features that have been rolled out
over Exchange 2010 and Exchange 2013.
Although Exchange 2013
includes a wide range of compliance features, Microsoft must convince
customers that having integrated archiving and search incorporated in
an email server is a better solution than dedicated archiving and
search applications that have been in use and developed over many
years. It can be argued that cost is one key Microsoft advantage
because archiving is available at the price of an enterprise Client
Access License (CAL) that might be already acquired. Another obvious
advantage is the integration of the compliance features into the core
of Exchange, meaning that customers do not have to pay for and manage
an additional system to gain compliance features.
The
cost of an enterprise CAL for each user is often lower than the cost of
dedicated archiving software plus any additional hardware that is
required to run the archiving software. This argument works only if the
functionality available in Exchange meets your requirements. Microsoft
acknowledges that many vendors have been actively selling compliance
solutions for Exchange for nearly a decade. Some offer functionality
that Exchange lacks, especially in areas such as workflow and the
ability to archive information taken from other sources, and companies
have built up experience with these products in integrating
compliance processes with various regulations.
If SharePoint and
Exchange are the most important repositories of information within your
company, the two will serve as an excellent platform to enable
compliance, provided that you can deploy the necessary software
versions to use features such as site mailboxes and conduct searches
across both repositories. If Exchange 2013 is used without
SharePoint, then the focus needs to be on how to extract maximum
advantage from its compliance features. Based on this premise, you then
focus on:
Deploying
archive mailboxes in an attempt, perhaps in vain, to eliminate the
sprawl of PSTs used across the company. The aim is to use large
mailboxes to enable users to keep all their data online, which is an
advantage for both users and the company. After it is online, the data
is exposed to indexing and search.
Deploying
suitable retention policies to help users keep control of their (now
larger) mailboxes. Retention policies can sweep unwanted items out of
user mailboxes on a regular and automatic basis while moving items that
need to be retained into archive mailboxes.
Working with the company legal department to determine appropriate policies to govern:
When
users are placed on hold (when they are prevented from deleting items
from their mailboxes or making any other alteration to mailbox
content). Exchange captures attempts to delete or edit information in
the background without interfering with the user’s ability to work with her
mailbox.
When
and how eDiscovery searches are performed, who can authorize these
operations, who has access to the data recovered by searching, how long
this data is retained, and how and when it is removed from servers.
When administrator and mailbox auditing is used and who has access to reports generated from this data.
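As a rough sketch of how some of these decisions might translate into Exchange Management Shell commands, the following example assigns a retention policy, places a mailbox on litigation hold, enables mailbox auditing, and reviews recent administrator actions. The mailbox name is illustrative, and 'Default MRM Policy' is assumed to be the retention policy you want to apply; substitute whatever your legal department agrees on.

```powershell
# Assumptions: 'Tony Redmond' is an example mailbox and 'Default MRM Policy'
# is the retention policy chosen for it. Adjust both for your organization.
$mailbox = 'Tony Redmond'

# Assign a retention policy so that unwanted items are swept out of the
# mailbox and items to be retained are moved to the archive.
Set-Mailbox -Identity $mailbox -RetentionPolicy 'Default MRM Policy'

# Place the mailbox on litigation hold so that deleted or edited items are preserved.
Set-Mailbox -Identity $mailbox -LitigationHoldEnabled $true

# Enable mailbox auditing for the same mailbox.
Set-Mailbox -Identity $mailbox -AuditEnabled $true

# Confirm the settings that were just applied.
Get-Mailbox -Identity $mailbox |
    Format-List Name, RetentionPolicy, LitigationHoldEnabled, AuditEnabled

# Review which administrators ran Set-Mailbox during the last seven days.
Search-AdminAuditLog -Cmdlets Set-Mailbox -StartDate (Get-Date).AddDays(-7) -EndDate (Get-Date) |
    Format-Table RunDate, Caller, CmdletName, ObjectModified
```

Who is allowed to run commands such as these, and who can read the audit output, is exactly the kind of question to settle with the legal department before the commands are used in production.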
Having
some focused goals for compliance is a good way to begin.
With that point in mind, the following sections discuss some of the ways you
can comply.
Archive mailboxes
An archive mailbox, or personal archive, is a logical
extension of a user’s primary mailbox that provides an online archive
facility. The name might cause some confusion with the personal
archives users create with Microsoft Outlook. The big difference is
that the Exchange archive is tightly integrated in the Information
Store, and the data held in the archive are therefore accessible using
all the features available to mailboxes, including eDiscovery searches.
By comparison, PST archives are usually confined to an individual PC,
and the data that they contain are inaccessible to server-based
processing.
An
archive mailbox can be stored in the same database as the primary
mailbox, or it can be in a different database. Some deployments have
created special archive servers that host databases containing only
archive mailboxes. This is a perfectly acceptable solution that offers
some advantages because the hardware can be tailored to the lower
demands that exist for access to archive information. Usually, people
don’t access their archive mailboxes as frequently as they do a primary
mailbox, which is constantly busy with the process of receiving and
sending messages. In essence, therefore, an archive is infrequently
used but always available online.
If
you use Microsoft Office 365, archive mailboxes can be stored in the
cloud, an option that has proven increasingly attractive as companies
gain more experience and confidence with cloud-based services. It is
attractive to hive off archives to a cloud-based service because this
enables you to remain focused on the care and maintenance of production
mailboxes while the hosting provider takes care of the archives.
Whatever option you choose, a mailbox can have just one personal
archive, and each mailbox that has an on-premises archive requires an
enterprise CAL. Mailboxes that use cloud-based archive mailboxes in the
Microsoft Exchange Online Archive service do not need enterprise CALs.
Microsoft
views archive mailboxes as the natural replacement for PSTs. The growth
of messages and the reluctance of administrators to increase mailbox
quotas coupled with the inability of Exchange and its clients to deal
elegantly with very large mailboxes (5 GB and up) meant that most
organizations were forced to use PSTs to offload data from the online
store. Users do like to keep messages, even if they never look at them
again. (Some conference speakers have opined that a message filed in a
PST has a 99 percent chance of never being looked at again after six
months; my personal experience tallies with this estimate.) Other
problems with PST management typically cited in corporate messaging
deployments include the following:
Reduced security. PSTs
are personal stores, but users keep just about anything in them,
including sensitive and usually unencrypted corporate information
ranging from budgets to presentations about new products to performance
reviews. If someone loses a laptop—or even a USB device that has a PST
on it—that information is immediately exposed and potentially available
to anyone who finds the device and accesses it. Even if protected by a
password, the PST file structure is insecure and can be quickly
accessed by using utilities commonly available on the Internet. After
the password is bypassed, a PST can be opened using any Microsoft
Outlook client.
Inability to respond to discovery actions. Information
held on a PST is usually invisible to searches that a company performs
to respond to discovery requests. This is fine if the information is
personal or irrelevant to the discovery request, but it could be very
expensive if required information is not disclosed to a court and is
subsequently discovered.
Inability to apply policy. Many
companies have a data retention policy that requires users to delete
documents and messages after a certain period. The period can vary,
depending on the type of information contained in different items. In
any case, the company loses any ability to apply policy centrally after
a user moves an item from his mailbox into a PST.
Exposure to data loss. Laptop
disks are notoriously prone to failure. If users don’t back up their
data, any disk crash exposes them to potential data loss, and that
information might be important.
The
alternative solution to increasing disk quota for mailboxes in previous
versions of Exchange was to buy and deploy a dedicated third-party
archiving solution. Using PSTs is obviously far cheaper for a company.
It’s also easier for users because they control how many PSTs they
create and how they use them. Some create a separate PST for each year;
some create a PST for each major project. The big downside is that PSTs
then expose the company to the risks previously described. Even so, it
will take time to pry user fingers from their beloved PSTs.
Exchange
archive mailboxes are not perfect, and a number of limitations exist
that could hinder deployment, including the following:
You
cannot transfer an archive to another mailbox. If a user leaves and you
delete her mailbox, the archive is also removed. You can save data by
exporting items from the archive (and the primary mailbox) to a PST and
then importing it back into the personal archive of another user, but
it would be more elegant just to transfer the archive intact.
You
cannot copy or move sections of the archive to transfer it to another
user. For example, a user who wants to transfer responsibility for a
project to another user has to extract the folders and other items
relating to the project from her archive and provide them to the other
user. Again, the workaround is to export selected folders from the
personal archive to a PST and provide the PST to the other user (or
import the PST into her archive). Alternatively, a site mailbox or
public folder might serve as a better repository for information that
has to be shared between different project members.
You
cannot assign permissions on a folder level within the archive to allow
users to give access to parts of their archive to other users.
Delegates who have full access to a user’s mailbox can access the
complete archive for that mailbox.
Archive
mailboxes are inaccessible from mobile clients and from Outlook for
Mac. Given the use of mobile devices today, this can be an issue for
some users.
These are examples of functionality
Microsoft will doubtless consider enhancing in the future. It’s likely
Microsoft will wait to see how archives are used in practical terms
within customer deployments before it plans how archives evolve in
future releases of Exchange.
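The PST export and import workaround described in the limitations above might look something like the following sketch. The mailbox names, the file share, and the target folder name are hypothetical. The account that runs the commands needs the Mailbox Import Export role, and the PST path must be a UNC path that Exchange can reach.

```powershell
# Assumptions: 'Kim Akers' is a departing user, 'Tony Redmond' takes over her data,
# and \\fileserver\pst$ is a hypothetical share available to the Exchange servers.
$pst = '\\fileserver\pst$\kim-akers-archive.pst'

# Export the contents of the departing user's archive mailbox to a PST.
New-MailboxExportRequest -Mailbox 'Kim Akers' -IsArchive -FilePath $pst

# Check progress; wait until the request reports a status of Completed.
Get-MailboxExportRequest -Mailbox 'Kim Akers' |
    Get-MailboxExportRequestStatistics |
    Format-Table Name, Status, PercentComplete

# Import the PST into the other user's archive, under a separate folder.
New-MailboxImportRequest -Mailbox 'Tony Redmond' -IsArchive -FilePath $pst `
    -TargetRootFolder 'Imported from Kim Akers'
```

To hand over only part of an archive, the -IncludeFolders parameter of New-MailboxExportRequest can restrict the export to selected folders.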
Before you can create and use archive mailboxes with Exchange,
you have to deploy clients that support the feature. These are as
follows:
The other editions (such as Office Home and Business 2013) do not
include the code necessary to open and reveal archive mailboxes. No
mobile clients currently support access to archive mailboxes.
You
can enable an archive when you create a new mailbox by clicking the
More Options link at the bottom of the screen used to enter new mailbox
details. This reveals the check box by which to indicate that an
archive should be created alongside the primary mailbox (Figure 1).
You can also select a specific mailbox database to hold the archive or
just click Save to have Exchange use its auto-provisioning feature to
select a database to hold the archive.
To
enable an archive when you create a mailbox with Exchange Management
Shell (EMS), you just add the -Archive switch to the New-Mailbox
cmdlet and the -ArchiveDatabase parameter to select a database for the
archive mailbox. (Again, this is not necessary because Exchange will
pick a database for you if you omit the -ArchiveDatabase parameter.)
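A minimal sketch of such a command follows. The user name, user principal name, and database name are made up for illustration.

```powershell
# Assumptions: 'Kim Akers', 'kakers@contoso.com', and 'DB2' are hypothetical values.
# Prompt for the initial password so that it is never stored in plain text.
$password = Read-Host -Prompt 'Enter initial password' -AsSecureString

# Create the mailbox and its archive in one step.
# Omit -ArchiveDatabase to let Exchange choose a database for the archive.
New-Mailbox -Name 'Kim Akers' -UserPrincipalName 'kakers@contoso.com' `
    -Password $password -Archive -ArchiveDatabase 'DB2'
```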
Note
You
can also enable an archive for existing mailboxes by selecting a
mailbox in EAC and then selecting Enable under the In-Place Archive
section in the action pane (Figure 2).
EAC displays a dialog box to enable you to select a database for the
archive and to warn you that enabling this feature requires an
enterprise CAL. If you click OK, the mailbox is enabled with an
archive. You can also enable a personal archive for an existing mailbox
with EMS. For example:
Enable-Mailbox -Identity 'Tony Redmond' -Archive
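If you want to control where the archive is placed, or to enable archives for many mailboxes at once, a variation along the following lines might be used. The database names are hypothetical, and the bulk command is shown with -WhatIf so that it only reports what it would do. Remember that each mailbox with an on-premises archive requires an enterprise CAL.

```powershell
# Assumption: 'DB1' and 'DB2' are hypothetical mailbox database names.

# Enable an archive for one mailbox and place it in a specific database.
Enable-Mailbox -Identity 'Tony Redmond' -Archive -ArchiveDatabase 'DB2'

# Dry run: list the user mailboxes in DB1 that would be given an archive.
# Remove -WhatIf to make the change.
Get-Mailbox -Database 'DB1' -ResultSize Unlimited `
    -Filter {ArchiveGuid -eq $null -and RecipientTypeDetails -eq 'UserMailbox'} |
    Enable-Mailbox -Archive -WhatIf
```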
As
soon as an archive has been enabled for a mailbox, it becomes available
to Outlook Web App and Outlook the next time the client refreshes its
resource information through the Autodiscover process, which provides
Outlook with information about the new archive. This usually takes a
few minutes for on-premises mailboxes but might need up to an hour for
an archive hosted in Office 365. Outlook Web App retrieves information
about the archive when it connects to the mailbox online. At this
point, the new archive will hold only a Deleted Items folder.
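To check from EMS that the archive exists and to see which folders it currently holds, something like the following could be used. The mailbox name is only an example.

```powershell
# Assumption: 'Tony Redmond' is an example mailbox that has just been archive-enabled.

# Show the archive properties stamped on the mailbox.
Get-Mailbox -Identity 'Tony Redmond' |
    Format-List Name, ArchiveName, ArchiveDatabase, ArchiveGuid

# List the folders in the archive (not the primary mailbox) with their item counts.
Get-MailboxFolderStatistics -Identity 'Tony Redmond' -Archive |
    Format-Table Name, FolderPath, ItemsInFolder
```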
After
you enable an archive for a mailbox, you’ll notice that EAC displays
User (Archive) instead of just User in the mailbox type column. Unlike
its Exchange Management Console (EMC) predecessor, EAC does not include
a prewritten query to quickly display all the mailboxes that currently
have an archive. Because EAC regards archive-enabled mailboxes as
having a different type, you can click the mailbox type heading to have
EAC sort the mailboxes and present them together. Alternatively, you
can use the Get-Recipient or Get-Mailbox cmdlets to search for
mailboxes that have an archive. For example, this command looks for
mailboxes that have archives enabled and reports the mailbox name, the
archive name, and the databases in which the mailbox and archive are
located:
Get-Mailbox -Archive | Format-Table Name, ArchiveName, Database, ArchiveDatabase
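Building on the same idea, a simple report of how much data each archive holds could be sketched as follows. Sorting is done on the item count rather than the size, on the assumption that the size values might not sort numerically in a remote PowerShell session.

```powershell
# Report the item count and size of every archive mailbox, largest item count first.
Get-Mailbox -Archive -ResultSize Unlimited |
    Get-MailboxStatistics -Archive |
    Sort-Object ItemCount -Descending |
    Format-Table DisplayName, ItemCount, TotalItemSize
```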