You may feel that my coverage of non-mailbox
High Availability is going to be pretty brief. This is because
configuring High Availability for these other server roles has not
significantly changed since Exchange 2007, so I will just give an
overview of these requirements. However, before we start talking about
High Availability on the Mailbox Server role we have to discuss some
database technologies used in Exchange Server 2010. Exchange Server 2010
uses a database to store the primary data, i.e. the messages you send
and receive. This database technology is a transactional system, which
is pretty common, but Exchange Server uses its own technology built on
the Extensible Storage Engine (ESE), sometimes referred to as a JET
database.
When installing an Exchange Server 2010 Mailbox
Server, the initial mailbox database is, by default, stored on the local
C:\ drive; more specifically on C:\Program Files\Microsoft\Exchange Server\V14\Mailbox\Mailbox Database <<random number>>\.
This random number is generated by Exchange Server during the initial
configuration because the database names on Exchange 2010 and higher
servers must be unique within the Exchange organization.
A number of files make up the Exchange 2010 database environment:
"mailbox database 0242942819.edb"
E00.log
E000000003A.log, E000000003B.log, E000000003C.log, etc.
E00.chk
E00res00001.jrs and E00res00002.jrs
E00tmp.log
Tmp.edb.
NOTE
The random number in this example is 0242942819, hence the name of the Mailbox Database is "mailbox database 0242942819.edb."
All log and checkpoint file names in the above
list start with the same three characters: E00; this is called the
database prefix. The first database in the Exchange organization has a
prefix of E00, the second database has a prefix of E01, and so on.
All of these files play a crucial role in the correct functioning of Exchange Server.
A crucial step in understanding Exchange database
technology is understanding the flow of data between the Exchange Server
and the database itself. Data is processed in 32 KB blocks, also called
"pages." When Exchange has finished processing a page that was updated,
the page is immediately written to a log file. The page is still
kept in memory until Exchange needs this memory again, but when the page
isn't used for some time, or when Exchange needs to force an update
during a checkpoint, the page is written to the database file. So, the
data in the log files is always ahead of the data in the database.
This is an important point to remember when troubleshooting database
issues!
NOTE
Exchange Server 2010
uses 32 KB pages, Exchange Server 2007 uses 8 KB pages, and Exchange Server
2003 and earlier use 4 KB pages when processing data. The parts of the
server memory that are used by these pages are referred to as the "cache
buffers."
As data is written to the database, a pointer called the checkpoint
is updated to reflect the new or updated page that was written to the
database. The checkpoint is stored in a special file called the checkpoint file,
which Exchange Server uses to make sure it knows what data has been
written to the database, and what data is in the log files and not yet
written to the database. So, in short:
Mail data is initially processed in memory, separated into pages.
Updated pages are written to the log file.
If pages are no longer needed by Exchange these pages are written to the database.
The checkpoint file is updated to reflect the new location of the checkpoint.
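The four steps above can be sketched as a toy model. This is only an illustration of the flow, not how ESE is implemented: the class ToyStore, its method names, and the idea of expressing the checkpoint as a position in a single in-memory log are all assumptions made for the example.

```python
class ToyStore:
    """Toy model of the page -> log -> database flow (not ESE itself)."""

    def __init__(self):
        self.cache = {}      # page number -> data held in memory
        self.dirty = {}      # page number -> log position of its oldest unflushed update
        self.log = []        # append-only list of (page number, data) records
        self.database = {}   # page number -> data written to the database file
        self.checkpoint = 0  # every log record before this position is in the database

    def update(self, page_no, data):
        """An updated page goes to the log first and stays in memory."""
        self.log.append((page_no, data))
        self.cache[page_no] = data
        self.dirty.setdefault(page_no, len(self.log) - 1)

    def flush(self, page_no):
        """Write one page from memory (not from the log) to the database."""
        self.database[page_no] = self.cache[page_no]
        self.dirty.pop(page_no, None)
        # The checkpoint can only move up to the oldest update still unflushed.
        self.checkpoint = min(self.dirty.values(), default=len(self.log))

    def recover(self):
        """Simulate a crash: memory is lost, the log is replayed from the checkpoint."""
        self.cache.clear()
        self.dirty.clear()
        for page_no, data in self.log[self.checkpoint:]:
            self.database[page_no] = data
        self.checkpoint = len(self.log)


store = ToyStore()
store.update(1, "a")
store.update(2, "b")
store.update(1, "c")
store.flush(1)    # page 1 reaches the database, page 2 exists only in memory and log
store.recover()   # after a crash, page 2 is rebuilt from the log
```

In this sketch the log always holds at least as much as the database, which is why the model can rebuild page 2 after losing its memory.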
1 Extensible Storage Engine
The database engine used by Exchange Server is
quite special, and is built on the Extensible Storage Engine, or ESE.
ESE exists in several flavors:
ESE97 for Exchange Server 5.5
ESE98 for Exchange Server 2000/2003
ESENT for Active Directory
ESE for Exchange Server 2007 and Exchange Server 2010.
ESE is a low-level database engine. This means it
knows all about "base types," such as short, string, long, longlong,
systime, etc., but it has no knowledge of any structure or schema. The
schema is defined by the Information Store in the application. This is
in contrast to a relational database like Microsoft SQL Server, where
all the database structures are just meta-data (i.e. are part of the
database itself).
ESE is optimized for handling large amounts of
semi-structured data, as it is impossible for an Exchange Server to
predict what kind of data will be received, how large the data will be,
or what attachments messages will have.
NOTE
Ever since the early
days of Exchange, rumors have been going around about the use of
Microsoft SQL Server as the database engine for Exchange Server.
Microsoft tried this for Exchange Server 2010 and actually got it
working. However, the decision was made to stay on the ESE database.
More information about this can be found on the Microsoft Exchange
Product Group blog: HTTP://TINYURL.COM/ESEDB.
2 Log files
When Exchange Server is working with a page, and that
page's status is changed from clean to dirty (that is, the page has been
modified in memory), the page is written to
the log file almost immediately. Data held in memory is fast to access,
but volatile; all it takes is a minor hiccup in the server, and data in
memory is lost. When it is saved in the log file, the whole server could
burn down, and as long as you keep the disk, you also keep the data.
Thankfully, saving to the log file is normally a matter of milliseconds.
The log files are numbered internally, and this number (referred to as
the lGeneration number) is used for identifying the log files, and for
storing them on the disk when they are completely filled with data.
The current log file, or the "log file in use" is
E00.log, and while Exchange is filling this log file with data, a
temporary E00tmp.log file is already created (or is in the process of
being created) in the background. When the E00.log is eventually filled
with data, it is saved under another name. The name is derived from the
log file's prefix (E00, E01, E02, etc.) and the lGeneration number,
which is a sequential hexadecimal notation. So, for example, when the
lGeneration number is 1, the E00.log is saved as E0000000001.log.
As another example, the last time this roll-over happened in Figure 1,
the lGeneration number was 3E, so the log file was saved as
E000000003E.log. Since the lGeneration number is sequential, we
know that the lGeneration number of the current E00.log must be 3F, and the next time this log file roll-over process takes place, the log file will be saved as E000000003F.log.
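The naming rule can be expressed in a few lines of Python. This is a minimal sketch: the function name log_file_name is my own, and the width of eight hexadecimal digits is an assumption inferred from the file names used in this section.

```python
def log_file_name(prefix: str, generation: int) -> str:
    """Build the name a full log file is saved under.

    Assumes the pattern seen in the examples: the database prefix,
    followed by the lGeneration number as eight uppercase hex digits.
    """
    if generation < 1:
        raise ValueError("lGeneration numbers start at 1")
    return f"{prefix}{generation:08X}.log"


first = log_file_name("E00", 1)        # "E0000000001.log"
last_saved = log_file_name("E00", 0x3E)  # "E000000003E.log"
next_saved = log_file_name("E00", 0x3F)  # "E000000003F.log"
```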
Although it's not directly visible, the lGeneration
number is stored inside the log file, and can be checked by dumping the
header information of the log file with the ESEUTIL utility (ESEUTIL /ML). The first
few lines of the log file's header should read something like:
The lGeneration number is listed on the third line,
both in decimal and hexadecimal notation. This is
confusing, and sooner or later an Exchange administrator will mix up the two notations and start working with the wrong log file.
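To avoid mixing up the two notations, it can help to convert a log file name back into its lGeneration number and look at both forms side by side. The sketch below is only an illustration: the function names are hypothetical, and the pattern (a three-character prefix starting with E, followed by eight hexadecimal digits) is an assumption based on the file names in this section.

```python
import re

# Assumed pattern: prefix such as E00, then eight hex digits, then ".log".
LOG_NAME = re.compile(r"^(E[0-9A-F]{2})([0-9A-F]{8})\.log$", re.IGNORECASE)


def generation_from_name(file_name: str) -> tuple[str, int]:
    """Return (prefix, lGeneration) for a saved log file name."""
    match = LOG_NAME.match(file_name)
    if match is None:
        raise ValueError(f"not a saved log file name: {file_name!r}")
    prefix, hex_digits = match.groups()
    return prefix.upper(), int(hex_digits, 16)


def describe_generation(generation: int) -> str:
    """Show the same generation in decimal and in hexadecimal."""
    return f"{generation} (0x{generation:X})"


prefix, generation = generation_from_name("E000000003E.log")
summary = describe_generation(generation)  # "62 (0x3E)"
```

Decimal 62 and hexadecimal 3E are the same generation, so the log file to look for is E000000003E.log, not E0000000062.log.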
After the pages are written to the log file, they are
kept in memory, thereby saving an expensive read from disk action when
Exchange Server needs the page again. When the Mailbox Server needs that
memory for other pages, or when the page stays in memory for a long
time, it is written to the database file. This is also known as the
"lazy writer mechanism." A common misconception is that data is read from
the log files and written to the database file, but this is not the
case. It is written directly from memory to the database, and log files
are only read in recovery scenarios, for example, after an improper
shutdown of the server. Under normal circumstances, the log files are
100% write, whereas the database is a random mix between read and write
actions.
3 Checkpoint files
The relationship between writing data in the log
files and writing data into the database itself is managed by the
checkpoint file, E00.chk. The checkpoint file points to the page in the
database that was last written, and is advanced as soon as Exchange
writes another page from memory to the database.
The difference between the data in the database and the data in the log files is referred to as checkpoint depth.
This checkpoint depth can be several log files; in fact, the default
checkpoint depth is 20 log files. By using the checkpoint, Exchange
waits before writing to the database, and tries to combine several write
actions so that the database write operations can be performed more
efficiently.
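Because the checkpoint marks the boundary between log data that is already in the database and log data that is not, it also tells you which log files matter after an improper shutdown. The sketch below is a simplified illustration under two assumptions of mine: that the checkpoint can be expressed as an lGeneration number, and that saved log files follow the eight-hex-digit naming pattern used in this section. The function name logs_to_replay is hypothetical.

```python
def logs_to_replay(prefix: str, checkpoint_generation: int,
                   current_generation: int) -> list[str]:
    """List the log files holding data not yet guaranteed to be in the database.

    Saved log files from the checkpoint generation onwards are included,
    followed by the log file in use (for example E00.log).
    """
    if checkpoint_generation > current_generation:
        raise ValueError("the checkpoint cannot be ahead of the current log file")
    saved = [f"{prefix}{generation:08X}.log"
             for generation in range(checkpoint_generation, current_generation)]
    return saved + [f"{prefix}.log"]


def checkpoint_depth(checkpoint_generation: int, current_generation: int) -> int:
    """Number of saved log files between the checkpoint and the log in use."""
    return current_generation - checkpoint_generation


needed = logs_to_replay("E00", 0x3D, 0x3F)
depth = checkpoint_depth(0x3D, 0x3F)
```

With the checkpoint at generation 3D and the current log file at generation 3F, the list contains E000000003D.log, E000000003E.log, and E00.log.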
Checkpoint depth is a per-database setting. Since each log file
is 1 MB in size, a checkpoint depth of 20 log files means that roughly 20 MB
of data is kept in memory for that specific database. With 30
databases in Exchange Server 2010, each at its maximum checkpoint depth,
approximately 600 MB of Exchange data is kept in memory.
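The arithmetic behind these figures is simple enough to write down. This is a rough estimate only: the function name is my own, and the 1 MB log file size is an assumption that matches the 20 MB per database figure used above.

```python
def checkpoint_memory_mb(databases: int, depth_in_logs: int = 20,
                         log_file_size_mb: int = 1) -> int:
    """Estimate the data held in memory because of checkpoint depth.

    Assumes every database sits at the same checkpoint depth and that
    each log file is log_file_size_mb in size.
    """
    return databases * depth_in_logs * log_file_size_mb


one_database = checkpoint_memory_mb(1)       # 20 MB
thirty_databases = checkpoint_memory_mb(30)  # 600 MB
```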
4 The Mailbox Database
The "mailbox database 0242942819.edb" file is the primary repository of the Exchange Server 2010 Mailbox Server role. In Exchange Server 2007 this file was called "mailbox database.edb," whereas in Exchange 2003 and Exchange 2000 the database consisted of two files: priv1.edb and priv1.stm. In Exchange Server 2010, a Mailbox Server can now hold up to 100 databases.
The maximum size of an ESE database can be
huge. The theoretical upper limit of a file on NTFS is 16 exabytes, and this is
generally considered sufficient to host large Mailbox Database files.
The Microsoft-recommended maximum file size of the Mailbox Database on
Exchange Server 2010 is 2 TB. Compared to the recommended 200 GB limit in
Exchange 2007 (using Cluster Continuous Replication) this is a
tremendous increase. Bear in mind that a prerequisite for using this
sizing is that you have to configure multiple database copies to achieve
a High Availability solution.