High Availability in Exchange Server 2010 : Exchange Server database technologies

4/16/2013 2:47:18 AM

You may feel that my coverage of non-mailbox High Availability is going to be pretty brief. This is because configuring High Availability for these other server roles has not changed significantly since Exchange Server 2007, so I will just give an overview of these requirements. However, before we start talking about High Availability on the Mailbox Server role, we have to discuss some of the database technologies used in Exchange Server 2010. Exchange Server 2010 uses a database to store its primary data, i.e. the messages you send and receive. This database is a transactional system, which is pretty common, but Exchange Server uses its own technology built on the Extensible Storage Engine (ESE), sometimes referred to as a JET database.

When installing an Exchange Server 2010 Mailbox Server, the initial mailbox database is, by default, stored on the local C:\ drive; more specifically on C:\Program Files\Microsoft\Exchange Server\V14\Mailbox\Mailbox Database <<random number>>\. This random number is generated by Exchange Server during the initial configuration because the database names on Exchange 2010 and higher servers must be unique within the Exchange organization.

Figure 1. By default the database and log files are placed on the C:\ drive.

A number of files make up the Exchange Server 2010 database environment:

"mailbox database 0242942819.edb"
E00.log
E000000003A.log, E000000003B.log, E000000003C.log, etc.
E00.chk
E00res00001.log and E00res00002.log
E00tmp.log
Tmp.edb

NOTE

The random number in this example is 0242942819, hence the name of the Mailbox Database is "mailbox database 0242942819.edb."

All log and checkpoint file names in the list above start with the same three characters: E00. This is called the database prefix. The first database in the Exchange organization has a prefix of E00, the second database has a prefix of E01, and so on.

All of these files play a crucial role in the correct functioning of Exchange Server.

A crucial step in understanding Exchange database technology is understanding the flow of data between the Exchange Server and the database itself. Data is processed in 32 KB blocks, also called "pages." When Exchange has finished processing such a page, and the page was updated, it is immediately written to a log file. The page is still kept in memory until Exchange needs this memory again, but when the page isn't used for some time, or when Exchange needs to force an update during a checkpoint, the page is written to the database file. So, the data in the log files is always ahead of the data in the database. This is an important point to remember when troubleshooting database issues!

NOTE

Exchange Server 2010 uses 32 KB pages, Exchange Server 2007 uses 8 KB pages, and Exchange Server 2003 and earlier use 4 KB pages when processing data. The parts of the server memory that are used by these pages are referred to as the "cache buffers."

As data is written to the database, a pointer called the checkpoint is updated to reflect the new or updated page that was written to the database. The checkpoint is stored in a special file called the checkpoint file, which Exchange Server uses to make sure it knows what data has been written to the database, and what data is in the log files and not yet written to the database. So, in short:

Mail data is initially processed in memory, separated into pages.
Updated pages are written to the log file.
If pages are no longer needed by Exchange these pages are written to the database.
The checkpoint file is updated to reflect the new location of the checkpoint.

Figure 2. Processing of mail data in Exchange Server 2010.
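The four steps above can be pictured with a toy model. This is only a sketch of the log-first ordering, not how ESE is implemented; the ToyStore class and all of its attribute names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy model of the log-first write path (hypothetical, not ESE itself)."""
    log: list = field(default_factory=list)       # stands in for E00.log
    cache: dict = field(default_factory=dict)     # updated pages held in memory
    database: dict = field(default_factory=dict)  # stands in for the .edb file
    checkpoint: int = 0                           # log records already in the database

    def update_page(self, page_no, data):
        # Steps 1 and 2: the page changes in memory and is logged at once.
        self.log.append((page_no, data))
        self.cache[page_no] = data

    def flush(self):
        # Steps 3 and 4: cached pages go to the database, then the
        # checkpoint moves past every log record that is now covered.
        self.database.update(self.cache)
        self.cache.clear()
        self.checkpoint = len(self.log)


store = ToyStore()
store.update_page(7, "message A")
store.update_page(9, "message B")
print(len(store.log), store.database, store.checkpoint)  # 2 {} 0
store.flush()
print(store.database, store.checkpoint)
```

Before the flush, both updates exist in the log but not in the database, which is exactly the situation in which the log files are ahead of the database.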

1 Extensible Storage Engine

The database engine used by Exchange Server is quite special, and is built on the Extensible Storage Engine, or ESE. ESE exists in several flavors:

ESE97 for Exchange Server 5.5
ESE98 for Exchange Server 2000/2003
ESENT for Active Directory
ESE for Exchange Server 2007 and Exchange Server 2010.

ESE is a low-level database engine. This means it knows all about "base types," such as short, string, long, longlong, systime, etc., but it has no knowledge of any structure or schema. The schema is defined by the Information Store in the application. This is in contrast to a relational database like Microsoft SQL Server, where all the database structures are just meta-data (i.e. are part of the database itself).

ESE is optimized for handling large amounts of semi-structured data, as it is impossible for an Exchange Server to predict what kind of data will be received, how large the data will be, or what attachments messages will have.

NOTE

Ever since the early days of Exchange, rumors have been going around about the use of Microsoft SQL Server as the database engine for Exchange Server. Microsoft tried this for Exchange Server 2010 and actually got it working. However, the decision was made to stay on the ESE database. More information about this can be found on the Microsoft Exchange Product Group blog: HTTP://TINYURL.COM/ESEDB.

2 Log files

When Exchange Server is working with a page, and that page's status changes from clean to dirty (that is, the page has been modified), the page is written to the log file almost immediately. Data held in memory is fast to access, but volatile; all it takes is a minor hiccup in the server, and data in memory is lost. Once it is saved in the log file, the whole server could burn down, and as long as you keep the disk, you also keep the data. Thankfully, saving to the log file is normally a matter of milliseconds. The log files are numbered internally, and this number (referred to as the lGeneration number) is used for identifying the log files, and for storing them on the disk when they are completely filled with data.

The current log file, or the "log file in use" is E00.log, and while Exchange is filling this log file with data, a temporary E00tmp.log file is already created (or is in the process of being created) in the background. When the E00.log is eventually filled with data, it is saved under another name. The name is derived from the log file's prefix (E00, E01, E02, etc.) and the lGeneration number, which is a sequential hexadecimal notation. So, for example, when the lGeneration number is 1, the E00.log is saved as E0000000001.log. Alternatively, the last time this process happened in Figure 1, the lGeneration number was 3E, so the log file was saved as E000000003E.log. Since the lGeneration number is a sequential number, we know that the next lGeneration number of the E00.log must be 3F, and the next time this log file roll-over process takes place, the log file will be saved as E000000003F.log.
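The naming rule described here can be written down in a couple of lines. The function name closed_log_name is made up for this sketch; the rule itself (prefix followed by the lGeneration number as eight hexadecimal digits) is taken from the examples above.

```python
def closed_log_name(prefix: str, generation: int) -> str:
    """Build the name a full log file is saved under (illustrative helper)."""
    # The lGeneration number is written as eight hexadecimal digits.
    return f"{prefix}{generation:08X}.log"


print(closed_log_name("E00", 1))         # E0000000001.log
print(closed_log_name("E00", 0x3E))      # E000000003E.log
print(closed_log_name("E00", 0x3E + 1))  # E000000003F.log
```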

Although it's not directly visible, the lGeneration number is stored inside the log file, and can be checked by dumping the header information of the log file with the ESEUTIL utility. The first few lines of the log file's header should read something like:

The lGeneration number is listed on the third line, both in decimal and hexadecimal notation. Unfortunately, this is very confusing, and sooner or later an Exchange administrator will mix up these notations and start working with the wrong log file.
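One way to avoid mixing up the two notations is to convert a file name back into its lGeneration number and look at both forms side by side. The helper below is a hypothetical sketch; it assumes a three-character prefix followed by eight hexadecimal digits, as in the file names shown earlier.

```python
import re


def generation_from_name(file_name: str) -> int:
    """Return the lGeneration number encoded in a closed log file name."""
    # Assumed layout: 'E' plus two prefix characters, then eight hex digits.
    match = re.fullmatch(r"E[0-9A-Za-z]{2}([0-9A-Fa-f]{8})\.log", file_name)
    if match is None:
        raise ValueError(f"not a closed log file name: {file_name!r}")
    return int(match.group(1), 16)


generation = generation_from_name("E000000003E.log")
print(f"decimal {generation}, hexadecimal 0x{generation:X}")
# decimal 62, hexadecimal 0x3E
```

The file name always carries the hexadecimal form, so a decimal 62 in a header dump corresponds to the file ending in 3E, not to a file ending in 62.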

After the pages are written to the log file, they are kept in memory, thereby saving an expensive read from disk when Exchange Server needs the page again. When the Mailbox Server needs that memory for other pages, or when the page stays in memory for a long time, it is written to the database file. This is also known as the "lazy writer mechanism." A common misconception is that data is read from the log files and written to the database file, but this is not the case. It is written directly from memory to the database, and log files are only read in recovery scenarios, for example, after an improper shutdown of the server. Under normal circumstances, the log files are 100% write, whereas the database is a random mix of read and write actions.
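The recovery case, the only time the log files are read, can be sketched as follows. This is a simplified illustration, not the actual ESE recovery logic: the replay function and the representation of the log as a list of (page number, data) records are assumptions made for the example.

```python
def replay(database: dict, log: list, checkpoint: int) -> dict:
    """Reapply every logged page change recorded after the checkpoint."""
    recovered = dict(database)
    for page_no, data in log[checkpoint:]:
        recovered[page_no] = data
    return recovered


# After an improper shutdown the memory contents are gone, but the log survives.
log = [(7, "message A"), (9, "message B"), (7, "message C")]
database = {7: "message A"}  # only the first record had reached the database
checkpoint = 1               # number of log records already in the database

print(replay(database, log, checkpoint))
# {7: 'message C', 9: 'message B'}
```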

3 Checkpoint files

The relationship between writing data in the log files and writing data into the database itself is managed by the checkpoint file, E00.chk. The checkpoint file points to the page in the database that was last written, and is advanced as soon as Exchange writes another page from memory to the database.

The difference between the data in the database and the data in the log files is referred to as checkpoint depth. This checkpoint depth can be several log files; in fact, the default checkpoint depth is 20 log files. By using the checkpoint, Exchange waits before writing to the database, and tries to combine several write actions so that the database write operations can be performed more efficiently.

Figure 3. All data below the checkpoint is written to the database.

Checkpoint depth is also a per-database setting. Log files in Exchange Server 2010 are 1 MB each, so when a database's checkpoint depth is 20 log files, about 20 MB of data that has not yet been written to the database is kept in memory for that specific database. When using 30 databases in Exchange Server 2010, each at its maximum checkpoint depth, approximately 600 MB of Exchange data is kept in memory.
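The arithmetic behind these figures is simple enough to check. The sketch below assumes 1 MB log files and the default depth of 20 log files; the function and parameter names are made up for the example.

```python
LOG_FILE_MB = 1  # assumed size of one log file, in MB


def checkpoint_memory_mb(databases: int, depth_in_logs: int = 20,
                         log_file_mb: int = LOG_FILE_MB) -> int:
    """Estimate memory held by data not yet written to the databases."""
    return databases * depth_in_logs * log_file_mb


print(checkpoint_memory_mb(1))   # 20
print(checkpoint_memory_mb(30))  # 600
```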

4 The Mailbox Database

The "mailbox database 0242942819.edb" file is the primary repository of the Exchange Server 2010 Mailbox Server role. In Exchange Server 2007 this file was called "mailbox database.edb," whereas in Exchange 2003 and Exchange 2000 the database consisted of two files: priv1.edb and priv1.stm. In Exchange Server 2010, a Mailbox Server can now hold up to 100 databases.

The maximum size of an ESE database can be huge. The upper limit of a file on NTFS is 64 exabytes, and this is generally considered sufficient to host large Mailbox Database files. The Microsoft-recommended maximum file size of the Mailbox Database on Exchange Server 2010 is 2 TB. Compared to the 200 GB file-size limit in Exchange 2007 (using Cluster Continuous Replication) this is a tremendous increase. Bear in mind that a prerequisite for using this sizing is that you have to configure multiple database copies to achieve a High Availability solution.
