Troubleshooting Exchange Server 2010: Basic Troubleshooting Principles

11/8/2013 8:09:34 PM

In the old days of Exchange, one server could do it all—an Exchange 5.5 or Exchange 2000 server would receive and deliver email, handle client connections, and store user data. There was limited separation of roles between front-end and back-end servers, achieved by selecting the This Is A Front End Server check box in Exchange System Manager. But that didn't enable or disable a role; it merely changed functionality for HTTP, POP3, IMAP4, and NNTP access from redirect to proxy. Exchange Server 2007 saw a significant change in architecture with the separation of functions into server roles, although it wasn't a complete transformation—certain clients (MAPI) would still connect directly to the Mailbox servers for data while all other clients connected through Client Access servers. Now in Exchange Server 2010, even MAPI clients connect to the Client Access servers through the new RPC Client Access functionality.

The preceding paragraph should be a quick recap—why reproduce it here? Because it reinforces a key point: in order to troubleshoot Exchange, you have to understand the architecture. Understanding which functions of Exchange are controlled by which server roles is absolutely critical, or you could spend a lot of time troubleshooting the wrong server.

Troubleshooting Exchange Server 2010 often involves collecting and reviewing information from a series of servers, rather than focusing on one. For example, a user complains that he isn't receiving new email. There are a number of possible causes for this:

The user's client isn't receiving notifications of new email.
The user's client can't connect to the Client Access server to retrieve new email.
All copies of the relevant mailbox database are offline.
The user's mailbox is full.
There are no Hub Transport servers available to deliver his messages.
A transport agent is blocking delivery of email to this end user.

A closer look at this list shows an interesting breakdown. The first two issues could loosely be categorized as client access issues, the next two as database issues, and the last two as transport issues. These correspond nicely to the three required server roles (Client Access, Mailbox, and Hub Transport), and since that makes a logical breakdown, that's how we'll cover troubleshooting in the following sections. However, before we dive right into the tools, let's take a moment to consider what troubleshooting involves.
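The breakdown by role can be pictured as a simple lookup table. The following is only an illustrative sketch in Python: the cause labels paraphrase the list above, and the names CAUSE_TO_ROLE and group_by_role are hypothetical, not part of any Exchange tooling.

```python
from collections import defaultdict

# Hypothetical labels paraphrasing the six possible causes listed above,
# each mapped to the server role you would investigate first.
CAUSE_TO_ROLE = {
    "client not receiving new-mail notifications": "Client Access",
    "client cannot connect to the Client Access server": "Client Access",
    "all copies of the mailbox database are offline": "Mailbox",
    "mailbox is full": "Mailbox",
    "no Hub Transport server available": "Hub Transport",
    "transport agent blocks delivery": "Hub Transport",
}


def group_by_role(cause_to_role):
    """Invert the cause -> role mapping into role -> list of causes."""
    grouped = defaultdict(list)
    for cause, role in cause_to_role.items():
        grouped[role].append(cause)
    return dict(grouped)


if __name__ == "__main__":
    for role, causes in group_by_role(CAUSE_TO_ROLE).items():
        print(f"{role}:")
        for cause in causes:
            print(f"  - {cause}")
```

Grouping the causes this way makes it easier to see which server to look at first for a given symptom.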

When faced with a technical problem, your immediate impulse is often to jump right into the system and start clicking. While this can be successful, particularly when you're resolving a problem you've seen hundreds of times and know like the back of your hand, it's not a repeatable strategy. What happens when you encounter a problem you haven't seen before? What do you do when you truly have no idea what the root cause could be?

The first step in troubleshooting a problem, any problem, is to define what the problem is. In many cases, this requires asking for more information. When an end user says that she can't send email, does she mean that she can't open Outlook? That she can't generate a new email? That she clicks Send but the email never leaves the Drafts or Outbox folders? Or that she's sent messages that were never received? The end result is the same—the user can't send email—but the root causes are very different.

Once the problem has been defined, the next step is to determine the scope of the problem. This often helps clarify the direction of further troubleshooting. By determining how many users are affected—and more importantly, determining what those users have in common—you can rule out some possibilities and focus on things with a greater impact. For example, if one user can't send email, the root cause could be many things unique to that user, from Outlook configuration to network connectivity to a disabled user account.

However, if a second user has a similar issue, the cause is more likely to be something the two have in common. Are they in the same network segment, perhaps? If 10 users on different floors all report Outlook problems, the problem is more likely to be on an Exchange server. Are all 10 users in the same database, for example, or in the same Active Directory site?

There are a number of clarifying questions that are extremely useful in determining the scope of a particular problem:

How many users are affected by the outage?
Do all the affected users access Exchange through the same method, such as Outlook, Outlook Web App, or ActiveSync?
What exactly are the users trying to do when they encounter the problem?
Are other users able to perform the same task without problems?
Are all of the users in the same database?
Are all of the users in the same site?
Does the problem occur all the time, only some of the time, or rarely?

The answers to these will often rule out possibilities right from the start. If one user can't log into Outlook successfully, but another in the same database can, you know immediately that the relevant database must be mounted and accessible, and you can then concentrate on other things.
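The elimination reasoning above can be sketched in a few lines of Python. This is a minimal illustration, not an Exchange tool: the Report record, its fields, and the common_factors function are hypothetical names, and the sample data is invented. The idea is to keep only the attributes that every affected user shares and that no working user shares.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One user's answers to the scoping questions (hypothetical record)."""
    user: str
    database: str
    site: str
    client: str
    affected: bool


ATTRIBUTES = ("database", "site", "client")


def common_factors(reports):
    """Return attributes shared by all affected users and by no working user."""
    affected = [r for r in reports if r.affected]
    working = [r for r in reports if not r.affected]
    suspects = {}
    for attr in ATTRIBUTES:
        values = {getattr(r, attr) for r in affected}
        if len(values) != 1:
            continue  # affected users differ here, so it is not a shared factor
        (value,) = values
        if any(getattr(r, attr) == value for r in working):
            continue  # a working user has the same value, so it is ruled out
        suspects[attr] = value
    return suspects


if __name__ == "__main__":
    sample = [
        Report("alice", "DB01", "HQ", "Outlook", affected=True),
        Report("bob", "DB01", "HQ", "Outlook", affected=True),
        Report("carol", "DB02", "HQ", "Outlook", affected=False),
    ]
    print(common_factors(sample))
```

In this invented sample, the site and the client are ruled out because a working user shares them, which leaves the database as the place to look first. A result like this only narrows the search; it does not prove the root cause.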

Speaking of concentrating on other things, one of the hardest parts of troubleshooting is ignoring unimportant distractions and focusing on what's actually causing the issue. It's often difficult to differentiate between what's important and what's not unless you know where to start (which is why defining the problem is so important).

Here's an example: an end user reports that he can't send email to a specific user, and during investigation you also discover that he can't access a particular public folder. Is the public folder problem directly related to the email problem? It might be: if the recipient's mailbox is on a server that also houses the only replica of that public folder, and that server is inaccessible, that would explain both problems. But in many cases it might not be: the public folder database might be dismounted, the user might not have permissions, or Exchange may be blocking referrals to the replica due to site link costs. Although there's at least one explanation that covers both problems, many more exist that are unique to the secondary problem. The steps to troubleshoot internal mail flow are dramatically different from those required to troubleshoot public folder access, so if you're trying to resolve a problem with internal email, concentrate on that and leave the public folder issue for later.
