Windows Server 2008 R2 : Understand Active Directory Replication

2/11/2012 6:17:10 PM

Active Directory is a database. The really cool thing about Active Directory is that it has multiple points of authoritative input. Objects can be added, deleted, changed, and so on, from any domain controller (with the obvious exception of the read-only DC, which we will discuss later). This distributed database capability adds tremendous flexibility to Active Directory. It makes administration and management much easier and much more efficient. So, you might be wondering how a distributed database with write permissions at each DC can share those changes across a network to get true synchronization. The process is called directory replication.

1. Understand the Components of Replication

Remember that a directory database is really just a file called NTDS.DIT. It would seem like you could just pass around the most current copy of the NTDS.DIT file and make sure each DC had the most current copy and this whole process would be academic. That would be just fine if your directory database remained at its default size of around 15MB. The problem is that as you add more and more objects to your directory, it grows and grows and grows. Passing 15MB between DCs is not such a big deal. If the file is 10, 20, or 50 times that size, then you have some real bandwidth issues to deal with. You cannot feasibly pass full-size copies of the directory database around between DCs. You have to break the database down into smaller component parts and pass those parts around as updates to each domain controller.

Each directory database is broken down into three separate subsections called partitions, or naming contexts. The partitions are the schema partition, configuration partition, and domain partition.

The schema partition is replicated to all DCs in the directory forest. It contains the information about the directory schema, which provides definitions to all the objects in the directory.

The configuration partition is replicated to all DCs in the directory forest. It contains information about the physical structure of the actual directory.

The domain partition is replicated only to DCs within a single domain. Each domain in a directory forest will have its own unique domain partition information. This is where you would find the actual users, groups, computers, and other objects associated with the domain.

Each of these partitions is replicated independently of the others. This allows partitions that have lots of changes, such as the domain partition, to have a limited effect on partitions that don't change very often, like the schema partition.
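The replication scope of the three partitions can be sketched in a few lines of Python. This is only an illustrative model, not anything Active Directory exposes: the `DomainController` class, the `replicas_for` function, and the example DC and domain names are all hypothetical.

```python
from dataclasses import dataclass

# Partitions that replicate to every DC in the forest, per the text above.
FOREST_WIDE = {"schema", "configuration"}


@dataclass(frozen=True)
class DomainController:
    name: str
    domain: str


def replicas_for(partition, source, all_dcs):
    """Return the names of the other DCs that hold a copy of `partition`."""
    if partition in FOREST_WIDE:
        targets = all_dcs
    elif partition == "domain":
        # The domain partition stays inside the source DC's own domain.
        targets = [dc for dc in all_dcs if dc.domain == source.domain]
    else:
        raise ValueError(f"unknown partition: {partition}")
    return sorted(dc.name for dc in targets if dc != source)


# Hypothetical forest with two domains.
dcs = [
    DomainController("DC1", "corp.example.com"),
    DomainController("DC2", "corp.example.com"),
    DomainController("DC3", "emea.example.com"),
]

print(replicas_for("schema", dcs[0], dcs))  # every other DC in the forest
print(replicas_for("domain", dcs[0], dcs))  # only DCs in corp.example.com
```

Because each partition has its own scope, a busy domain partition in one domain generates no traffic toward DCs in another domain.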

1.1. Types of Updates

Each domain controller has the ability to write changes to the directory database. This means that when you think about replication, there are really two types of updates that can be made to a directory database. The first is what is termed an originating update, meaning a change (for example, the creation of an object) was made directly on this DC in its local copy of the database. That change does not yet exist on the other DCs in the forest. Once the originating update has occurred, it needs to be sent to the other DCs in the domain. When the other DCs receive the update, they are not creating the original object. Instead, they are making what is termed a replicated update: they are replicating data from another DC.

1.2. Metadata

The question of how a DC knows whether it is making an originating update or a replicated update is significant. Each DC uses metadata to manage the replication of objects. This means that in addition to the objects themselves, the directory service also sends key bits of information about the DC where the object originated, when the change was made, and what update was made (where in the sequence of updates this one fits). All this metadata is used by the receiving DC to determine whether this update should be written and whether it should be sent to other partner DCs called replication partners.

Metadata items include the following:

Update sequence numbers (USN): These sequence numbers are specific to the DC. When a change is made to an object, the DC increments the USN by 1. Each DC maintains its own USN independent of the other DCs in the directory. The USN of a DC is shared with its replication partners.
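High watermark vector (HWMV): This piece of metadata helps the DC limit the changes that are sent across the wire at each replication. For each replication partner, the DC records the highest USN it has already received from that partner, so that only changes with a higher USN need to be requested.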
Globally unique identifier (GUID): This piece of metadata identifies the remote DC and prevents possible confusion if the DC were to be renamed.
Up-to-dateness vector (UTDV): This piece of metadata is used to prevent the same replication changes from being sent out over and over again. Each DC keeps this data for each of the other DCs, separately for each of the three directory partitions.

Through the use of these metadata controls, it is possible to get consistent and rapid replication updates throughout a directory forest without having to send the entire copy of the directory database at each replication attempt.
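To make the interplay of these metadata items concrete, here is a deliberately simplified Python simulation. It is a sketch under assumptions, not the directory service's actual algorithm: the `DC` class and its methods are hypothetical, the filtering is done on the pulling side for brevity, and conflict resolution is left out entirely.

```python
import uuid


class DC:
    """Toy domain controller that tracks USN, HWMV, and UTDV (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.guid = str(uuid.uuid4())  # stays the same even if the DC is renamed
        self.usn = 0                   # local update sequence number
        self.changes = []              # (local_usn, key, value, orig_guid, orig_usn)
        self.data = {}
        self.hwm = {}                  # partner GUID -> highest partner USN already seen
        self.utdv = {}                 # originating GUID -> highest originating USN applied

    def _record(self, key, value, orig_guid, orig_usn):
        self.usn += 1                  # every write, local or replicated, bumps the USN
        self.data[key] = value
        self.changes.append((self.usn, key, value, orig_guid, orig_usn))
        self.utdv[orig_guid] = max(self.utdv.get(orig_guid, 0), orig_usn)

    def originating_update(self, key, value):
        """A change made directly on this DC."""
        self._record(key, value, self.guid, self.usn + 1)

    def pull_from(self, source):
        """Pull changes from a partner; return how many were actually applied."""
        since = self.hwm.get(source.guid, 0)
        applied = 0
        for local_usn, key, value, orig_guid, orig_usn in source.changes:
            if local_usn <= since:
                continue               # high watermark: already seen from this partner
            if orig_usn <= self.utdv.get(orig_guid, 0):
                continue               # up-to-dateness: already received via another DC
            self._record(key, value, orig_guid, orig_usn)  # replicated update
            applied += 1
        self.hwm[source.guid] = source.usn
        return applied


a, b, c = DC("DC-A"), DC("DC-B"), DC("DC-C")
a.originating_update("cn=jsmith", "created")
first = b.pull_from(a)    # the new change arrives on B as a replicated update
direct = c.pull_from(a)   # C also receives it directly from A
via_b = c.pull_from(b)    # skipped: C's up-to-dateness vector already covers it
repeat = b.pull_from(a)   # skipped: B's high watermark for A is current
print(first, direct, via_b, repeat)  # 1 1 0 0
```

The replicated copy on DC-B keeps DC-A's GUID and originating USN, which is how a receiving DC can tell a replicated update from an originating one.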

For more information on metadata used in replication, see http://technet.microsoft.com/en-us/magazine/2007.10.replication.aspx.

2. Understand the Physical Constructs of Replication

Active Directory has two types of constructs. There are logical constructs, such as forests, trees, domains, and organizational units, and there are physical constructs, such as sites and domain controllers. When replication is discussed, it is the physical constructs of Active Directory that we are concerned with. Replication is all about passing information about changes to objects in the directory database to each domain controller within and between the physical sites in the network topology.

By definition, a site is composed of one or more IP subnets connected by high-speed links. We like to define a high-speed link as one that has at least 512 Kbps of "available bandwidth." This means that the bandwidth can be entirely dedicated to the directory service traffic of the site. If the IP subnets in your network are not connected by high-speed links, then generally you would create additional sites. One of the reasons you build sites is to provide a framework on which replication can be built. We hope you are smiling right now with the realization that replication comes in two flavors: replication within the same site, which is referred to as intrasite replication, and replication that occurs between sites, which is referred to as intersite replication.

Active Directory uses a set of standards in replication to make it as effective and efficient as possible. These standards are referred to as the replication model for Active Directory. In short, these standards mean that all replication in Active Directory will follow a multimaster replication model. Every domain controller can receive updates to data for which it is authoritative, and all replication is "pull-based," meaning DCs request changes rather than push or send them. This way, only desired changes arrive at the DC. Each domain controller communicates with a subset of all the DCs in the forest and "stores and forwards" changes, instead of having a single DC responsible for sending all updates. Finally, each DC tracks the state of replication updates through partner DCs using metadata to ensure synchronization while minimizing network bandwidth usage.

Latency in directory replication is always a concern. Latency refers to some delay in time between an originating update and its replication throughout the directory to the appropriate DCs. When all changes have been updated throughout the directory, the directory is said to have achieved convergence. The goal of replication is to build a topology where latency is minimized and you achieve convergence.
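One way to picture latency and convergence is to count how many pull rounds a change needs to reach every DC in a given topology. The Python sketch below is a rough model only: the `rounds_to_converge` function and the four-DC topologies are hypothetical, and real replication is driven by notifications and schedules rather than lockstep rounds.

```python
def rounds_to_converge(pull_partners, origin):
    """Count pull rounds until every DC holds a change that originated on `origin`.

    `pull_partners` maps each DC name to the DCs it pulls changes from.
    """
    have = {origin}
    rounds = 0
    while have != set(pull_partners):
        # In one round, a DC gains the change if any partner it pulls from has it.
        newly = {dc for dc, partners in pull_partners.items() if have & set(partners)}
        if newly <= have:
            raise ValueError("topology is not connected; convergence is impossible")
        have |= newly
        rounds += 1
    return rounds


# Hypothetical four-DC site wired as a two-way ring.
ring = {
    "DC1": ["DC2", "DC4"],
    "DC2": ["DC1", "DC3"],
    "DC3": ["DC2", "DC4"],
    "DC4": ["DC3", "DC1"],
}

# The same DCs wired as a one-way chain.
chain = {"DC1": [], "DC2": ["DC1"], "DC3": ["DC2"], "DC4": ["DC3"]}

print(rounds_to_converge(ring, "DC1"))   # 2
print(rounds_to_converge(chain, "DC1"))  # 3
```

The fewer hops a change needs to reach the farthest DC, the lower the latency before convergence, which is why the shape of the topology matters.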

Now that we have laid the groundwork, it is time to see how all of these components work together to build effective replication (and all of this is done without any help from us humans, thankfully!).

2.1. Knowledge Consistency Checker

The replication topology of your directory is generated by a built-in component of the directory service called the Knowledge Consistency Checker (KCC). The KCC runs locally on each domain controller; it reads configuration data and writes connection objects for DCs in the site. The KCC also writes local nonreplicated values that define the replication partners from which to request replication updates. This little application is the engine that defines and consequently drives the topology of directory replication. There is one designated KCC in each site that is responsible for writing the connections to other DCs in other sites. This KCC is given the title of Intersite Topology Generator (ISTG). Through defined connections within and between sites, metadata and actual updates are then passed to the DCs that make up a directory service replication topology.

The KCC uses a host of information about topology to build replication partnerships. In the case of the ISTG, much of that information is user-defined as you configure the information about the site objects and how those sites are to be connected and when (and using what method) replication should occur between sites.

2.2. Viewing Replication Data

When working with Active Directory replication, it is sometimes desirable to see the replication topology of your network. You can use the built-in command-line tool called REPADMIN.exe to view and manage replication data in your directory.

Start REPADMIN by opening a command prompt (run as administrator) and typing REPADMIN.exe.

You will be presented with the supported commands that can be executed with REPADMIN. This tool is exceptional at reporting replication data. You might be thinking, "Wasn't there another Microsoft tool called REPLMON that was included with Windows Server 2003?" There was, but it was a graphical tool, not a command-line one. It is not included with Windows Server 2008 R2, but if you were to go to the support tools folder on a Windows Server 2003 CD, you could install the REPLMON tool on a Windows Server 2008 R2 machine, and it would work. Please keep in mind that it is not supported as a replication monitoring tool in Windows Server 2008 R2. Use REPADMIN to be on the safe side.

For detailed information about the operation of REPADMIN, visit http://technet.microsoft.com/en-us/library/cc770963(ws.10).aspx.
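If you want to collect REPADMIN reports from a script, a small Python wrapper along these lines is one option. Treat it as a sketch: the helper names and the DC name `DC1` are hypothetical, only the `/replsummary`, `/showrepl`, and `/queue` switches come from REPADMIN itself, and the output format is not parsed here because it varies between Windows versions.

```python
import shutil
import subprocess

# Well-known REPADMIN reporting switches.
REPORTS = {
    "summary": ["/replsummary"],  # replication health summary across DCs
    "partners": ["/showrepl"],    # inbound partners and the status of the last attempt
    "queue": ["/queue"],          # inbound replication requests still pending
}


def repadmin_command(report, dc=None):
    """Build the argument list for a report, optionally aimed at one DC."""
    args = ["repadmin"] + REPORTS[report]
    if dc:
        args.append(dc)
    return args


def run_report(report, dc=None):
    """Run the report and return its text output, or None if REPADMIN is absent."""
    if shutil.which("repadmin") is None:
        return None  # not a DC or admin workstation with the tools installed
    result = subprocess.run(
        repadmin_command(report, dc), capture_output=True, text=True, check=False
    )
    return result.stdout


print(repadmin_command("partners", "DC1"))  # ['repadmin', '/showrepl', 'DC1']
```

Building the argument list separately from running it keeps the command easy to log and inspect before it touches a production domain controller.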
