Active Directory is a database. The really cool thing
about Active Directory is that it has multiple points of authoritative
input. Objects can be added, deleted, changed, and so on, from any
domain controller (with the obvious exception of the read-only DC, which
we will discuss later). This distributed database capability adds
tremendous flexibility to Active Directory. It makes administration and
management much easier and much more efficient. So, you might be
wondering how a distributed database with write permissions at each DC
can share those changes across a network to get true synchronization.
The process is called directory replication.
1. Understand the Components of Replication
Remember that a directory database is really just a file called NTDS.DIT. It would seem that you could simply pass the most current copy of NTDS.DIT around to every DC, and this whole
process would be academic. That would be just fine if your directory
database remained at its default size of around 15MB. The problem is
that as you add more and more objects to your directory, it grows and
grows and grows. Passing 15MB between DCs is not such a big deal. If the
file is 10, 20, or 50 times that size, then you have some real
bandwidth issues to deal with. You cannot feasibly pass full-size copies
of the directory database around between DCs. You have to break the
database down into smaller component parts and pass those parts around
as updates to each domain controller.
Each directory database is broken down into three separate subsections called partitions, or naming contexts. The partitions are the schema partition, configuration partition, and domain partition.
The schema partition
is replicated to all DCs in the directory forest. It contains the
information about the directory schema, which provides definitions for
all the objects in the directory.
The configuration partition is replicated to all DCs in the directory forest. It contains information about the physical structure of the actual directory.
The domain partition
is replicated only to DCs within a single domain. Each domain in a
directory forest will have its own unique domain partition information.
This is where you would find the actual users, groups, computers, and
other objects associated with the domain.
Each of these partitions is
replicated independently of the others. This allows partitions that have
lots of changes, such as the domain partition, to have a limited effect
on partitions that don't change very often, like the schema partition.
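The replication scope of each partition can be expressed as a small rule: any two DCs in the same forest share the schema and configuration partitions, and they share a domain partition only when they belong to the same domain. The following is a minimal sketch of that rule, assuming both DCs are in one forest and considering only the three partitions described above. The `DomainController` class and the domain names are hypothetical illustrations.

```python
from dataclasses import dataclass

# Partitions replicated to every DC in the forest, as described above.
FOREST_WIDE = ("schema", "configuration")


@dataclass(frozen=True)
class DomainController:
    name: str
    domain: str  # hypothetical DNS domain name, e.g. "corp.example.com"


def shared_partitions(a: DomainController, b: DomainController) -> list[str]:
    """Return the partitions that DCs a and b both hold and replicate (same forest assumed)."""
    partitions = list(FOREST_WIDE)
    if a.domain == b.domain:
        # Only DCs in the same domain replicate that domain's partition.
        partitions.append(f"domain:{a.domain}")
    return partitions


dc1 = DomainController("DC1", "corp.example.com")
dc2 = DomainController("DC2", "corp.example.com")
dc3 = DomainController("DC3", "sales.corp.example.com")

print(shared_partitions(dc1, dc2))  # ['schema', 'configuration', 'domain:corp.example.com']
print(shared_partitions(dc1, dc3))  # ['schema', 'configuration']
```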
1.1. Types of Updates
Each domain controller has
the ability to write changes to the directory database. This means that
when you think about replication, there are really two types of updates
that can be made to a directory database. The update could be what is
termed an originating update,
meaning the change (for example, a newly created object) was made in this DC's local copy of the
database and does not yet exist on other DCs in the forest. Once
the originating update has occurred, it needs to be sent to the other
DCs that hold a copy of the same partition. When the other DCs receive the update, they are not
creating the original object. Instead, they are making what is termed a replicated update. They are replicating data from another DC.
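One way to picture the difference is that every change carries a stamp naming the DC where it originated. The following is a minimal sketch, anticipating the GUID and USN metadata described in the next subsection: an originating update is stamped with the local DC's own GUID, while a replicated update keeps the stamp it arrived with. The `DC` and `Change` classes, their field names, and the object name are hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Change:
    object_name: str
    value: str
    origin_dc: uuid.UUID  # GUID of the DC where the change was first made
    origin_usn: int       # USN the originating DC assigned to the change


@dataclass
class DC:
    name: str
    guid: uuid.UUID = field(default_factory=uuid.uuid4)
    usn: int = 0
    data: dict = field(default_factory=dict)  # object name -> (value, Change)

    def originating_update(self, object_name: str, value: str) -> Change:
        """Change made directly on this DC: stamped with this DC's own GUID."""
        self.usn += 1
        change = Change(object_name, value, self.guid, self.usn)
        self.data[object_name] = (value, change)
        return change

    def replicated_update(self, change: Change) -> None:
        """Change received from a partner: the origin stamp is kept as-is."""
        self.usn += 1  # the local write still advances this DC's own USN
        self.data[change.object_name] = (change.value, change)

    def is_originating(self, object_name: str) -> bool:
        _, change = self.data[object_name]
        return change.origin_dc == self.guid


dc1, dc2 = DC("DC1"), DC("DC2")
change = dc1.originating_update("CN=jsmith", "created")
dc2.replicated_update(change)
print(dc1.is_originating("CN=jsmith"))  # True: originating update on DC1
print(dc2.is_originating("CN=jsmith"))  # False: replicated update on DC2
```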
1.2. Metadata
The question of how a DC knows
whether it is making an originating update or a replicated update is
significant. Each DC uses metadata to manage the replication of objects.
This means that in addition to the objects themselves, the directory
service also sends key bits of information about the DC where the object
originated, when the change was made, and what update was made (where
in the sequence of updates this one fits). All this metadata is used by
the receiving DC to determine whether this update should be written and
whether it should be sent to other DCs, called replication partners.
Metadata items include the following:
Update sequence numbers (USN): These sequence numbers are specific to the DC. When a change is made to an object, the DC increments its USN by 1 and stamps the change with the new value. Each DC maintains its own USN independent of the other DCs in the directory. The USN of a DC is shared with its replication partners.
High watermark vector (HWMV): This piece of metadata is used to help the DC limit the changes that are sent across the wire at each replication. For each replication partner and partition, the DC records the highest USN it has already received from that partner, so the next request asks only for changes with a higher USN.
Globally unique identifier (GUID): This piece of metadata identifies the remote DC and prevents possible confusion if the DC were to be renamed.
Up-to-dateness vector (UTDV): This piece of metadata is used to prevent the same replication changes from being sent out over and over again. For each of the three directory partitions, each DC keeps the highest originating USN it has received from each of the other DCs, so a partner can skip changes the DC has already received by another path.
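The interplay of USNs, the high watermark, and the up-to-dateness vector can be sketched as a toy pull-replication model. This is a simplified illustration, not the directory service's actual implementation: it assumes a single partition, whole-object changes, and DC names standing in for GUIDs. The destination passes its high watermark for the source and its UTDV; the source returns only changes above the watermark that the UTDV does not already cover.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    local_usn: int    # USN assigned by the DC that holds this log entry
    origin: str       # originating DC (the name stands in for its GUID)
    origin_usn: int   # USN the originating DC assigned
    obj: str
    value: str


@dataclass
class DC:
    name: str
    usn: int = 0                              # this DC's own USN counter
    log: list = field(default_factory=list)   # every write, in USN order
    hwm: dict = field(default_factory=dict)   # partner name -> highest partner USN received
    utdv: dict = field(default_factory=dict)  # originating DC -> highest origin USN applied

    def write(self, obj: str, value: str) -> None:
        """Originating update: this DC is recorded as the origin."""
        self.usn += 1
        self.log.append(Entry(self.usn, self.name, self.usn, obj, value))
        self.utdv[self.name] = self.usn

    def changes_for(self, hwm: int, utdv: dict) -> list:
        """Source side: changes newer than the caller's HWM and not covered by its UTDV."""
        return [e for e in self.log
                if e.local_usn > hwm and e.origin_usn > utdv.get(e.origin, 0)]

    def pull_from(self, source: "DC") -> int:
        """Destination side of pull replication; returns the number of changes applied."""
        changes = source.changes_for(self.hwm.get(source.name, 0), self.utdv)
        for e in changes:
            self.usn += 1  # a replicated write still consumes a local USN
            self.log.append(Entry(self.usn, e.origin, e.origin_usn, e.obj, e.value))
            self.utdv[e.origin] = max(self.utdv.get(e.origin, 0), e.origin_usn)
        self.hwm[source.name] = source.usn
        return len(changes)


dc1, dc2, dc3 = DC("DC1"), DC("DC2"), DC("DC3")
dc1.write("CN=jsmith", "created")
print(dc2.pull_from(dc1))  # 1
print(dc3.pull_from(dc1))  # 1
print(dc3.pull_from(dc2))  # 0: DC3's UTDV shows it already has DC1's change
print(dc2.pull_from(dc1))  # 0: nothing on DC1 above DC2's high watermark
```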
Through the use of these
metadata controls, it is possible to get consistent and rapid
replication updates throughout a directory forest without having to send
the entire copy of the directory database at each replication attempt.
2. Understand the Physical Constructs of Replication
Active Directory has two types of constructs. There are logical constructs, such as forests, trees, domains, and organizational units, and there are physical
constructs, such as sites and domain controllers. When replication is
discussed, it is the physical constructs of Active Directory that we are
concerned with. Replication is all about passing information about
changes to objects in the directory database to each domain controller
within and between the physical sites in the network topology.
By definition, a site is
composed of one or more IP subnets connected by high-speed links. We
like to define a high-speed link as one that has at least 512Kbps of
"available bandwidth." This means that the bandwidth can be entirely
dedicated to the directory service traffic of the site. If the IP
subnets in your network are not connected by high-speed links, then
generally you would create additional sites. One of the reasons you
build sites is to provide a framework on which replication can be built.
We hope you are smiling right now with the realization that replication
comes in two flavors: replication within the same site, which is
referred to as intrasite replication, and replication that occurs between sites, which is referred to as intersite replication.
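Because a site is defined by its IP subnets, deciding whether two DCs replicate intrasite or intersite comes down to which site's subnets contain their addresses. The following is a minimal sketch of that lookup using Python's `ipaddress` module. The subnet-to-site mapping and the site names are hypothetical, and choosing the most specific matching subnet is a design choice of this sketch.

```python
import ipaddress
from typing import Optional

# Hypothetical subnet-to-site mapping; in a real forest these come from the
# subnet and site objects you define.
SUBNET_TO_SITE = {
    ipaddress.ip_network("10.1.0.0/16"): "Headquarters",
    ipaddress.ip_network("10.2.0.0/16"): "Branch1",
}


def site_of(ip: str) -> Optional[str]:
    """Return the site whose subnet contains ip, preferring the most specific subnet."""
    addr = ipaddress.ip_address(ip)
    matches = [(net.prefixlen, site) for net, site in SUBNET_TO_SITE.items() if addr in net]
    return max(matches)[1] if matches else None


def replication_type(ip_a: str, ip_b: str) -> str:
    """Classify replication between two DCs as intrasite or intersite."""
    site_a, site_b = site_of(ip_a), site_of(ip_b)
    if site_a is None or site_b is None:
        raise ValueError("DC address is not in any defined subnet")
    return "intrasite" if site_a == site_b else "intersite"


print(replication_type("10.1.0.10", "10.1.4.20"))  # intrasite
print(replication_type("10.1.0.10", "10.2.0.10"))  # intersite
```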
Active Directory uses a
set of standards in replication to make it as effective and efficient as
possible. These standards are referred to as the replication model
for Active Directory. In short, these standards mean that all
replication in Active Directory will follow a multimaster replication
model. Every domain controller can receive updates to data for which it
is authoritative, and all replication is "pull-based," meaning a DC
requests changes from its partners rather than having them pushed to it. This way, only desired
changes arrive at the DC. Each domain controller communicates with a
subset of all the DCs in the forest and "stores and forwards" changes,
instead of having a single DC responsible for sending all updates.
Finally, each DC tracks the state of replication updates through partner
DCs using metadata to ensure synchronization while minimizing network
bandwidth usage.
Latency in directory replication is always a concern. Latency
refers to the delay between an originating update and its
replication throughout the directory to the appropriate DCs. When all
changes have reached every appropriate DC, the directory is
said to have achieved convergence. The goal of replication is to build a topology that minimizes latency so the directory reaches convergence quickly.
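Latency and convergence can be illustrated with a toy simulation of pull-based, store-and-forward replication. This sketch assumes a hypothetical ring of DCs in which, during each "cycle," every DC pulls from its partners' state as it stood at the start of that cycle. It counts how many cycles pass before a single originating update reaches every DC, comparing a ring where each DC pulls from one neighbor with one where it pulls from both. The cycle model and ring layout are illustrative assumptions, not a description of how the real topology is built.

```python
def cycles_to_converge(n: int, bidirectional: bool = True) -> int:
    """Cycles until an update made on DC0 reaches all n DCs in a ring."""
    if n < 1:
        raise ValueError("need at least one DC")
    has_change = [False] * n
    has_change[0] = True  # the originating update happens on DC0
    cycles = 0
    while not all(has_change):
        snapshot = has_change[:]  # each DC pulls its partners' state from the start of the cycle
        for i in range(n):
            partners = [(i - 1) % n]
            if bidirectional:
                partners.append((i + 1) % n)
            if any(snapshot[p] for p in partners):
                has_change[i] = True  # stored here, and forwarded on a later cycle
        cycles += 1
    return cycles


for n in (4, 7, 10):
    print(n, cycles_to_converge(n), cycles_to_converge(n, bidirectional=False))
# prints: 4 2 3 / 7 3 6 / 10 5 9 (one line each)
```

Pulling from both neighbors roughly halves the number of cycles before convergence, which shows why the shape of the topology matters for latency.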
Now that we have laid the
groundwork, it is time to see how all of these components work together
to build effective replication (and all of this is done without any help
from us humans, thankfully!).
2.1. Knowledge Consistency Checker
The replication topology of
your directory is generated by a built-in component of the directory
service called the Knowledge Consistency Checker (KCC). The KCC runs
locally on each domain controller; it reads configuration data and
writes connection objects for DCs in the site. The KCC also writes local
nonreplicated values that define the replication partners from which to
request replication updates. This little application is the engine that
defines and consequently drives the topology of directory replication.
There is one designated KCC in each site that is responsible for writing
the connections to other DCs in other sites. This KCC is given the
title of Intersite Topology Generator (ISTG). Through defined
connections within and between sites, metadata and actual updates are
then passed to the DCs that make up a directory service replication
topology.
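To make "writes connection objects for DCs in the site" concrete, the following sketch generates intrasite connections for each site by arranging its DCs in a ring and giving each DC inbound connections from its two neighbors. Each pair is read as "destination pulls from source," consistent with pull-based replication. The real KCC logic is more involved; the ring rule, the sorting by name, and the site and DC names here are simplifying assumptions, and intersite connections (the ISTG's job) are not modeled.

```python
def ring_connections(dcs: list[str]) -> set[tuple[str, str]]:
    """Return (source, destination) pairs: each DC pulls from both ring neighbors."""
    ordered = sorted(dcs)
    n = len(ordered)
    conns: set[tuple[str, str]] = set()
    if n < 2:
        return conns  # a lone DC in a site has no intrasite partners
    for i, dest in enumerate(ordered):
        for src in (ordered[i - 1], ordered[(i + 1) % n]):
            if src != dest:
                conns.add((src, dest))
    return conns


# Hypothetical sites and DC names.
sites = {"Headquarters": ["DC1", "DC2", "DC3"], "Branch1": ["DC4"]}
for site, dcs in sites.items():
    print(site, sorted(ring_connections(dcs)))
```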
The KCC uses a host of
information about topology to build replication partnerships. In the
case of the ISTG, much of that information is user-defined as you
configure the information about the site objects and how those sites are
to be connected and when (and using what method) replication should
occur between sites.
2.2. Viewing Replication Data
When working with
Active Directory replication, it is sometimes desirable to see the
replication topology of your network. You can use the built-in
command-line tool called REPADMIN.exe to view and manage replication data in your directory.
Start REPADMIN by opening a command prompt (run as administrator) and typing REPADMIN.exe.
You will be presented with the supported commands that can be executed with REPADMIN. This tool is exceptional at reporting replication data.
You might be thinking, "Wasn't there another Microsoft tool called REPLMON
that was included with Windows Server 2003?" There was, but it was a
graphical tool, not a command-line one. It is not included with Windows
Server 2008 R2, but if you were to go to the support tools folder on a
Windows Server 2003 DVD, you could install the REPLMON tool on a Windows Server 2008 R2 machine, and it would work. Please keep in mind that it is not supported as a replication monitoring tool in Windows Server 2008 R2. Use REPADMIN to be on the safe side.
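If you want to collect REPADMIN output from a script, for example to log it on a schedule, a thin wrapper around the command line is enough. The following is a minimal sketch, assuming it runs on a domain controller or another machine where `repadmin.exe` is on the PATH. It uses only the `/replsummary` and `/showrepl` commands; the helper names and the target name `DC1` are hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def repadmin_args(command: str, target: Optional[str] = None) -> list[str]:
    """Build a REPADMIN command line, e.g. ['repadmin', '/showrepl', 'DC1']."""
    args = ["repadmin", command]
    if target:
        args.append(target)
    return args


def run_repadmin(command: str, target: Optional[str] = None) -> str:
    """Run REPADMIN and return its text output (requires repadmin.exe on PATH)."""
    if shutil.which("repadmin") is None:
        raise FileNotFoundError("repadmin.exe was not found on PATH")
    result = subprocess.run(repadmin_args(command, target),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    try:
        # /replsummary: summary of replication health across the DCs.
        print(run_repadmin("/replsummary"))
        # /showrepl DC1: inbound replication partners and status for DC1 (hypothetical name).
        print(run_repadmin("/showrepl", "DC1"))
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"Could not run REPADMIN: {err}")
```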