
How to handle DC replication delay when enabling users and setting properties?


I am working on an application where I need to perform some operations on users fetched from a Domain Controller (DC). The operations involve enabling the users and then setting properties for them. For each user, enabling the account and applying the properties must behave as a single atomic operation.

The issue I am facing is that changes take around 15 seconds to replicate to all DCs in the site. Because of this delay, my later requests sometimes hit a DC where the data is not yet synced, resulting in an error that the user is not yet enabled.

For example, let's say I have a user named "John Doe" who is currently disabled. I perform the following steps:

  • Enable "John Doe" on DC1 (the load balancer could direct the request to DC1).
  • Immediately try to set properties for "John Doe" on DC2. Since DC replication takes around 15 seconds, DC2 might not have the updated information that "John Doe" is enabled. As a result, the request to set properties fails with an error indicating that the user is not yet enabled.

How can I handle this replication delay to ensure that my requests do not fail?

The approaches I am considering are:

  1. Adding a delay of 15 seconds after the first request, that is, after enabling the user, waiting 15 seconds before trying to apply properties.
  2. Adding retries when the second request fails. This approach caused some regressions in our system.

Solution

    1. Explicitly choose a specific domain controller – not through a third-party load balancer (which from your description seems like it's only meant to handle reads, not writes), but by getting the list of DCs via DNS SRV records and choosing one for the duration of the operation, which is what Windows itself would normally do.

    2. Re-use a single LDAP connection (over a single TCP connection, which implicitly is tied to a specific DC) for all changes. Especially if all those changes are done "immediately", there is no reason for your code to disconnect and reconnect to LDAP in between each operation.

    3. Combine "Enabling the user" (which is not a distinct LDAP function but merely another attribute change) together with your other changes into a single LDAP Modify operation to make it actually atomic, both for replication purposes and in general.
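As a rough illustration of point 3, the sketch below builds one change set that both clears the "account disabled" bit of `userAccountControl` and sets the other attributes, so everything can go out in a single Modify request. The function name `build_enable_changes`, the `department` attribute, and the starting value 514 are hypothetical. The dict layout and the `"MODIFY_REPLACE"` marker are assumed to match what the Python `ldap3` library accepts, so check them against whichever LDAP library you actually use.

```python
# UF_ACCOUNTDISABLE bit of userAccountControl (0x0002 in Active Directory).
UF_ACCOUNTDISABLE = 0x0002


def build_enable_changes(current_uac, properties, replace_op="MODIFY_REPLACE"):
    """Return one change set that enables the account and sets properties.

    current_uac: the user's current userAccountControl value (int).
    properties:  hypothetical mapping of attribute name -> new value.
    replace_op:  marker for a "replace" modification; the default is assumed
                 to mirror ldap3's MODIFY_REPLACE constant.
    """
    # "Enabling" a user is just clearing the disable bit of this attribute.
    enabled_uac = current_uac & ~UF_ACCOUNTDISABLE
    changes = {"userAccountControl": [(replace_op, [enabled_uac])]}
    for attribute, value in properties.items():
        changes[attribute] = [(replace_op, [value])]
    return changes


# 514 = normal account (512) + disabled bit (2); used here only as sample input.
changes = build_enable_changes(514, {"department": "Engineering"})
```

With `ldap3`, a dict of this shape would typically be passed as `conn.modify(user_dn, changes)`. Because the enable flag and the properties travel in one request to one DC, there is no window in which a second request can reach a DC that has not yet seen the first.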

    (Though, the replication question aside, I really don't recall AD refusing any modify operations on the grounds of a user account being disabled…)
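Points 1 and 2 could look roughly like the following sketch. It assumes you have already resolved the SRV records for the domain with some DNS library (the lookup itself is not shown) and only illustrates picking one DC and sticking with it. The helper names `dc_srv_name` and `pick_dc`, the `corp.example` domain, and the sample records are hypothetical, and the selection is a simplified take on the RFC 2782 priority/weight rules rather than what Windows' own DC locator does.

```python
import random


def dc_srv_name(domain):
    """DNS name whose SRV records list the domain controllers of an AD domain."""
    return "_ldap._tcp.dc._msdcs." + domain


def pick_dc(srv_records, rng=random):
    """Pick one DC from (priority, weight, host, port) tuples.

    Simplified from RFC 2782: keep only the records with the lowest priority
    value, then make a weighted random choice among them.
    """
    if not srv_records:
        raise ValueError("no SRV records supplied")
    best = min(record[0] for record in srv_records)
    candidates = [record for record in srv_records if record[0] == best]
    # max(weight, 1) keeps zero-weight records selectable.
    weights = [max(record[1], 1) for record in candidates]
    _, _, host, port = rng.choices(candidates, weights=weights, k=1)[0]
    return host, port


# Hypothetical records, as a DNS library might return them for
# dc_srv_name("corp.example").
records = [
    (0, 100, "dc1.corp.example", 389),
    (0, 100, "dc2.corp.example", 389),
    (10, 100, "dc3.corp.example", 389),
]
host, port = pick_dc(records)
# Open ONE LDAP connection to (host, port) here and reuse it for every change
# to this user, so the enable and the property updates reach the same DC.
```

The important part is not how the DC is chosen but that the choice is made once per operation and the same connection is kept open until all changes for that user are done.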