Notes on Server 2012 DHCP Failover

A bit confused on some terminology regarding Server 2012 DHCP failover in hot standby mode. I don’t think I am alone in this, and have been reading up on the same. Thought I’d put down what I read in a blog post as a reference to myself.

DHCP failover in load balancing mode is straight-forward so I am going to ignore it for now. In hot standby mode there’s a Maximum Client Lead Time (MCLT) and State Switchover Interval (SSI) that have me confused.

This article from Windows IT Pro is excellent at explaining these concepts. A must read.

Following are some notes of mine from this article and a few others.

  • The DHCP lease duration – i.e. for how long a client is leased the DHCP address.
  • In a hot standby relationship one DHCP server is active, the other is standby. The relationship can be in various states:
    • Normal – all is good.
    • Communication Interrupted – the servers are unable to contact each other and have no idea what the other is up to. They do not assume the partner is down, only that they are unable to communicate (possibly due to a break in the link etc).
    • Partner Down – a server assumes that its partner is down. This need not automatically happen. Either an administrator marks the partner as down, or we can set this to automatically happen after an interval (which is the State Switchover Interval mentioned above).

The Partner Down state is easy. The standby server knows the active server is down and so it must take over. No problemo. From the time the standby server cannot contact the active server, wait for the State Switchover Interval duration and then take over the leases. Essentially progress from being in a Normal state to a Communication Interrupted state, wait for the State Switchover Interval duration, and if no luck then move to a Partner Down state.

It’s worth bearing in mind that the standby server takes over the lease only after it is in a Partner Down state. While operating in a Communication Interrupted state it only makes use of addresses from its reserved pool (by default 5% of the addresses are reserved for the standby server; this figure can be changed).

The Communication Interrupted state is a bit tricky.

  • If the standby server is cut off from the active server, and all the clients too are partitioned along with the active server, then there’s nothing to do for now – since the clients can’t contact the standby server it won’t get any requests for IP addresses anyways.
  • However, if all/ some of the clients aren’t partitioned along with the active server and can contact the standby server, what must it do?
    • If the request is for a new address, the standby server can assign one from its reserved pool. But it has no way of informing its partner that it is doing so, and since it can’t assume the partner is down it must work around the possibility that the partner too might eventually assign this address to some client. In fact, the partner/ active server too faces the same issue. If it gets a request for a new address while in the Connection Interrupted state, it can assign one but has no way of informing its partner/ standby server that it is doing so. What is required here thus is a way of leasing out an address, but on a short duration such that the client checks back in soon to renew the address and hopefully by then the Connection Interrupted state has transitioned to Normal or Partner Down. This is where the MCLT comes in. The servers lease out an address but not for the typical DHCP lease duration, but only for the MCLT duration.
    • If the request is for an address renewal the standby server can do so but again same rules as above apply. It can’t assume that the partner won’t mark this address as unused any more (as it wouldn’t have received the lease renewal request) and so both servers must renew or assign new addresses for the MCLT duration only as above.

That is not the whole story though! Turns out the MCLT is not just the temporary lease duration. It has one more influence. Check out this MCLT definition from a TechNet article:

The maximum amount of time that one server can extend a lease for a DHCP client beyond the time known by the partner server. The MCLT defines the temporary lease period given by a failover partner server, and also determines the amount of time that a server in a failover relationship will wait in partner down state before assuming control over the entire IP address range.

The MCLT cannot be set to zero, and the default setting is 1 hour.

Note the part in italics. So once a relationship transitions from Normal -> Connection Interrupted -> Partner Down (via manual admin intervention or after the State Switchover Interval) the standby server waits for the MCLT duration before taking over the entire scope. Why so?

The reason for this is because during the Communication Interrupted it is possible the active server might have assigned some IP addresses to clients – i.e. the server wasn’t actually down. These would be assigned with a lease period of MCLT and when this period expires clients will contact the erstwhile active server for a renewal, fail to get a response, and thus send a new broadcast out which will be picked up by the new active server. By waiting for the MCLT duration before taking over the entire scope, the new active server can get all such broadcasts and populate its database with any leases the previous active server might have handed out.

Here’s an excerpt from the DHCP Failover RFC:

The PARTNER-DOWN state exists so that a server can be sure that its partner is, indeed, down. Correct operation while in that state requires (generally) that the server wait the MCLT after anything that happened prior to its transition into PARTNER-DOWN state (or, more accurately, when the other server went down if that is known). Thus, the server MUST wait the MCLT after the partner server went down before allocating any of the partner’s addresses which were available for allocation. In the event the partner was not in communication prior to going down, it might have allocated one or more of its FREE addresses to a DHCP client and been unable to inform the server entering PARTNER-DOWN prior to going down itself. By waiting the MCLT after the time the partner went down, the server in PARTNER-DOWN state ensures that any clients which have a lease on one of the partner’s FREE addresses will either time out or contact the server in PARTNER-DOWN by the time that period ends.

In addition, once a server has made a transition to PARTNER-DOWN state, it MUST NOT reallocate an IP address from one client to another client until the longer of the following two times:

  • The MCLT after the time the partner server went down (see above).
  • An additional MCLT interval after the lease by the original client expires. (Actually, until the maximum client lead time after what it believes to be the lease expiration time of the client.)

Some optimizations exist for this restriction, in that it only applies to leases that were issued BEFORE entering PARTNER-DOWN. Once a server has entered PARTNER-DOWN and it leases out an address, it need not wait this time as long as it has never communicated with the partner since the lease was given out.

Interesting. So the MCLT is 1) the temporary lease duration during a Connection Interrupted state; 2) the amount of time a server will wait in Partner Down state before taking over the entire scope; and 3) the amount of a time a server will wait before assigning the IP address previously assigned to a client by its partner to a new client.

For completeness, here’s the definition of State Switchover Interval from TechNet:

If automatic state transition is enabled, a DHCP server in communication interrupted state will automatically transition to partner down state after a defined period of time. This period of time is defined by the state switchover interval.

A server that loses communication with a partner server transitions into a communication interrupted state. The loss of communication may be due to a network outage or the partner server may have gone offline. By default, since there is no way for the server to detect the reason for loss of communication with its partner, the server will continue to remain in communication interrupted state until the administrator manually changes the state to partner down. However, if you enable automatic state transition, DHCP failover will automatically transition to partner down state when the auto state switchover interval expires. The default value for auto state switchover interval is 60 minutes.

When in Communication Interrupted mode it is possible to mark the relationship as Partner Down manually. This can be done via PowerShell as follows:

Or via MMC, by going to the Failover tab.