Azure and DAG IP – not pingable on all nodes

If you are running an Exchange DAG in Azure you won’t always be able to ping the DAG IP. In fact, an IP-less DAG seems to be the recommendation.

Crazy thing is I can’t find any install guide or advice for Exchange on Azure. There’s plenty of documentation on setting up an Exchange 2013 DAG witness in Azure for use with on-prem Exchange, but nothing on actually setting up a DAG in Azure (the way you’d find one for SQL AlwaysOn in Azure, for example). This is a good article though for a non-techie introduction.

Another thing to bear in mind with Azure is that all communication between VMs, including those in the same subnet, happens via a gateway. If you check the arp output of your Azure VMs you will see that all IPs are being intercepted by a gateway. So if the gateway doesn’t know of the IP it won’t route it. This is why for SQL AlwaysOn you need to set up the availability group IPs on the Azure load balancer, thus making Azure aware of the IP.
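To make the arp observation concrete, here is a rough Python sketch that groups the dynamic entries of a Windows-style `arp -a` listing by MAC address. The sample text is made up for illustration (the addresses and the MAC are placeholders, not captured from a real VM), and the function names are my own. If every neighbour resolves to the same MAC, something in the middle is answering on their behalf.

```python
import re
from collections import defaultdict

# Made-up sample in the shape of Windows `arp -a` output.
# The IPs and the MAC are placeholders, not real Azure values.
SAMPLE_ARP_OUTPUT = """\
Interface: 10.0.0.4 --- 0x3
  Internet Address      Physical Address      Type
  10.0.0.1              00-00-5e-00-53-01     dynamic
  10.0.0.5              00-00-5e-00-53-01     dynamic
  10.0.0.6              00-00-5e-00-53-01     dynamic
  10.0.0.255            ff-ff-ff-ff-ff-ff     static
"""

# One table row: IPv4 address, dash-separated MAC, entry type.
ENTRY = re.compile(
    r"^\s*(\d+\.\d+\.\d+\.\d+)\s+"
    r"((?:[0-9a-fA-F]{2}-){5}[0-9a-fA-F]{2})\s+(\w+)\s*$"
)


def ips_by_mac(arp_text):
    """Map each MAC to the IPs that resolved to it (dynamic entries only)."""
    groups = defaultdict(list)
    for line in arp_text.splitlines():
        match = ENTRY.match(line)
        if match and match.group(3) == "dynamic":
            groups[match.group(2).lower()].append(match.group(1))
    return dict(groups)


def single_gateway_mac(arp_text):
    """Return the MAC if all dynamic entries share one, else None."""
    groups = ips_by_mac(arp_text)
    return next(iter(groups)) if len(groups) == 1 else None


if __name__ == "__main__":
    print(ips_by_mac(SAMPLE_ARP_OUTPUT))
    print(single_gateway_mac(SAMPLE_ARP_OUTPUT))
```

Static entries (broadcast, multicast) are skipped because they always carry well-known MACs and would hide the pattern.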

In my case we have a two-site setup in Azure, and I noticed that I was able to ping the DAG IP when the PAM was in the DR site but not when it was in the main site. At first I suspected a routing issue. Then I realized that, hang on, the DAG IP was configured in each site, but since it was manually assigned rather than via a load balancer, it was assigned to a single server NIC in each site. Thus, for instance, node01 in both sites had the DAG IP assigned to it while node02 in both sites did not. It so happened that when my PAM failed over to the DR site it landed on node01, which had the DAG IP (of that site’s subnet) assigned, while when it failed over to the primary site it happened to choose node02, which did not have the DAG IP. Simple! Elementary, but I didn’t realize this was what was happening because I hadn’t moved the PAM role to each node to see if it behaved differently.
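The situation can be modelled in a few lines of Python. This is only a toy sketch: the node names, the site labels and the `dag_ip_answers` helper are hypothetical, and the rule it encodes (the DAG IP responds only when the PAM sits on a node whose NIC holds that IP) is just the behaviour observed above, not a statement about how Exchange or Azure work internally.

```python
# Hypothetical layout: which site each node lives in.
NODE_SITE = {
    "main-node01": "main",
    "main-node02": "main",
    "dr-node01": "dr",
    "dr-node02": "dr",
}

# Nodes whose NIC has the site's DAG IP manually assigned.
# Only node01 in each site, as in the setup described above.
DAG_IP_HOLDERS = {
    "main": {"main-node01"},
    "dr": {"dr-node01"},
}


def dag_ip_answers(pam_owner):
    """Return True if the DAG IP would respond with the PAM on this node."""
    site = NODE_SITE[pam_owner]
    return pam_owner in DAG_IP_HOLDERS[site]


if __name__ == "__main__":
    # Walk the PAM across every node, which is the test that was skipped.
    for node in NODE_SITE:
        status = "pingable" if dag_ip_answers(node) else "NOT pingable"
        print(f"PAM on {node}: DAG IP {status}")
```

Looping over every node, rather than only comparing site against site, is what exposes the pattern: the result depends on the node, not on the site.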

Sometimes you’ve got to let go of the big picture and see the small stuff. :)