A colleague at work mentioned a Server 2016 DNS zone delegation bug he had found. I found just one post on the Internet when I searched for it. According to my colleague, Microsoft has now confirmed it as a bug through the support call he raised.
DNS being an area of interest I wanted to replicate the issue and post it – so here goes. Hopefully it makes sense. :)
Imagine a split-DNS scenario for a zone rockylabs.zero.
- This zone is hosted externally by a DNS server (doesn’t matter what OS/software) called data01.
- This zone is hosted internally by two DNS servers: a Server 2012R2 (called DC2012-01) and a Server 2016 (called DC2016-01).
Now say there’s a record rakhesh.rockylabs.zero that is the same both internally and externally. That is, we want both internal and external users to get the same (external) IP address for this record.
What you would typically do is add this record to your external DNS server and, on your two internal DNS servers, create a delegation for this record to the external DNS server. Here are some screenshots:
The zone on my external DNS server. Notice I have an A record for rakhesh.rockylabs.zero.
Ignore the rakhesh2.rockylabs.zero record for now. That comes in later. :)
Here’s a look at the delegation from my Server 2012R2 internal DNS server to the external DNS server for the rakhesh.rockylabs.zero record. Basically I create a delegation within the rockylabs.zero zone on the internal server, for the rakhesh domain, and point it to the external DNS server. On the external DNS server rakhesh.rockylabs.zero is defined as an A record, so that will be returned as the answer when this delegation chain is followed.
In my case both the internal DNS servers are also DCs, and the rockylabs.zero zone is AD integrated, so a similar delegation is automatically created on the other DNS server too.
As would be expected, I am able to resolve this A record correctly from both internal DNS servers.
Now for the fun part!
Notice the rakhesh2.rockylabs.zero record on my external DNS server? Unlike rakhesh.rockylabs.zero, this one is a CNAME record. This too should be common for both internal and external users. It shouldn’t be a problem really, as it should work similarly to the A record. When following the chain of delegation resolves rakhesh2.rockylabs.zero to a CNAME record pointing to rakhesh.com, my DNS server should automatically resolve the A record for rakhesh.com and give me its address as the answer for rakhesh2.rockylabs.zero. It works with the Server 2012R2 internal DNS server as expected:
But breaks for the 2016 internal DNS server!
And that’s it! That’s the bug basically.
Here’s the odd bit though. If I first query rakhesh.com (the domain to which the CNAME record points) and then try to resolve the delegated record, it works!
If I go ahead and clear the cache on that 2016 internal server and try the name resolution again, it’s broken as before.
So the issue is that the 2016 DNS server is able to follow the delegation for rakhesh2.rockylabs.zero to the external DNS server and resolve it to rakhesh.com, but it doesn’t then go ahead and look up rakhesh.com to get its A record. If the A record for rakhesh.com is already in its cache, though, it is sensible enough to return that address.
I dug a bit more into this by enabling debug logging on the 2016 server. Here’s what I found.
The 2016 server receives my query:
4/25/2017 8:56:35 PM 0FC0 PACKET 000001FF8DCA2D10 UDP Rcv 10.10.1.2 0002 Q [0001 D NOERROR] A (8)rakhesh2(9)rockylabs(4)zero(0)
UDP question info at 000001FF8DCA2D10
  Socket = 668
  Remote addr 10.10.1.2, port 63329
  Time Query=2087079, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x0029 (41)
  Message:
    XID 0x0002
    Flags 0x0100
      QR 0 (QUESTION)
      OPCODE 0 (QUERY)
      AA 0
      TC 0
      RD 1
      RA 0
      Z 0
      CD 0
      AD 0
      RCODE 0 (NOERROR)
    QCOUNT 1
    ACOUNT 0
    NSCOUNT 0
    ARCOUNT 0
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name "(8)rakhesh2(9)rockylabs(4)zero(0)"
      QTYPE A (1)
      QCLASS 1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
      empty
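To make the log a bit easier to read, here is a minimal Python sketch of how a plain query like the one above is laid out on the wire. The helper names (encode_name, build_query) are my own, and it only covers the simple case of one question with no EDNS record; it is not how any particular client builds its packets.

```python
import struct

def encode_name(name: str) -> bytes:
    """Encode a dotted name as length-prefixed labels ending in a zero byte.

    This is the layout the debug log prints as (8)rakhesh2(9)rockylabs(4)zero(0).
    """
    out = b""
    for label in name.rstrip(".").split("."):
        if not label:
            continue  # skip empty labels (e.g. when encoding the root)
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_query(name: str, xid: int, rd: bool = True, qtype: int = 1) -> bytes:
    """Build a one-question DNS query (class IN) without an EDNS OPT record."""
    flags = 0x0100 if rd else 0x0000  # RD is the only flag bit set in a plain query
    # Header: XID, flags, QCOUNT=1, ACOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", xid, flags, 1, 0, 0, 0)
    # Question: encoded name, QTYPE (1 = A), QCLASS (1 = IN)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)
    return header + question

packet = build_query("rakhesh2.rockylabs.zero", xid=0x0002)
print(len(packet))  # 41
```

The 41 bytes (12 for the header, 25 for the name, 4 for type and class) line up with the Msg length = 0x0029 (41) in the log, and the flags word 0x0100 is just the RD bit.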
It passes this on to the external server (10.10.1.11 is data01, the external DNS server to which rakhesh2.rockylabs.zero is delegated). FYI, I am truncating the output here:
4/25/2017 8:56:35 PM 0FC0 PACKET 000001FF8DCA02F0 UDP Snd 10.10.1.11 fbff Q [0000 NOERROR] A (8)rakhesh2(9)rockylabs(4)zero(0)
UDP question info at 000001FF8DCA02F0
  Socket = 15140
  Remote addr 10.10.1.11, port 53
...
It gets a reply with the CNAME record. So far so good.
4/25/2017 8:56:35 PM 0FC0 PACKET 000001FF908D10B0 UDP Rcv 10.10.1.11 fbff R Q [8084 A R NOERROR] A (8)rakhesh2(9)rockylabs(4)zero(0)
UDP response info at 000001FF908D10B0
  Socket = 15140
  Remote addr 10.10.1.11, port 53
  Time Query=2087079, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x004d (77)
  Message:
    XID 0xfbff
    Flags 0x8480
      QR 1 (RESPONSE)
      OPCODE 0 (QUERY)
      AA 1
      TC 0
      RD 0
      RA 1
      Z 0
      CD 0
      AD 0
      RCODE 0 (NOERROR)
    QCOUNT 1
    ACOUNT 1
    NSCOUNT 0
    ARCOUNT 1
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name "(8)rakhesh2(9)rockylabs(4)zero(0)"
      QTYPE A (1)
      QCLASS 1
    ANSWER SECTION:
    Offset = 0x0029, RR count = 0
    Name "[C00C](8)rakhesh2(9)rockylabs(4)zero(0)"
      TYPE CNAME (5)
      CLASS 1
      TTL 3600
      DLEN 13
      DATA (7)rakhesh(3)com(0)
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
    Offset = 0x0042, RR count = 0
    Name "(0)"
      TYPE OPT (41)
      CLASS 4000
      TTL 32768
      DLEN 0
      DATA
        Buffer Size = 4000
        Rcode Ext = 0
        Rcode Full = 0
        Version = 0
        Flags = 80 DO
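If you want to double check the flag breakdown the debug log prints, the 16-bit Flags word can be unpacked by hand. Below is a small Python sketch (decode_flags is a name I made up) that follows the standard DNS header bit layout; it is only meant as a reading aid for the log.

```python
def decode_flags(flags: int) -> dict:
    """Split the 16-bit DNS header flags word into its individual fields."""
    return {
        "QR": (flags >> 15) & 0x1,      # 0 = question, 1 = response
        "OPCODE": (flags >> 11) & 0xF,  # 0 = standard query
        "AA": (flags >> 10) & 0x1,      # authoritative answer
        "TC": (flags >> 9) & 0x1,       # truncated
        "RD": (flags >> 8) & 0x1,       # recursion desired
        "RA": (flags >> 7) & 0x1,       # recursion available
        "Z": (flags >> 6) & 0x1,        # reserved
        "AD": (flags >> 5) & 0x1,       # authentic data (DNSSEC)
        "CD": (flags >> 4) & 0x1,       # checking disabled (DNSSEC)
        "RCODE": flags & 0xF,           # 0 = NOERROR, 2 = SERVFAIL
    }

client_query = decode_flags(0x0100)    # the query my client sent
cname_reply = decode_flags(0x8480)     # the reply from the external server
print(client_query["RD"], cname_reply["AA"], cname_reply["RA"])
```

For 0x8480 this gives QR 1, AA 1, RD 0, RA 1 and RCODE 0, which is what the log shows: an authoritative answer from the external server.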
Now it queries the external DNS server (data01, 10.10.1.11) asking for the A record of rakhesh.com! That’s two wrong things: 1) why ask the external DNS server (which, as far as the internal DNS server knows, is only delegated the rakhesh2.rockylabs.zero zone and has nothing to do with rakhesh.com), and 2) why ask it for the A record instead of the NS record, so it can find the name servers for rakhesh.com and ask those for the IP address of rakhesh.com?
4/25/2017 8:56:35 PM 0FC0 PACKET 000001FF8DCA02F0 UDP Snd 10.10.1.11 d43a Q [0000 NOERROR] A (7)rakhesh(3)com(0)
UDP question info at 000001FF8DCA02F0
  Socket = 14836
  Remote addr 10.10.1.11, port 53
  Time Query=0, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x0028 (40)
  Message:
    XID 0xd43a
    Flags 0x0000
      QR 0 (QUESTION)
      OPCODE 0 (QUERY)
      AA 0
      TC 0
      RD 0
      RA 0
      Z 0
      CD 0
      AD 0
      RCODE 0 (NOERROR)
    QCOUNT 1
    ACOUNT 0
    NSCOUNT 0
    ARCOUNT 1
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name "(7)rakhesh(3)com(0)"
      QTYPE A (1)
      QCLASS 1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
    Offset = 0x001d, RR count = 0
    Name "(0)"
      TYPE OPT (41)
      CLASS 4000
      TTL 32768
      DLEN 0
      DATA
        Buffer Size = 4000
        Rcode Ext = 0
        Rcode Full = 0
        Version = 0
        Flags = 80 DO
It’s pretty much downhill from there, coz as expected the external DNS server replies saying “doh I don’t know” and gives it a list of root servers to contact. FYI, I truncated the output:
4/25/2017 8:56:35 PM 0FC0 PACKET 000001FF907E4950 UDP Rcv 10.10.1.11 d43a R Q [8080 R NOERROR] A (7)rakhesh(3)com(0)
UDP response info at 000001FF907E4950
  Socket = 14836
  Remote addr 10.10.1.11, port 53
  Time Query=2087079, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x01d7 (471)
  Message:
    XID 0xd43a
    Flags 0x8080
      QR 1 (RESPONSE)
      OPCODE 0 (QUERY)
      AA 0
      TC 0
      RD 0
      RA 1
      Z 0
      CD 0
      AD 0
      RCODE 0 (NOERROR)
    QCOUNT 1
    ACOUNT 0
    NSCOUNT 13
    ARCOUNT 14
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name "(7)rakhesh(3)com(0)"
      QTYPE A (1)
      QCLASS 1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
    Offset = 0x001d, RR count = 0
    Name "(0)"
      TYPE NS (2)
      CLASS 1
      TTL 3600
      DLEN 20
      DATA (1)d(12)root-servers(3)net(0)
    Offset = 0x003c, RR count = 1
    Name "[C01D](0)"
      TYPE NS (2)
      CLASS 1
      TTL 3600
      DLEN 4
      DATA (1)e[C02A](12)root-servers(3)net(0)
    Offset = 0x004c, RR count = 2
    Name "[C01D](0)"
      TYPE NS (2)
      CLASS 1
      TTL 3600
      DLEN 4
      DATA (1)f[C02A](12)root-servers(3)net(0)
    Offset = 0x005c, RR count = 3
...
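In case the [C01D] and [C02A] bits in that output look odd: those are DNS name compression pointers, i.e. "continue reading the name at this offset in the message". Here is a rough Python sketch that rebuilds just the start of this reply (the header, the question and the first two NS records, with the counts cut down to match) and decodes the names. The helper names and the trimmed-down message are mine; only the offsets and values come from the log.

```python
import struct

def label(text: str) -> bytes:
    """One length-prefixed DNS label."""
    return bytes([len(text)]) + text.encode("ascii")

def read_name(msg: bytes, offset: int) -> str:
    """Decode a (possibly compressed) DNS name starting at the given offset."""
    labels = []
    hops = 0
    while True:
        length = msg[offset]
        if length & 0xC0 == 0xC0:
            # Compression pointer: the low 14 bits are an offset into the message
            offset = ((length & 0x3F) << 8) | msg[offset + 1]
            hops += 1
            if hops > 16:
                raise ValueError("too many compression pointers")
            continue
        if length == 0:
            break
        labels.append(msg[offset + 1 : offset + 1 + length].decode("ascii"))
        offset += 1 + length
    return ".".join(labels) + "."

def ns_fixed(rdlen: int) -> bytes:
    """TYPE NS (2), CLASS 1, TTL 3600 and DLEN, as printed in the log."""
    return struct.pack("!HHIH", 2, 1, 3600, rdlen)

# Trimmed rebuild of the reply: NSCOUNT is 2 here instead of 13, ARCOUNT 0
header = struct.pack("!HHHHHH", 0xD43A, 0x8080, 1, 0, 2, 0)
question = label("rakhesh") + label("com") + b"\x00" + struct.pack("!HH", 1, 1)
ns1 = b"\x00" + ns_fixed(20) + label("d") + label("root-servers") + label("net") + b"\x00"
ns2 = b"\xC0\x1D" + ns_fixed(4) + label("e") + b"\xC0\x2A"
msg = header + question + ns1 + ns2

print(hex(len(header + question)))  # 0x1d, where the first NS record starts
print(read_name(msg, 0x28))         # d.root-servers.net.
print(read_name(msg, 0x48))         # e.root-servers.net.
```

So [C01D] points back at the root name "(0)" of the first NS record, and [C02A] points at the "root-servers.net" part of the first record's data, which is why the later records only need 4 bytes of data each.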
The 2016 internal DNS server now replies to the client with a SERVFAIL response. As expected.
4/25/2017 8:56:35 PM 0FC0 PACKET 000001FF8DCA2D10 UDP Snd 10.10.1.2 0002 R Q [8281 DR SERVFAIL] A (8)rakhesh2(9)rockylabs(4)zero(0)
UDP response info at 000001FF8DCA2D10
  Socket = 668
  Remote addr 10.10.1.2, port 63329
  Time Query=2087079, Queued=2087079, Expire=2087082
  Buf length = 0x0200 (512)
  Msg length = 0x0029 (41)
  Message:
    XID 0x0002
    Flags 0x8182
      QR 1 (RESPONSE)
      OPCODE 0 (QUERY)
      AA 0
      TC 0
      RD 1
      RA 1
      Z 0
      CD 0
      AD 0
      RCODE 2 (SERVFAIL)
    QCOUNT 1
    ACOUNT 0
    NSCOUNT 0
    ARCOUNT 0
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name "(8)rakhesh2(9)rockylabs(4)zero(0)"
      QTYPE A (1)
      QCLASS 1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
      empty
So now we know why the 2016 server is failing.
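To pull the whole sequence together, here is a toy Python model of what the logs suggest is happening. Plain dictionaries stand in for the DNS servers, the IP addresses are made up (documentation range), and the function names are mine. It is only an illustration of the observed behaviour versus what I would expect, not a description of how the Windows DNS Server is actually implemented.

```python
# Toy data. The addresses are hypothetical placeholders, not the real records.
EXTERNAL = {  # what data01 is authoritative for
    ("rakhesh.rockylabs.zero", "A"): "203.0.113.10",
    ("rakhesh2.rockylabs.zero", "CNAME"): "rakhesh.com",
}
INTERNET = {  # stand-in for a full lookup starting from the root hints
    ("rakhesh.com", "A"): "203.0.113.20",
}

def ask(zone_data, name):
    """Return ("A", ip), ("CNAME", target) or ("REFERRAL", None)."""
    if (name, "A") in zone_data:
        return "A", zone_data[(name, "A")]
    if (name, "CNAME") in zone_data:
        return "CNAME", zone_data[(name, "CNAME")]
    return "REFERRAL", None  # "I don't know, go ask the root servers"

def resolve(name, cache, chase_at_delegated_server):
    """Resolve a delegated name, chasing at most one CNAME."""
    kind, value = ask(EXTERNAL, name)  # follow the delegation to data01
    if kind == "A":
        return value
    if kind != "CNAME":
        return "SERVFAIL"
    target = value
    if target in cache:  # an already cached A record is used as-is
        return cache[target]
    if chase_at_delegated_server:
        # What the 2016 log shows: the A query for the target goes to data01
        kind, value = ask(EXTERNAL, target)
    else:
        # What I would expect: start a fresh lookup for the target
        kind, value = ask(INTERNET, target)
    if kind == "A":
        cache[target] = value
        return value
    return "SERVFAIL"

print(resolve("rakhesh2.rockylabs.zero", {}, chase_at_delegated_server=True))
print(resolve("rakhesh2.rockylabs.zero", {}, chase_at_delegated_server=False))
print(resolve("rakhesh2.rockylabs.zero", {"rakhesh.com": "203.0.113.20"},
              chase_at_delegated_server=True))
```

The first call fails with SERVFAIL, the second returns the address, and the third also returns the address because the target is already in the cache. That matches the three things I saw: broken on 2016, fine on 2012R2, and fine on 2016 once rakhesh.com has been queried.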
Until this is fixed, one workaround would be to create the CNAME record directly on the internal DNS server, pointing to whatever the external DNS server’s record points to. That is, don’t delegate to the external server; just create the same record internally too. This is only needed for CNAME records; A records are fine. Here’s an example of it working with a record called rakhesh3.rockylabs.zero, where I simply made a CNAME on the internal 2016 DNS server.
That’s all for now!
Update: This bug seems to be fixed now. I didn’t encounter it in some of my recent Server 2016 installs. I went through the update history from March 2018 but didn’t find any mention of this getting fixed. The closest DNS fix was this one for large queries. (I chose March 2018 arbitrarily: I noticed the issue was fixed in July 2018, so it was likely fixed well before then.) Update your Server 2016 to the latest patches and you should be fine!