Contact

Subscribe via Email

Subscribe via RSS

Categories

Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

Create multiple DNS records using PowerShell

I had to create multiple A DNS records of the format below –

Just 9 records, I could create them manually, but I thought let’s try and create en-mass using PowerShell. If you are on Windows Server 2012 and above, you have PowerShell cmdlets for DNS.

So I did the following –

To confirm they are created, the following helps –

Nice!

Windows DNS server subnet prioritization and round-robin

Consider the following multiple A records for a DNS record proxy.mydomain.com:

  • proxy.mydomain.com IN A 192.168.10.5
  • proxy.mydomain.com IN A 10.136.53.5
  • proxy.mydomain.com IN A 10.136.52.5
  • proxy.mydomain.com IN A 10.136.33.5
  • proxy.mydomain.com IN A 192.168.15.5

These records are defined on a DNS server. When a client queries the DNS server for the address to proxy.mydomain.com, the DNS server returns all the addresses above. However, the order of answers returned keeps varying. The first client asking for answers could get them in the following order for instance:

  • proxy.mydomain.com IN A 192.168.10.5
  • proxy.mydomain.com IN A 10.136.53.5
  • proxy.mydomain.com IN A 10.136.52.5
  • proxy.mydomain.com IN A 10.136.33.5
  • proxy.mydomain.com IN A 192.168.15.5

The second client could get them in the following order:

  • proxy.mydomain.com IN A 10.136.53.5
  • proxy.mydomain.com IN A 10.136.52.5
  • proxy.mydomain.com IN A 10.136.33.5
  • proxy.mydomain.com IN A 192.168.15.5
  • proxy.mydomain.com IN A 192.168.10.5

The third client could get:

  • proxy.mydomain.com IN A 10.136.52.5
  • proxy.mydomain.com IN A 10.136.33.5
  • proxy.mydomain.com IN A 192.168.15.5
  • proxy.mydomain.com IN A 192.168.10.5
  • proxy.mydomain.com IN A 10.136.53.5

This is called round-robin. Basically it rotates between the various IP addresses. All IP addresses are offered as answers, but the order is rotated so that as long as clients choose the first answer in the list every client chooses a different IP address.

Notice I said clients choose the first answer in the list. This needn’t always be the case though. When I said clients above, I meant the client computer that is querying the DNS server for an answer. But that’s not really who’s querying the server. Instead, an application on the client (e.g. Chrome, Internet Explorer) or the client OS itself is the one looking for an answer. These ask the DNS resolver which is usually a part of the OS for an answer, and it’s the resolver that actually queries the server and gets the list of answers above.

The DNS resolver can then return the list as it is to the requesting application, or it can apply a re-ordering of its own. For instance, if the client is from the 192.168.10.0 network, the resolver may re-order the answers such that the 192.168.10.5 answer is always first. This is called Subnet prioritization. Basically, the resolver prioritizes answers that are from the same subnet as the client. The idea being that client applications would prefer reaching out to a server in their same subnet (it’s closer to them, no need to go over the WAN link for instance) than one on a different subnet.

Subnet prioritization can be disabled on the resolver side by adding a registry key PrioritizeRecordData (link) with value 0 (REG_DWORD) at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DnsCache\Parameters. By default this key does not exist and has a default value of 1 (subnet prioritization enabled).

Subnet prioritization can also be set on the server side so it orders the responses based on the client network. This is controlled by the registry key LocalNetPriority (link) under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DNS\Parameters\ on the DNS server. By default this is 0, so the server doesn’t do any subnet prioritization. Change this to 1 and the server will order its responses according to the client subnet.

By default the server also does round-robin for the results it returns. This can be turned off via the DNS Management tool (under server properties > advanced tab). If round-robin is off the server returns records in the order they were added.

More on subnet prioritization at this link.

That’s is not the end though. :)

Consider a server who has round-robin and subnet prioritization enabled. Now consider the DNS records from above:

  • proxy.mydomain.com IN A 192.168.10.5
  • proxy.mydomain.com IN A 10.136.53.5
  • proxy.mydomain.com IN A 10.136.52.5
  • proxy.mydomain.com IN A 10.136.33.5
  • proxy.mydomain.com IN A 192.168.15.5

The first and last records are from class C networks. The other three are from Class A networks. In reality though thanks to CIDR these are all class C addresses.

Now say there’s a client with IP address 10.136.50.2/24 asking the server for answers. On the face of it the client network does not match any of the answer record networks so the server will simply return answers as per round-robin, without any re-ordering. But in reality though the client 10.136.50.2/24 is in the same network as 10.136.52.5/24 and both are part of a larger 10.136.48.0/20 network that’s simply been broken into multiple /24 networks (to denote clients, servers, etc). What can we do so the server correctly identifies the proxy record for this client?

This is where the LocalNetPriorityNetMask registry key under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DNS\Parameters\ on the DNS server comes into play. This key – which does not exist by default – tells the server what subnet mask to assume when it’s trying to subnet prioritize. By default the server assumes a /24 subnet, but by tweaking this key we can tell the server to use a different subnet in its calculations and thus correctly return an answer.

The LocalNetPriorityNetMask key takes a REG_DWORD value in a hex format. Check out this KB article for more info, but a quick run through:

A netmask can be written as xxx.xxx.xxx.xxx. 4 pairs of numbers. The LocalNetPriorityNetMask key is of format 0xaabbccdd – again, 4 pairs of hex numbers. This is a mask that’s applied on the mask of 255.255.255.255 so to calculate this number you subtract the mask you want from 255.255.255.255 and convert the resulting numbers into hex.

For example: you want a /8 netmask. That is 255.0.0.0. Subtracting this from 255.255.255.255 leaves you with 0.255.255.255.255. What’s that in hex? 00ffffff. So LocalNetPriorityNetMask will be 0x00ffffff. Easy?

So in the example above I want a /20 netmask. That is, I am telling the server to assume the clients and the record IPs it has to be in a /20 network, so subnet prioritize accordingly. A /20 netmask is 255.255.240.0. Subtract from 255.255.255.255 to get 0.0.15.255. Which in hex is 00000fff (15 decimal is F hex). So all I have to do is put this value as LocalNetPriorityNetMask on the DNS server, restart the service, and now the server will correctly return subnet prioritized answers for my /20 network.

Update: Some more links as I did some more reading on this topic later.

  • Ace Fekay’s post – a must read!
  • A subnet calculator (also gives you the wildcard, which you can use for calculating the LocalNetPriorityNetMask key)
  • I am not very clear on what happens if you disable RoundRobin but there are multiple entries from the same subnet. What order are they returned in? Here’s a link to the RoundRobin setting, doesn’t explain much but just linking it in case it helps in the future.
  • More as a note to myself (and any others if they wonder the same) – the subnet mask you specify is applied on the client. That is to say if you client has an IP address of say 10.136.20.10, by default the DNS server will assume a subnet mask of /24 (Class C is the default) and assume the client is in a 10.136.20.0/24 network. So any records from that range are prioritized. If you want to include other records, you specify a larger subnet mask. Thus, for example, if you specify a /20 then the client is assumed to have an IP address 10.136.20.10/20, so its network range is considered to be 10.136.16.1 – 10.136.31.254 (don’t wrack your brain – use the subnet calculator for this). So any record in this range is prioritized over records not in this range.
  • The Windows calculator can be used to find the LocalNetPriorityNetMask key value. Say you want a subnet mask of /19. The subnet calculator will tell you this has a wildcard of 0.0.31.255 – i.e. 00011111.11111111. Put this (13 1’s) into the Windows calculator to get the hex value 3FFF.  

The dig command

Just a quick post on how to use the dig command. Had to troubleshooting some DNS issues today and nslookup just wasn’t cutting it so I downloaded BIND for Windows and ran dig from there.  (For anyone else that’s interested – if you download BIND (it’s a zip file), extract the contents, and copy dig.exe and all the .dll files elsewhere that’s all you need to run dig).

Here’s the simple way of running dig:

A bit confusing when you come from nslookup where you type the name server first and then the name and type. Here’s the nslookup syntax for reference:

Although all documentation shows the dig syntax as above I think the order can be changed too. Since the name server is identified by the @ sign you can specify it later on too:

I will use this syntax everywhere as it’s easier for me to remember.

You can complicate the simple syntax by adding options to dig. There’s two sort of options: query options and domain options (I made the names up). These are usually placed after the type but again the order doesn’t really matter:

  • Query options start with a - (dash) and look like the switches you have in most command line tools. These are useful to control the behavior of dig when making the query.
    • For instance: say I want to for dig to only use IPv4. I can add the option -4. Similarly for only IPv6 it is -6.
    • If I want to specify the port of the name server (i.e. apart from 53) the -p option is for that.
    • (Very useful) If I want to specify a different source port for the client (i.e. the port from which the outgoing request is made and to which the reply is sent, in case you want to eliminate any issues with that) the -b switch is for that. It takes as argument address#port, if you don’t care about the address leave it as 0.0.0.0#port.
    • Apart from specifying the name and type as above it is also possible to pass these as query options via -n and -t.
    • There are some more query options. Above are just the common ones I can remember.
  • Domain options start with a + (plus). These pertain to the name lookup you are doing.
    • For instance: typically DNS queries are performed using UDP. To use TCP instead do +tcp. Most of these options can be prefixed with a “no”, so if you explicitly want to not use TCP then you can do +notcp instead.
    • To specify the default domain name do +domain=xxx.
    • To turn off or on recursive mode do +norecurse or +recurse. (This is similar to the -recurse or -norecurse switch in nslookup).
      • Note: By default recursion is on.
      • A quick note on recursion: When recursion is on dig asks the name server you supply (or that used by the OS) for an answer and if that name server doesn’t have an answer the name server is expected ask other name servers and get back with an answer. Hence the term “recursive”. The name server will send a recursive query to the name server it knows of, who will send a recursive query to the name server it knows of, until an answer is got down the chain of delegation. 
      • A quick note on iterative (non-recursive queries): When recursion is off dig gets an answer from the first name server, then it queries the second name server, gets an answer and queries the third name server, and so on … until dig itself gets an answer from a server who knows.
    • The output from dig is usually verbose and shows all the sections. To keep it short use the +short option.
      • This doesn’t show the name of the server that dig got an answer from. To print this too use the +identify option.
    • To list all the authoritative name servers of a domain along with their SOA records use the +nssearch option. This disables recursion.
    • To list a trace of the delegation path use the +trace option. This too disables recursion. The delegation path is listed by dig contacting each name server down the chain iteratively.
    • Couple of output modifying options. Most of these require no changing:
      • By default the first line of the output shows the version of  dig, the query, and some extra bits of info. To hide that use +nocmd.
      • By default the answer section of replies is shown. To hide it do +noanswer.
      • By default the additional sections of a reply are shown. To hide it do +noadditional.
      • By default the authority section of a reply is shown. To hide it do +noauthority.
      • By default the question asked is shown as a comment (this is useful because you might type something like dig rakhesh.com which dig interprets as dig rakhesh.com A so this makes the question explicit). To hide this do +noquestion.
        • Note that this is not same as the query that is sent. The query is the question plus other flags like whether we want a recursive query etc. By default the query that is sent is not shown. To show that do +qr.
      • By default comments are printed. To hide it do +nocomments.
      • By default some stats are printed. To hide these do +nostats.
    • To change the timeout use +time=X. Default is 5 seconds. This is equivalent to nslookup -timeout=X.
    • To change the number of retries use +retry=X. Default is 3. This is equivalent to nslookup -retry=X.
    • There are some more domain options. Above are just the common ones I can remember.

An interesting thing about dig (since BIND 9) is that it can do multiple queries (the brackets below are not part of the command, I put them to show the multiple queries):

Each of these queries can take its own options but in addition to that you can also provide global options. So the full command when you have multiple queries and global options is as below (again, without the brackets):

To add to the fun you can even have one name server to be used for all queries and over-ride these via name servers for each query! Thus a fuller syntax is:

This is also why the options can be at the beginning of the query rather than at the end of the query. When you specify global options you can over-ride these per query (except the +[no]cmd option).

Exporting and Importing Windows DNS zones

Exporting a DNS zone is easy. Use the dnscmd.

Importing too is easy but the commands aren’t so obvious. Again you use dnscmd, with the /zoneadd switch as though you are creating a new zone. The help page for this misses out on an important switch though – /load – which lets you load the zone from an exported or pre-existing file.

You can find this switch in the dnscmd help:

So the way to import a zone is as follows: first, copy the exported file into the c:\windows\system32\dns folder of the DNS server and preferably rename it so the extension is a .dns (not required, just a nice thing to do). Then run a command similar to below:

That’s it. This will create a primary zone called “blah.com” and use the zone file that’s already in the location.

Note that you can’t use this technique for AD integrated zones. But that’s no issue. Simply import as above and then convert the zone to AD integrated via the GUI.

A problem occurred while trying to add the conditional forwarder

I was trying to create a conditional forwarder on my work DNS servers and kept hitting this cryptic error message:

forwarder error

I was trying to create a conditional forwarder called some.sub.zone.com.  At first I thought maybe I had this as an existing zone or perhaps as a stub zone – but nope, I don’t have that. Some forum posts mentioned the lack of root hints could lead to this error, but that doesn’t make sense to me – why would I need root hints for this? Next I created a test conditional forwarder to some random domain name and that worked – so surely it wasn’t a server issue.

I recreated this in my test lab and found the problem. The issue is that I am trying to create a forwarder to some.sub.zone.com while zone.com already exists on the DNS server. I was under the impression you could have conditional forwarders even for zones you host, but nope that’s a no can do. From the official docs here’s a para of interest:

A DNS server cannot forward queries for the domain names in the zones it hosts. For example, the authoritative DNS server for the zone microsoft.com cannot forward queries according to the domain name microsoft.com. The DNS server authoritative for microsoft.com can forward queries for DNS names that end with example.microsoft.com, if example.microsoft.com is delegated to another DNS server.

The emphasis is mine and that’s the work-around to use here. You have two options – either delete the zone.com zone from your DNS servers and then create a conditional forwarder for some.sub.zone.com; or create a delegation for some.sub.zone.com – you could do that to yourself too – and then create the conditional forwarder.

Here’s a screenshot from my test lab –

delegationThe some.sub delegation is to my server itself. You don’t need to create a zone for the delegation to succeed. The delegation is just a one way pointer of sorts telling the server to ask the delegated server for any queries concerning this sub zone – it basically tells the server hosting zone.com that it is no longer responsible for some.sub.zone.com (even though the delegation points back to itself!). Once that is done the server will allow you to create a conditional forwarder for some.sub.zone.com.

Pause a DNS zone on all DNS servers

Here’s how I paused a zone on all the DNS servers hosting that zone:

This looks up the name servers for the zone and suspends the zone on each of those servers. If there are any servers that host this zone but aren’t specified as name servers for the zone (for example it could be an AD integrated zone but the NS records are incomplete) it misses out those servers. So it’s not a great script, there’s probably better ways to do this.

In my case the zone in question was being replicated to all DCs in the domain. So I got a list of all DCs in the domain and targeted those instead:

 

Stub zones do not need zone transfer (with screenshots!)

I had to write an email about this and so take the trouble to set up a test zone and create screenshots. Figured I might as well put it in a blog post too.

Exhibit A: An AD integrated zone called some.zone.com.

somezoneDoesn’t matter that it’s AD integrated or what NS records it holds. I just created an AD integrated zone to simulate our work environment.

Note that this zone doesn’t have zone transfers enabled.

nozonetransferExhibit B: A regular Windows Server 2012 machine called WIN-SVR01. Not domain joined (just in case anyone points out that could make a difference). It has access to the master server and Name Servers and that’s it. Create a stub zone as usual, pointing it to the master servers (in the screenshot below I point to just one master server).

new stub zone

Exhibit C: And that’s it! As soon as I do the above, the zone loads and I am able to query records in it.

stub zone works

That’s it!

One source of confusion seems to be the Get-DnsServerZone cmdlet. Here’s the cmdlet output once the stub zone has loaded:

Note the attributes LastZoneTransferAttempt and LastZoneTransferResult – these give the impression a zone transfer is being carried out.

Now watch the same output after I recreated the stub zone but this time I blocked it from accessing the master servers (so the stub zone doesn’t load):

Even though the zone hasn’t loaded LastSuccessfulZoneTransfer gives the impression it has succeeded. LastZoneTransferResult gives an error code though. Best to ignore these attributes for stub zones – as I showed above stub zones don’t require a zone transfer.

Stub zones do not need zone transfer

At work there was some confusion that creating stub zones requires the master servers to allow zone transfers to the servers holding the stub zones. That’s not correct and oddly I couldn’t find any direct hits when I typed this query into Google so I could show some blog posts/ articles for support.

I mentioned this briefly in one of my earlier blog posts. But don’t take my word for it here are two blog posts and a book except mentioning the same:

If stub zones don’t require any zone transfers what’s the difference between them and conditional forwarders? Again, check the second link above but the long and short of it is that stub zones query the master server IPs you give and asks these servers for a list of NS records and their addresses, and then queries these name servers for whatever record you want; while conditional forwarders have a predefined list of name servers and so always query these for the record you want. Stub zones are more resilient to changes. If the remote end adds/ removes a name server the stub zone will automatically pick it up as long as the master servers are up-to-date and reachable. A conditional forwarder won’t do this automatically – the remote end admins will have to communicate the new name server details to the conditional forwarder admins and they will have to update at their end.

Hope that clarifies!

p.s. See my next post.

p.p.s. Hadn’t thought of this. Good point (via):

Stub zones: will use whatever is in the NS records of the zone (or descendants of the zone, if not otherwise defined) to resolve queries  which are below a zone cut.

Forward zones: will always use the configured forwarders, which must support recursion, even for names which are known to be deeper in the delegation hierarchy and whose delegated/authoritative nameservers might respond more quickly than the forwarders, if asked.

Cannot ping an address but nslookup works (contd)

Earlier today I had blogged about nslookup working but ping and other methods not resolving names to IP addresses. That problem started again, later in the evening.

Today morning though as a precaution I had enabled the DNS Client logs on my computer. (To do that open Event Viewer with an admin account, go down to Applications and Services Logs > Microsoft > Windows > DNS Client Events > Operational – and click “Enable log” in the “Actions” pane on the right).

That showed me an error along the following lines:

A name not found error was returned for the name vcenter01.rakhesh.local. Check to ensure that the name is correct. The response was sent by the server at 10.50.1.21:53.

Interesting. So it looked like a particular DC was the culprit. Most likely when I restarted the DNS Client service it just chose a different DC and the problem temporarily went away. And sure enough nslookup for this record against this DNS server returned no answers.

I fired up DNS Manager and looked at this server. It seemed quite outdated with many missing records. This is my simulated branch office DC so I don’t always keep it on/ online. Looks like that was coming back to bite me now.

The DNS logs in Event Manager on that server had errors like this:

The DNS server was unable to complete directory service enumeration of zone TrustAnchors.  This DNS server is configured to use information obtained from Active Directory for this zone and is unable to load the zone without it.  Check that the Active Directory is functioning properly and repeat enumeration of the zone. The extended error debug information (which may be empty) is “”. The event data contains the error.

So Active Directory is the culprit (not surprising as these zones are AD integrated so the fact that they weren’t up to date indicated AD issues to me). I ran repadmin /showrepl and that had many errors:

Naming Context: CN=Configuration,DC=rakhesh,DC=local

Source: COCHIN\WIN-DC03
******* WARNING: KCC could not add this REPLICA LINK due to error.

Naming Context: DC=DomainDnsZones,DC=rakhesh,DC=local
Source: COCHIN\WIN-DC03
******* WARNING: KCC could not add this REPLICA LINK due to error.

Naming Context: DC=ForestDnsZones,DC=rakhesh,DC=local
Source: COCHIN\WIN-DC03
******* WARNING: KCC could not add this REPLICA LINK due to error.

Naming Context: DC=rakhesh,DC=local
Source: COCHIN\WIN-DC03
******* WARNING: KCC could not add this REPLICA LINK due to error.

Naming Context: CN=Configuration,DC=rakhesh,DC=local
Source: COCHIN\WIN-DC01
******* WARNING: KCC could not add this REPLICA LINK due to error.

Naming Context: DC=DomainDnsZones,DC=rakhesh,DC=local
Source: COCHIN\WIN-DC01
******* WARNING: KCC could not add this REPLICA LINK due to error.

Naming Context: DC=ForestDnsZones,DC=rakhesh,DC=local
Source: COCHIN\WIN-DC01
******* WARNING: KCC could not add this REPLICA LINK due to error.

Naming Context: DC=rakhesh,DC=local
Source: COCHIN\WIN-DC01
******* WARNING: KCC could not add this REPLICA LINK due to error.

Great! I fired up AD Sites and Services and the links seemed ok. Moreover I could ping the DCs from each other. Event Logs on the problem DC (WIN-DC02) had many entries like this though:

The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server win-dc01$. The target name used was Rpcss/WIN-DC01. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Please ensure that the target SPN is registered on, and only registered on, the account used by the server. This error can also happen when the target service is using a different password for the target service account than what the Kerberos Key Distribution Center (KDC) has for the target service account. Please ensure that the service on the server and the KDC are both updated to use the current password. If the server name is not fully qualified, and the target domain (RAKHESH.LOCAL) is different from the client domain (RAKHESH.LOCAL), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.

Hmm, secure channel issues? I tried resetting it but that too failed:

(Ignore the above though. Later I realized that this was because I wasn’t running command prompt as an admin. Because of UAC even though I was logged in as admin I should have right clicked and ran command prompt as admin).

Since I know my environment it looks likely to be a case of this DC losing trust with other DCs. The KRB_AP_ERR_MODIFIED error also seems to be related to Windows Server 2003 and Windows Server 2012 R2 but mine wasn’t a Windows Server 2003. That blog post confirmed my suspicions that this was password related.

Time to check the last password set attribute for this DC object on all my other DCs. Time to run repadmin /showobjmeta.

The above output gives metadata of the WIN-DC02 object on WIN-DC01. I am interested in the pwdLastSet attribute and its timestamp.  Here’s a comparison of this across my three DCs:

That confirms the problem. WIN-DC02 thinks its password last changed on 9th May whereas WIN-DC01 changed it on 25th July and replicated it to WIN-DC03.

Interestingly that date of 25th July is when I first started having problems in my test lab. I thought I had sorted them but apparently they were only lurking beneath. The solution here is to reset the WIN-DC02 password on itself and WIN-DC01 and replicate it across. The steps are in this KB article, here’s what I did:

  1. On WIN-DC02 (the problem DC) I stopped the KDC service and set it to start Manual.
  2. Purge the Kerberos ticket cache. You can view the ticket cache by typing the command: klist.  To purge, do: klist purge.
  3. Open a command prompt as administrator (i.e. right click and do a “Run as administrator”) then type the following command: netdom resetpwd /server WIN-DC01.rakhesh.local /UserD MyAdminAccount /PasswordD *
  4. Restart WIN-DC02.
  5. After logon start the KDC service and set it to Automatic.

Checked the Event Logs to see if there are any errors. None initially but after I forced a sync via repadmin /syncall /e I got a few. All of them had the following as an error:

2148074274 The target principal name is incorrect.

Odd. But at least it was different from the previous errors and we seemed to be making progress.

After a bit of trial and error I noticed that whenever the KDC service on the DC was stopped it seemed to work fine.

I could access other servers (file shares), connect to them via DNS Manager, etc. But start KDC and all these would break with errors indicating the target name was wrong or that “a security package specific error occurred”. Eventually I left KDC stay off, let the domain sync via repadmin /syncall, and waited a fair amount of time (about 15-20 mins) for things to settle. I kept an eye on repadmin /replsummary to see the deltas between WIN-DC02 and the rest, and also kept an eye on the DNS zones to see if WIN-DC02 was picking up newer entries from the others. Once these two looked positive, I started KDC. And finally things were working!

 

Cannot ping an address but nslookup works

Had an odd thing happen yesterday. I could nslookup names from my computer but couldn’t ping them. Ping would give an error as below:

Very odd.

This is not just ping, nothing in my system could connect to these addresses.

Nslookup works by querying the DNS servers directly. Ping and the OS work by telling the Windows DNS Resolver to query the DNS servers. Both should be querying the same servers (unless I specifically tell nslookup to check with someone else) so in theory both should work the same – but oddly it doesn’t.

I restarted the DNS Client service (called dnscache) and then everything began working as expected. Not sure what was really wrong …

DNS request timed out

Whenever I’d do an nslookup I noticed these timeout messages as below:

Everything seemed to be working but I was curious about the timeouts.

Then I realized the problem was that I had missed the root domain at the end of the name. You see, a name such as “www.msftncsi.com” isn’t an absolute name. For all the DNS resolver knows it could just be a hostname – like say “eu.mail” in a full name such as “eu.mail.somedomain.com”. To tell the resolver this is the full name one must terminate it with a dot like thus: “www.msftncsi.com.“. When we omit the dot in common practice the DNS resolver puts it in implicitly and so we don’t notice it usually.

The dot is what tells the resolver about the root of the domain name hierarchy. A name such as “rakhesh.com.” actually means the “rakhesh” domain in the “com” domain in the “.” domain. It is “.” that knows of all the sub-domains such as “com”, “io”, “net”, “pl”, etc.

In the case above since I had omitted the dot my resolver was trying to append my DNS search suffixes to the name “www.msftncsi.com” to come up with a complete name. I have search suffixes “rakhesh.local.” and “dyn.rakhesh.local.” (I didn’t put the dot while specifying the search suffixes but the resolver puts it in because I am telling it these are the absolute domain names) so the resolver was actually expanding “www.msftncsi.com” to “www.msftncsi.com.rakhesh.local.” and “www.msftncsi.com.dyn.rakhesh.local.” and searching for these. That fails and so I get these “DNS request timed out” messages.

If I re-try with the proper name the query goes through fine:

Just to eliminate any questions of whether a larger timeout is the solution, no it doesn’t help:

If I replace my existing DNS suffixes search list with the dot domain, that too helps (not practical because then I can’t query by just the hostname, I will always have to put in the fully qualified name):

Or I could tell nslookup to not use the DNS suffix search lists at all (a more practical solution):

So there you go. That’s why I was getting those timeout errors.

Updating Windows DNS Server root hints

Somehow I came upon the root hints of my Windows DNS Server today and had a thought to update it. Never done that before so why not give it a shot?

You can find the root hints by right clicking on the server and going to the ‘Root Hints’ tab.

root hints

Or you could click the server name in DNS Manager and select ‘Root Hints’ in the right pane. Either ways you get to the screen above. From here you can add/ remove/ edit root server names and IP addresses. If you want to update this list you can do so by each entry, or click the ‘Copy from Server’ button to update the list with a new bunch of entries. Note that ‘Copy from Server’ does not over-write the list, so you are better off removing all the entries first and then doing ‘Copy from Server’.

The ‘Copy from Server’ option had me stumped though. You can find the root hints on the IANA website – there’s an FTP link to the file containing root hints, as well as an HTTP link (http://www.internic.net/domain/named.root). I thought simply entering this in the ‘Copy from Server’ window should suffice but it doesn’t. Notice the OK  button is grayed out.

copy from serverThe window says it wants a server name or IP address so I removed everything above except the server name and then clicked OK. That looked like it was doing something but then failed with a message that it couldn’t get the root hints. The message said the specified DNS server could not be contacted so that gave me the idea it was looking for a DNS server which had the root hints.

searching for root hintssearch failsSo I tried inputting the name of one of my DNS servers. This DNS server knows of the root servers because it has them already. (You can verify that a server knows of the root hints via nslookup as below).

My DNS server doesn’t have an authoritative answer (notice the output above) because it only has the info that’s present with it by default. The real answers could have changed by now (and it often does – the root hints list that these servers come with can have outdated entries) but that’s fine because it has some answers at least. If I were to input this server’s name or IP address to the ‘Copy from Server’ dialog above, that DNS server gets the root hints from this DNS server and updates itself.

Even better though would be to put the IP address of one of the root servers returned above. Like a.root-servers.net which has an IPv4 address of 198.41.0.4. (Don’t go by the output above, you can get the latest IP addresses from IANA). If I query that address for the root servers it has an authoritative answer:

So there you have it. I put in this IPv4 address into the ‘Copy from Server’ window and my server updated itself with the IP addresses. I noticed that it had missed some of the IPv6 addresses (not sure why, maybe coz it can’t validate these?) but when I did a ‘Copy from Server’ again without removing any existing entries and input in the same IPv4 address and did an update, this time it picked up all the addresses.

(Note to self: The %WINDIR%\System32\dns\cache.dns file seems to contain root hints. I replaced this file with the root hints from IANA but that did not update the server. When I updated the server as above and checked this file it hadn’t changed either. Restarting the DNS service didn’t update the file/ root hints either, so am not sure how this file comes into play).

 

Fixing “The DNS server was unable to open Active Directory” errors

For no apparent reason my home testlab went wonky today! Not entirely surprising. The DCs in there are not always on/ connected; and I keep hibernating the entire lab as it runs off my laptop so there’s bound to be errors lurking behind the scenes.

Anyways, after a reboot my main DC was acting weird. For one it took a long time to start up – indicating DNS issues, but that shouldn’t be the case as I had another DC/ DNS server running – and after boot up DNS refused to work. Gave the above error message. The Event Logs were filled with two errors:

  • Event ID 4000: The DNS server was unable to open Active Directory.  This DNS server is configured to obtain and use information from the directory for this zone and is unable to load the zone without it. Check that the Active Directory is functioning properly and reload the zone. The event data is the error code.
  • Event id 4007: The DNS server was unable to open zone <zone> in the Active Directory from the application directory partition <partition name>. This DNS server is configured to obtain and use information from the directory for this zone and is unable to load the zone without it. Check that the Active Directory is functioning properly and reload the zone. The event data is the error code.

A quick Google search brought up this Microsoft KB. Looks like the DC has either lost its secure channel with the PDC, or it holds all the FSMO roles and is pointing to itself as a DNS server. Either of these could be the culprit in my case as this DC indeed had all the FSMO roles (and hence was also the PDC), and so maybe it lost trust with itself? Pretty bad state to be in, having no trust in oneself … ;-)

The KB article is worth reading for possible resolutions. In my case since I suspected DNS issues in the first place, and the slow loading usually indicates the server is looking to itself for DNS, I checked that out and sure enough it was pointing to itself as the first nameserver. So I changed the order, gave the DC a reboot, and all was well!

In case the DC had lost trust with itself the solution (according to the KB article) was to reset the DC password. Not sure how that would reset trust, but apparently it does. This involves using the netdom command which is installed on Server 2008 and up (as well as on Windows 8 or if RSAT is installed and can be downloaded for 2003 from the Support Tools package). The command has to be run on the computer whose password you want to reset (so you must login with an account whose initials are cached, or use a local account). Then run the command thus:

Of course the computer must have access to the PDC. And if you are running it on a DC the KDC service must be stopped first.

I have used netdom in the past to reset my testlab computer passwords. Since a lot of the machines are usually offline for many days, and after a while AD changes the computer account password but the machine still has the old password, when I later boot up the machine it usually gives are error like this: “The trust relationship between this workstation and the primary domain failed.”

A common suggestion for such messages is to dis-join the machine from the domain and re-join it, effectively getting it a new password. That’s a PITA though – I just use netdom and reset the password as above. :)

 

DNS zone and domain

Once upon a time I used to play with DNS zones & domains for breakfast, but it’s been a while and I find myself to be a bit rusty.

Anyways, something I realized / remembered today – a DNS domain is not equal to a DNS zone. When creating a DNS domain under Windows, using the GUI, it is easy to equate the domain to the zone; but if you come from a *nix background then you know a zone is the zone file whereas domains are different from that.

For example here’s a domain called “domain.com” and its sub-domain “sub.domain.com”.

domainYou would think there wouldn’t be much difference between the two but the fact is that “domain.com” is also the zone here and “sub.domain.com” is a part of that zone. The domain “sub.domain.com” is not independent of the main domain “domain.com”. It can’t have its own name servers. And when it comes to zone transfers “sub.domain.com” follows whatever is set for “domain.com”. You can’t, for instance, have “domain.com” be denied zone transfers while allowing zone transfers for “sub.domain.com” – it’s simply not possible, and if you think about it that makes sense too because after all “sub.domain.com” doesn’t have its own name servers.

In this case the zone “domain.com” consists of both the domain “domain.com” and its sub-domain “sub.domain.com”.

In contrast below is an example where there are two zones, one for “domain.com” and another for “sub.domain.com”. Both domain and sub-domain have their own zones (and hence name servers) in this case.

subdomainWhen creating a new domain / zone the GUI makes this clear too but it’s easy to miss out the distinction.

New domain

New domain

New Zone

New Zone

Stub zones

We use stub zones at work and initially I had a domain “sub.domain.com” which I wanted to create a a stub zone on another server. That failed with an error that the zone transfer failed.

transferInitially I took this to mean the stub zone was failing because the zone wasn’t getting transferred from the main server. That was correct – sort of.  Because “sub.domain.com” isn’t a zone of its own, it doesn’t have any name servers. And the way stub zones work is that the stub server contacts the name servers of “sub.domain.com” to get a list of name servers for the stub zone but that fails because “sub.domain.com” doesn’t have any name servers! It is not a zone, and only zones have name servers, not (sub-)domains.

So the error message was misleading. Yes, the zone transfer failed, but that’s not because the transfer failed but because there were no servers with the “sub.domain.com” zone. What I have to do is convert “sub.domain.com” to a zone of its own. (Create a zone called “sub.domain.com”, create new records in that zone, then delete the “sub.domain.com” domain).

Worth noting: Stub zones don’t need zone transfers allowed. Stub zones work via the stub server contacting the name servers of the stub zone and asking for a list of NS, A, and SOA records. These are available without any zone transfer required.

In our case we wanted to create a stub host record. We had an A record “host.sub.domain.com” and wanted to create a stub to that from another server. The solution is very simple – create a new zone called “host.sub.domain.com”, create a blank A record in that with the IP address you want (same IP that was in the “host.sub.domain.com” A record), then delete the previous “host.sub.domain.com” A record. 

Now create a stub zone for that record:stubzoneAnd that’s it.

Just to recap: zones contain domains. A domain can be spread (as sub-domains) among multiple zones. For zone transfers and stub zones you need the domain in question to be in a zone of its own. 

Notes on Dynamic DNS updates, DNS Scavenging, etc.

Dynamic DNS updates can be set to one of these (per zone):

  • None => No dynamic updates are allowed for the zone on this server
  • Secure => Only secure updates are allowed.
    • Note: This is only applicable to AD integrated zones.
    • By default only domain members (domain joined computers & domain users) are allowed to update the zone for secure updates. This is controlled by the ACLs on the zone (which can be viewed via the Security tab of the zone – check out the ACE for “Authenticated Users“). See this link for more
  • Nonsecure and secure => Both secure and nonsecure updates are allowed.

Scavenging

Dynamic DNS updates result in records being added and deleted to DNS. But while records are correctly added, it is not always the case that the record is also correctly removed. For instance, a client could have got an IP address from DHCP and dynamically registered its A record. Maybe the client then crashed so it never removed the dynamically registered record. The address, however, is removed from DHCP after the lease expires and could later be assigned to another client – who also dynamically registers itself in DNS – resulting in two A records, both to the same IP address, but one of them incorrect. To prevent such issues DNS scavenging is required. This removes stale DNS records after a pre-defined period. 

  • Scavenging is set at 3 places on a Windows server and all three must coincide for a record to be scavenged. These places are:
    • an individual record; 
    • the zone; and
    • the server performing the scavenging. 
  • The scavenging setting on an individual record can be viewed only after selecting View > Advanced in the DNS MMC and then viewing the properties of a record. 
    • When a dynamic DNS record is created it has a timestamp (rounded down to the nearest hour when the record was created).
      • When a record is first created it is considered an “Update”.
      • When an existing record is updated with the same IP address it is considered a “Refresh”.
      • When an existing record is updated with a new IP address it is considered an “Update”.
    • Every 24 hours Windows clients will attempt to to dynamically update the DNS record. The update could be considered an update or a refresh depending on whether the IP changes as above.
    • If a record is enabled for scavenging its properties window will have a tick next to “Delete this record when it becomes stale”. 
      • Static records don’t have this ticked by default (because they are not meant to be scavenged). 
      • If this is manually ticked (for a static or dynamic record) then a timestamp will be set to when it was ticked (rounded down to the nearest hour). 
    • It is possible to set scavenging on a zone and all its records via the following command: dnscmd /ageallrecords
      • This is not recommended as it enables scavenging on all records – even static. Do not use this command on zones with static records. 
  • The scavenging setting for a zone can be viewed via the Aging button in the zone properties (by default the setting is off).
    • The aging values for a zone are replicated to all DNS servers hosting the zone.
    • Two intervals are in play here:
      • No-refresh interval: Once a record is refreshed, it is not refreshed again until this time period has passed. 
        • The purpose of this seems to be to reduce replication traffic. If a client refreshes its DNS record every 24 hours, those are ignored by the DNS server for the no-refresh interval, and not replicated to other DNS servers.  
      • Refresh interval: How much time to wait once a record is refreshed before it can be scavenged?
        • So this interval specifies how long the server should wait after a record has refreshed before it can considered it a candidate for scavenging.
        • The default value is 7 days. This means, if a record is refreshed today, the server will wait for 7 days to see if it’s refreshed again. If it is not, the server considers this record ready for scavenging. 
    • Both intervals must be passed for a record to be expired. By default both are 7 days, so what this means is:
      • If a record is created/ updated/ refreshed today, for the next 7 days the record is considered current – irrespective of whether any refreshes happen or not (because remember: during the no-refresh interval refreshes, if they happen, are ignored so the server considers the record as current for this period). 
      • After those 7 days have passed, the server checks if there are any refreshes.
        • If there are, the timestamp is accordingly updated and it goes back into waiting the no-refresh interval again. 
        • If there are no refreshes, the server now waits 7 days of refresh interval to see if any refreshes happen. If they do, the record goes back into the no-refresh interval; if there aren’t any, the record is ready for scavenging. 
    • As an aside, the default lease duration for Windows server DHCP leases is 8 days. Which is why the no-refresh interval is set to 7 days by default. During these 7 days the address won’t be allocated to any other client, nor will it change with the client, so chances of a refresh are minimal. 
      • DHCP leases and Dynamic DNS updates can conflict if clients are responsible for updating DNS with their addresses (which is usually the default).
      • Say a client got an IP address from DHCP (leased to it for 8 days, remember). The client will update that in DNS. For the next 7 days any refreshes from the client are ignored (no-fresh interval, expected). From the 7th day any refreshes/ updates will be considered.
      • Say our client went offline on the 3rd day. So on day 7 it doesn’t send a refresh – no problemo, DNS will not scavenge the record yet, it will simply wait for another 7 days.
      • On the 8th day, however, DHCP will release that IP address for others to use. Any new client that comes up will now get this address. This new client will send a Dynamic DNS update to the DNS server – creating a new A record to the same address, but with this new client’s name. Thus there are two DNS entries now to the same IP address!
      • Only after the refresh interval expires (7 days) can the old record be actually scavenged by the server (and even then there could be a delay based on the server setting – see below). 
      • For this reason it is recommended that the DHCP lease duration match the “no-refresh+refresh” interval of DNS scavenging. In the default case, either increase the DHCP lease to 14 days (7+7 days) or decrease the no-refresh and refresh intervals to 4 days (so the sum is 8 days, the DHCP lease).
        • Alternatively, allow the DHCP server to make updates on behalf of clients and disable (via GPO?) clients from registering updates with the DNS server.
          • Read this post on DHCP servers and Dynamic DNS updates.
          • Typical solution is to put all DHCP servers in a group called DnsUpdateProxy, but that’s not recommended – especially for DCs – because if a server is in this group the dynamic DNS records it creates have no security (so in the case of a DC this means the SRV records written by netlogon can be changed by anyone!)
          • It is better to create a low privilege AD user and get all DHCP users to use that account to register records. This way the dynamic DNS records are secured to that user.
          • Also note: if dynamic DNS records are written by a server in the DnsUpdateProxy group – i.e. with no security – if any other machine (even one not in this group) changes this record (because the records are open to all) the ACLs of that record will be changed to only grant that machine permissions to the record. Thus the original DHCP server will lose rights to that record. DnsUpdateProxy is not a good idea. 
        • It is important that clients be disabled from registering dynamic DNS updates in this case. Else the ACLs on the DNS record created by the client will prevent updates/ deletions from the DHCP server to the DNS server for those records.  
    • When scavenging is enabled for a zone, the “Date and time” the zone can be scavenged after value is set to the time the setting was enabled (rounded down to the nearest hour) plus the refresh interval period. 
    • It is also possible to right click a DNS server and set scavenging values for all zones on that server. These only apply to zones created after this setting was changed (unless the setting to modify existing zones is explicitly selected). 
  • The scavenging setting on the server can be enabled via the “Advanced” tab of the server properties in the DNS MMC (by default the setting is off).
    • When this setting is enabled, the scavenging period is set to a default of 7 days. The scavenging period defines how often the server will try to scavenge records. 
      • Does this mean every time the server starts scavenging all records are immediately deleted? No – because you have to also consider the no-refresh and refresh intervals of above. When a server runs its scavenging task, if a record to be scavenged has not crossed the refresh interval, it will not be removed. Similarly, if a record has crossed its refresh interval and is ready to be scavenged, if the server’s scavenging period isn’t due for a few more days nothing will happen. It’s only when the scavenging period is due that this record will be scavenged. 
    • When the server scavenges records it logs an event ID 2501 indicating how many records were scavenged. If no records were scavenged, an event ID 2502 will be logged. 
    • Note: You needn’t enable the scavenging setting on all servers hosting a zone. As long as any one server scavenges, the changes will propagate to others. In fact, it’s preferred to have only one server (or a set of servers) scavenge a zone as that will make it easier to troubleshoot. If all servers hosting a zone have scavenging on and the zone records are not being scavenged, we will have to check all these servers to see why scavenging isn’t happening. 
    • In practice, it is likely that all servers have scavenging turned on (because they are hosting multiple zones and could be responsible for scavenging one of those zones). But once a server has scavenging turned on it will scavenge any zones that has scavenging turned on. It is possible to restrict the servers that are allowed to scavenge a zone – even if the server and zone have scavenging turned on – via the dnscmd command. The syntax is as follows:

      The IP addresses are optional. If no address is given, all servers are allowed to scavenge it. Example:

      To see what servers have permissions to scavenge a zone the same command with a different switch can be used:

      Resetting this is simple – just don’t specify an IP addresses, that’s all:

The latter part of this blog post gives an example of how scavenging works with all the intervals above. In fact the whole blog post is worth a read.