Cannot ping an address but nslookup works (contd)

Earlier today I had blogged about nslookup working but ping and other methods not resolving names to IP addresses. That problem started again, later in the evening.

Today morning though as a precaution I had enabled the DNS Client logs on my computer. (To do that open Event Viewer with an admin account, go down to Applications and Services Logs > Microsoft > Windows > DNS Client Events > Operational – and click “Enable log” in the “Actions” pane on the right).

That showed me an error along the following lines:

A name not found error was returned for the name vcenter01.rakhesh.local. Check to ensure that the name is correct. The response was sent by the server at 10.50.1.21:53.

Interesting. So it looked like a particular DC was the culprit. Most likely when I restarted the DNS Client service it just chose a different DC and the problem temporarily went away. And sure enough nslookup for this record against this DNS server returned no answers.

I fired up DNS Manager and looked at this server. It seemed quite outdated with many missing records. This is my simulated branch office DC so I don’t always keep it on/ online. Looks like that was coming back to bite me now.

The DNS logs in Event Manager on that server had errors like this:

The DNS server was unable to complete directory service enumeration of zone TrustAnchors.  This DNS server is configured to use information obtained from Active Directory for this zone and is unable to load the zone without it.  Check that the Active Directory is functioning properly and repeat enumeration of the zone. The extended error debug information (which may be empty) is “”. The event data contains the error.

So Active Directory is the culprit (not surprising as these zones are AD integrated so the fact that they weren’t up to date indicated AD issues to me). I ran repadmin /showrepl and that had many errors:

Naming Context: CN=Configuration,DC=rakhesh,DC=local

Source: COCHIN\WIN-DC03
******* WARNING: KCC could not add this REPLICA LINK due to error.

Naming Context: DC=DomainDnsZones,DC=rakhesh,DC=local
Source: COCHIN\WIN-DC03
******* WARNING: KCC could not add this REPLICA LINK due to error.

Naming Context: DC=ForestDnsZones,DC=rakhesh,DC=local
Source: COCHIN\WIN-DC03
******* WARNING: KCC could not add this REPLICA LINK due to error.

Naming Context: DC=rakhesh,DC=local
Source: COCHIN\WIN-DC03
******* WARNING: KCC could not add this REPLICA LINK due to error.

Naming Context: CN=Configuration,DC=rakhesh,DC=local
Source: COCHIN\WIN-DC01
******* WARNING: KCC could not add this REPLICA LINK due to error.

Naming Context: DC=DomainDnsZones,DC=rakhesh,DC=local
Source: COCHIN\WIN-DC01
******* WARNING: KCC could not add this REPLICA LINK due to error.

Naming Context: DC=ForestDnsZones,DC=rakhesh,DC=local
Source: COCHIN\WIN-DC01
******* WARNING: KCC could not add this REPLICA LINK due to error.

Naming Context: DC=rakhesh,DC=local
Source: COCHIN\WIN-DC01
******* WARNING: KCC could not add this REPLICA LINK due to error.

Great! I fired up AD Sites and Services and the links seemed ok. Moreover I could ping the DCs from each other. Event Logs on the problem DC (WIN-DC02) had many entries like this though:

The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server win-dc01$. The target name used was Rpcss/WIN-DC01. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Please ensure that the target SPN is registered on, and only registered on, the account used by the server. This error can also happen when the target service is using a different password for the target service account than what the Kerberos Key Distribution Center (KDC) has for the target service account. Please ensure that the service on the server and the KDC are both updated to use the current password. If the server name is not fully qualified, and the target domain (RAKHESH.LOCAL) is different from the client domain (RAKHESH.LOCAL), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.

Hmm, secure channel issues? I tried resetting it but that too failed:

(Ignore the above though. Later I realized that this was because I wasn’t running command prompt as an admin. Because of UAC even though I was logged in as admin I should have right clicked and ran command prompt as admin).

Since I know my environment it looks likely to be a case of this DC losing trust with other DCs. The KRB_AP_ERR_MODIFIED error also seems to be related to Windows Server 2003 and Windows Server 2012 R2 but mine wasn’t a Windows Server 2003. That blog post confirmed my suspicions that this was password related.

Time to check the last password set attribute for this DC object on all my other DCs. Time to run repadmin /showobjmeta.

The above output gives metadata of the WIN-DC02 object on WIN-DC01. I am interested in the pwdLastSet attribute and its timestamp.  Here’s a comparison of this across my three DCs:

That confirms the problem. WIN-DC02 thinks its password last changed on 9th May whereas WIN-DC01 changed it on 25th July and replicated it to WIN-DC03.

Interestingly that date of 25th July is when I first started having problems in my test lab. I thought I had sorted them but apparently they were only lurking beneath. The solution here is to reset the WIN-DC02 password on itself and WIN-DC01 and replicate it across. The steps are in this KB article, here’s what I did:

  1. On WIN-DC02 (the problem DC) I stopped the KDC service and set it to start Manual.
  2. Purge the Kerberos ticket cache. You can view the ticket cache by typing the command: klist.  To purge, do: klist purge.
  3. Open a command prompt as administrator (i.e. right click and do a “Run as administrator”) then type the following command: netdom resetpwd /server WIN-DC01.rakhesh.local /UserD MyAdminAccount /PasswordD *
  4. Restart WIN-DC02.
  5. After logon start the KDC service and set it to Automatic.

Checked the Event Logs to see if there are any errors. None initially but after I forced a sync via repadmin /syncall /e I got a few. All of them had the following as an error:

2148074274 The target principal name is incorrect.

Odd. But at least it was different from the previous errors and we seemed to be making progress.

After a bit of trial and error I noticed that whenever the KDC service on the DC was stopped it seemed to work fine.

I could access other servers (file shares), connect to them via DNS Manager, etc. But start KDC and all these would break with errors indicating the target name was wrong or that “a security package specific error occurred”. Eventually I left KDC stay off, let the domain sync via repadmin /syncall, and waited a fair amount of time (about 15-20 mins) for things to settle. I kept an eye on repadmin /replsummary to see the deltas between WIN-DC02 and the rest, and also kept an eye on the DNS zones to see if WIN-DC02 was picking up newer entries from the others. Once these two looked positive, I started KDC. And finally things were working!