Contact

Subscribe via Email

Subscribe via RSS/JSON

Categories

Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

Elsewhere

Event ID 1046 – DHCP server says it is not authorized even though it is authorized!

This problem ate my head for the past 2 days and wasted a lot of time. For such a simple issue it drove me quite mad.

Built a bunch of DCs for our branch offices. One of them gave trouble with the DHCP server. I authorized it successfully, but the service kept complaining that it wasn’t authorized. Event ID 1046.

The DHCP/BINL service on the local machine, belonging to the Windows Administrative domain mydomain.dom, has determined that it is not authorized to start.  It has stopped servicing clients.  The following are some possible reasons for this: 

This machine is part of a directory service enterprise and is not authorized in the same domain.  (See help on the DHCP Service Management Tool for additional information). 

This machine cannot reach its directory service enterprise and it has encountered another DHCP service on the network belonging to a directory service enterprise on which the local machine is not authorized. 

Some unexpected network error occurred.

Did the obvious ones like reboot server :p and restart service :) and un-authorize and re-authorize the server (no errors either time). Also went ahead and removed the role itself and added back. Nothing helped!

Found a helpful post finally that pointed me in the right direction.

  1. I un-authorized the DHCP server.
  2. Opened up AD Sites and Services. 
  3. Browsed to the Services section (which can be enabled from the View menu if not already visible). 
  4. Browsed to the NetServices section within this. 
  5. On the right pane I had an entry for the IP address for the DHCP server I was trying to authorize. Not an entry by name, but by IP. Dunno why. (All other entries were by name, so I am guessing this is a leftover or a mistake by someone in the past). 
  6. I deleted this entry. 
  7. Waited a while, and then authorized the server. 
  8. No errors now!

Screenshot of the offending entry just for the heck of it (the blacked out part was an IP address):

Alternatively one can open ADSI Edit and go to CN=NetServices,CN=Services,CN=Configuration,DC=myDomain,DC=dom. Then delete the entry (as above) from there. 

What’s odd in my case is that the IP that I deleted was assigned to the DHCP server I wanted to authorize. Am guessing the CNF (short for conflict?) following by the GUID indicates some issue.

Windows CLI – find groups you are a member of

I knew of doing a gpresult /v and finding the group membership. An even better (and faster) way is whoami /groups.

Other useful whoami switches.

[Aside] AD Sites, Subnets, Trusts, etc.

  • How Domain Controllers are Located Across Trusts – this is a delightful article. I don’t know why, but I simply loved the way the author presented the information. Very logically written. Wish I could write blog posts with such clarity.
    • Praise aside, it is a good article on how subnet and site definitions are used to find a Domain Controller closest to you, and especially how it works across forest trusts.
  • Using Catch-All subnets in AD – Wanted to know how catch-all subnets in AD Sites will interact with specific ones. This one explained it. The specific one takes precedence. Which is exactly what you want. :)

PowerShell – Find all AD users with ACL inheritance disabled

Quick one-liner to find all AD user objects with ACL inheritance disabled:

Another one:

 

ADFS errors and WID

Spent a bit of time today tracking down an ADFS/ WID issue. Turned out to be a silly one in the end (silly on my part actually, should have spotted the cause right away!) but it was a good learning exercise in the end. 

The issue was that ADFS refused to launch after a server reboot. The console gave an error that it couldn’t connect to the configuration database. The ADFS service refused to start and the event logs were filled with errors such as these:

The Federation Service configuration could not be loaded correctly from the AD FS configuration database.

Additional Data
Error:
ADMIN0012: OperationFault

There was an error in enabling endpoints of Federation Service. Fix configuration errors using PowerShell cmdlets and restart the Federation Service.

Additional Data
Exception details:
System.ServiceModel.FaultException`1[Microsoft.IdentityServer.Protocols.PolicyStore.OperationFault]: ADMIN0012: OperationFault (Fault Detail is equal to Microsoft.IdentityServer.Protocols.PolicyStore.OperationFault).

A SQL operation in the AD FS configuration database with connection string Data Source=np:\\.\pipe\microsoft##wid\tsql\query;Initial Catalog=AdfsConfiguration;Integrated Security=True failed.

Additional Data

Exception details:
A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 – Could not open a connection to SQL Server)

The last one repeated many times. 

I hadn’t installed the ADFS server in our firm so I had no clue how it was setup. Importantly, I didn’t know it used the Windows Internal Database (WID) which you can see in the error messages above. It is possible to have ADFS work with SQL for a larger setup but that wasn’t the case here. Following some blog posts on the Internet (this and this) I downloaded SQL Server Management Studio (SSMS) and tried connecting to the WID at the path given in the error (\\.\pipe\microsoft##wid\tsql\query). That didn’t work for me – it just gave me some errors that the SQL server was unreachable. 

BTW, according to one of the blog posts it is better to launch SSMS as the user who has rights to connect to the WID database (the service account under which your ADFS service runs for instance). That didn’t help in my case (not saying the advice is incorrect, my issue was something else). Found a Microsoft blog post too that confirmed I was connecting to the correct server name – \\.\pipe\microsoft##wid\tsql\query for Windows 2012 and above; \\.\pipe\MSSQL$MICROSOFT##SSEE\sql\query for Windows 2003 & 2008 – but no go. 

That’s when I realized the WID has its own service. I had missed this initially. Trying to start that gave an error that it couldn’t start due to a login failure. This service runs under an account NT SERVICE\MSSQL$MICROSOFT##WID and looks like it didn’t have logon as service rights. It looks like someone had played around with our GPOs (or moved this server to a different OU) and this account had lost its rights. 

The fix is simple – just give this account rights via GPO (or exclude the server from whatever GPO is fiddling with logon as a service rights; or move this server to some other OU). Since the NT SERVICE\MSSQL$MICROSOFT##WID is not a regular account, you can’t add it to GPO from any server (because the account is local and will only exist if WID is installed). So I opened GPMC on my ADFS server and modified the GPO to give this account logon as a service rights. 

[Aside] SPNs

Trying to get people at work to clean up duplicate SPNs, and came across some links while reading about this topic. 

From the official MSDN article: A service principal name (SPN) is a unique identifier of a service instance. SPNs are used by Kerberos authentication to associate a service instance with a service logon account. This allows a client application to request that the service authenticate an account even if the client does not have the account name.

Basically when a client application tries to authenticate with a service instance and the domain controller needs to issues it Kerberos tickets, the domain controller needs to know whose password to use for the service instance – is it that of the server where this instance runs, or any service account responsible for it. This mapping of service -> service account/ computer account is an SPN. It’s of the format service/host:port and is associated with the AD account of the service account or computer account (stored in the servicePrincipalName attribute actually).

That’s all!

[Aside] DFRS links

Just putting these here as bookmarks to myself.

One of our DCs at work had the following DFSR warnings in the DFS Replication logs:

The DFS Replication service stopped replication on volume C:. This occurs when a DFSR JET database is not shut down cleanly and Auto Recovery is disabled. To resolve this issue, back up the files in the affected replicated folders, and then use the ResumeReplication WMI method to resume replication.

Additional Information:
Volume: C:
GUID: 56234A2C-C156-11E2-93E8-806E6F6E6111

Recovery Steps
1. Back up the files in all replicated folders on the volume. Failure to do so may result in data loss due to unexpected conflict resolution during the recovery of the replicated folders.
2. To resume the replication for this volume, use the WMI method ResumeReplication of the DfsrVolumeConfig class. For example, from an elevated command prompt, type the following command:
wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig where volumeGuid=”56234A2C-C156-11E2-93E8-806E6F6E6111″ call ResumeReplication

For more information, see http://support.microsoft.com/kb/2663685.

Sounded like an easy fix, so I went ahead and tried resuming replication as directed. That didn’t work though. Got the following:

The DFS Replication service stopped replication on the folder with the following local path: D:\SYSVOL_DFSR\domain. This server has been disconnected from other partners for 154 days, which is longer than the time allowed by the MaxOfflineTimeInDays parameter (60). DFS Replication considers the data in this folder to be stale, and this server will not replicate the folder until this error is corrected.

To resume replication of this folder, use the DFS Management snap-in to remove this server from the replication group, and then add it back to the group. This causes the server to perform an initial synchronization task, which replaces the stale data with fresh data from other members of the replication group.

Additional Information:
Error: 9061 (The replicated folder has been offline for too long.)
Replicated Folder Name: SYSVOL Share
Replicated Folder ID: xxxxx
Replication Group Name: Domain System Volume
Replication Group ID: xxxxxx
Member ID: xxxxx

Yeah, bummer!

Check out this Microsoft blog post for content freshness and the MaxOfflineTimeInDays parameter. You can’t simple remove SYSVOL from DFSR replication groups via the GUI as it is a special folder, so you have to work around. I found some forum posts and blog posts that suggested simply raising this parameter for the broken server to a number larger than the number of days its currently been offline (154 in the above case) and then resuming replication. I wasn’t too comfortable with that. What if any older changes from this server now replicate to the other servers? That could cause more damage than it’s worth. I don’t think this will happen, but why take a risk. What I really want is to force a replication onto this server from some other server. Do a non-authoritative replication basically. So I followed the steps in this article and that worked.

A non-authoritative sync is like a regular sync, just that it is rigged to let the source win. :p So all the existing files on the destination server are preserved. The event log gets filled with entries like these:

The DFS Replication service detected that a file was changed on multiple servers. A conflict resolution algorithm was used to determine the winning file. The losing file was moved to the Conflict and Deleted folder.

Additional Information:
Original File Path: D:\SYSVOL_DFSR\domain\Policies\{F4A04331-3C62-474A-A1CE-517F17914111}\GPT.INI
New Name in Conflict Folder: GPT-{4B7F4510-3C34-4A8A-A397-BC736AE5D9B6}-v55459.INI
Replicated Folder Root: D:\SYSVOL_DFSR\domain
File ID: {8189DA2F-DCBE-4755-A042-72154B648111}-v949
Replicated Folder Name: SYSVOL Share
Replicated Folder ID: 05ECDBE0-A0E6-4A9A-B5BE-7C404E323600
Replication Group Name: Domain System Volume
Replication Group ID: 46C8DB83-463F-459F-8785-DCB19231C52B
Member ID: 7A0E2DA6-4841-40A1-B02D-F5F341345B98
Partner Member ID: 7851F335-6824-4B0D-9978-5A5520ECD547

If you want to see where these files are now moved to check out this blog post. That post has a lot more useful info.

GPO audit policies not applying

I didn’t realize my last post was the 500th one. Yay to me! :)

Had an issue at work today wherein someone had modified a server GPO to enable auditing but nothing was happening.

The GPO had the following.

And it looked like it was applying (output from gpresult /scope computer /h blah.html).

But checking the local policies showed that it wasn’t being applied.

Similarly the output of auditpol /get /category:* showed that nothing was happening.

This is because starting with Server 2008/ Vista Microsoft split the above audit categories to sub-categories, and starting with Server 2008 R2/ 7 allowed one to set these via GPO

My understanding from the above links is that both these sort of policies can mix (especially if the newer ones are not defined), so not entirely sure why the older audit policies were being ignored in my case. There’s even a GPO setting that explicitly let’s one choose either set over the other, but that didn’t have any effect in my case. (The policy is “Audit: Force audit policy subcategory settings (Windows Vista or later) to override audit policy category settings” and setting it to DISABLED gives the original policy categories precedence; by default this is ENABLED).

The newer audit policy categories & sub-categories can be found under the “Advanced Audit Policy Configuration” section in a GPO. In my case I defined the required audit policies here and they took effect.

Something else before I conclude (learnt from this official blog post).

By default GPOs applied to a computer can be found at %systemroot%\System32\GroupPolicy. Local audit policies are stored/ defined at %systemroot%\system32\GroupPolicy\machine\microsoft\windows nt\audit\audit.csv and then copied over to %systemroot%\security\audit\audit.csv. However, audit policies from domain GPOs are not stored there. This point is important to remember coz occasionally you might found forum posts that suggest checking the permissions of these files. They don’t matter for audit policies from domain GPOs.

In general it is better to use auditpol.exe /get /category:* to find the audit policy settings rather than an group policy tools.

Certificates, Subject Alternative Names, etc.

I had encountered this in my testlab but never bothered much coz it was just my testlab after all. But now I am dabbling with certificates at work and hit upon the same issue. 

The issue is that if I create a certificate for mymachine.fqdn but I visit the machine at just mymachine, then I get an error. So how can I tell the certificate that the shorter name (and any other aliases I may have) are also valid? Turns out you need to use the Subject Alternative Name (SAN) field for that!

You can’t add a SAN field to an existing certificate. Got to create a new one. In my case I had simply requested a domain certificate from my IIS server and that doesn’t give any option to specify the SAN.

Instructions for creating a new certificate with SAN field are here and here. The latter has screenshots, so check that out first. In my case, at the step where I select “Web Server” I wasn’t getting “Web Server” as an option. I was only getting “Computer”. Looking into this, I realized it’s coz of the permissions difference. The “Web Server” template only has Domain Admins and Enterprise Admins in its ACLs, while the “Computer” template had Domain Computers too with “Enrol” rights. The fix is simple – go the Manage Templates and change the ACL of “Web Server” accordingly. (You could also use ADSI Edit and edit the ACL in the Configuration section). 

[Aside] Useful CA/ Certificates info

[Aside] How to Secure an ARM-based Windows Virtual Machine RDP access in Azure

Just putting this here as a bookmark to myself for later. A good post. 

Scanning for MS17-010

Was reading about the WannaCrypt attacks. If you have the MS17-010 bulletin patches installed in your estate, you are safe. I wanted to quickly scan our estate to see if the servers are patches with this. Not my job really, but I wanted to do it anyways. 

The security bulletin page lists the actual patch numbers for each version of Windows. We only have Server 2008 – 2016 so that’s all I was interested in. 

Here’s a list of the Server name, internal version, and the patch they should have.

  • Server 2008 | 6.0.6002 | KB4012598
  • Server 2008 R2 | 6.1.7600 | KB4012215 or KB4012212
  • Server 2012 | 6.2.9200 | KB4012214 or KB4012217
  • Server 2012 R2 | 6.3.9600 | KB4012213 or KB4012216
  • Server 2016 | 10.0.14393 | KB4013429

One thing to bear in mind is that it’s possible a server doesn’t have the exact patch installed, but is still not at any risk. That is because since October 2016 Windows patches are cumulative. So if you don’t have the particular March 2017 patch installed, but do have the April 2017 one, you are good to go. The numbers above are from March 2017 – so you will have to update them with patch numbers of subsequent months too to be thorough. 

Another thing – I had one server in my entire estate where the patch above was actually installed but turned up as a false positive in my script. Not sure why. I know it isn’t a script issue. For some reason that patch wasn’t being returned as part of the “Win32_QuickFixEngineering” output. Am assuming it wasn’t installed that way on this particular server.

Without further ado, here’s the script I wrote:

That’s all. Nothing fancy. 

Quickly move DHCP scopes between servers

Wish I had known this earlier. If you want to migrate DHCP servers and have got the two servers setup, run the following on the source server:

And the following on the destination server (the first command is to just connect to the C: drive of the source):

Amazing!

This only does IPv4 scopes, so replace v4 with v6 for IPv6. And this exports all scopes, so replace “all” with the scopes you want if you want a selected few. 

Get-WindowsUpdateLog SymSrv.dll error

Starting with Windows 10 and Server 2016 Microsoft isn’t providing a WindowsUpdate.log file any more. Instead we have to run a cmdlet to generate it manually.

On my Server 2016 the cmdlet gave the following error:

I checked the path “C:\Program Files\Windows Defender” and there’s no “SymSrv.dll” file present there. That’s because I had removed the Windows Defender feature from my Server 2016 install (we use Sophos, so no need really for Windows Defender). Who thought removing Windows Defender would break this cmdlet?

If I check the module “C:\Windows\system32\WindowsPowerShell\v1.0\Modules\WindowsUpdate\WindowsUpdateLog.psm1” sure enough it looks for this DLL and tries to copy it:

Bugger!

Well, workaround is simple. Copy this DLL from any other 2016 machine (or search through the “C:\Windows\WinSxS\” folder on the same machine and copy it from there) to “C:\Program Files\Windows Defender” and the cmdlet will work. 

Or … enable the Windows Defender feature and disable it from Settings. 

Bug in Server 2016 and DNS zone delegation with CNAME records

A colleague at work mentioned a Server 2016 DNS zone delegation bug he had found. I found just one post on the Internet when I searched for this. According to my colleague Microsoft has now confirmed this as a bug in the support call he raised. 

DNS being an area of interest I wanted to replicate the issue and post it – so here goes. Hopefully it makes sense. :)

Imagine a split-DNS scenario for a zone rockylabs.zero

  • This zone is hosted externally by a DNS server (doesn’t matter what OS/ software) called data01
  • This zone is hosted internally by two DNS servers: a Server 2012R2 (called DC2012-01), and a Server 2016 (called DC2016-01). 

Now say there’s a record rakhesh.rockylabs.zero that is the same both internally and externally. As in, we want both internal and external users to get the same (external) IP address for this record. 

What you would typically do is add this record to your external DNS server and create a delegation from your two internal DNS servers, for this record, to the external DNS server. Here’s some screenshots:

The zone on my external DNS server. Notice I have an A record for rakhesh.rockylabs.zero.  

Ignore the rakhesh2.rockylabs.zero record for now. That comes in later. :)

Here’s a look at the delegation from my Server 2012R2 internal DNS server to the external DNS server for the rakhesh.rockylabs.zero record. Basically I create a delegation within the rockylabs.zero zone on the internal server, for the rakhesh domain, and point it to the external DNS server. On the external DNS server rakhesh.rockylabs.zero is defined as an A record so that will be returned as an answer when this delegation chain is followed. 

In my case both the internal DNS servers are also DCs, and the rockylabs.zero zone is AD integrated, so a similar delegation is automatically created on the other DNS server too. 

As would be expected, I am able to resolve this A record correctly from both internal DNS servers.

Now for the fun part!

Notice the rakhesh2.rockylabs.zero record on my external DNS server? Unlike rakhesh.rockylabs.zero this one is a CNAME record. This too should be common for both internal and external users. Shouldn’t be a problem really as it should work similarly to the A record. Following the chain of delegation when I resolve rakhesh2.rockylabs.zero to a CNAME record called rakhesh.com, my DNS server should automatically resolve the A record for rakhesh.com and give me its address as the answer for rakhesh2.rockylabs.zero.  It works with the Server 2012R2 internal DNS server as expected – 

But breaks for the 2016 internal DNS server!

And that’s it! That’s the bug basically. 

Here’s the odd bit though. If I were to query rakhesh.com (the domain to which the CNAME record points to), and then try to resolve the delegated record, it works!

If I go ahead and clear the cache on that 2016 internal server and try the name resolution again, it’s broken as before.

So the issue is that the 2016 DNS Server is able to follow the delegation for rakhesh2.rockylabs.zero to the external DNS server and resolve it to rakhesh.com, but it is doesn’t then go ahead and lookup rakhesh.com to get its A record. But if the A record for rakhesh.com is already cached with it, it is sensible enough to return that address. 

I dug a bit more into this by enabling debug logging on the 2016 server. Here’s what I found.

The 2016 server receives my query:

It passes this on to the external server (10.10.1.11 is data01 – external DNS server where rakhesh2.rockylabs.zero is delegated to). FYI, I am truncating the output here:

It gets a reply with the CNAME record. So far so good. 

Now it queries the external DNS server (data01 – 10.10.1.11) asking for the A record of rakhesh.com! That’s two wrong things: 1) Why ask the external DNS server (who as far as the internal DNS server knows is only delegated the rakhesh2.rockylabs.zero zone and has nothing to do with rakhesh.com) and 2) why ask it for the A record instead of the NS record so it can find the name servers for rakhesh.com and ask those for the IP address of rakhesh.com

It’s pretty much downhill from there, coz as expected the external DNS server replies saying “doh I don’t know” and gives it a list of root servers to contact. FYI, I truncated the output:

The 2016 internal DNS server now replies to the client with a fail message. As expected. 

So now we know why the 2016 server is failing. 

Until this is fixed one workaround would be to create a CNAME record directly in the internal DNS server to whatever the external DNS server points to. That is, don’t delegate to external – just create the same record internally too. Only for CNAME records; A is fine. Here’s an example of it working with a record called rakhesh3.rockylabs.zero where I simply made a CNAME on the internal 2016 DNS server. 

That’s all for now!