Contact

Subscribe via Email

Subscribe via RSS/JSON

Categories

Recent Posts

Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

Elsewhere

ADFS errors and WID

Spent a bit of time today tracking down an ADFS/ WID issue. Turned out to be a silly one in the end (silly on my part actually, should have spotted the cause right away!) but it was a good learning exercise in the end. 

The issue was that ADFS refused to launch after a server reboot. The console gave an error that it couldn’t connect to the configuration database. The ADFS service refused to start and the event logs were filled with errors such as these:

The Federation Service configuration could not be loaded correctly from the AD FS configuration database.

Additional Data
Error:
ADMIN0012: OperationFault

There was an error in enabling endpoints of Federation Service. Fix configuration errors using PowerShell cmdlets and restart the Federation Service.

Additional Data
Exception details:
System.ServiceModel.FaultException`1[Microsoft.IdentityServer.Protocols.PolicyStore.OperationFault]: ADMIN0012: OperationFault (Fault Detail is equal to Microsoft.IdentityServer.Protocols.PolicyStore.OperationFault).

A SQL operation in the AD FS configuration database with connection string Data Source=np:\\.\pipe\microsoft##wid\tsql\query;Initial Catalog=AdfsConfiguration;Integrated Security=True failed.

Additional Data

Exception details:
A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 – Could not open a connection to SQL Server)

The last one repeated many times. 

I hadn’t installed the ADFS server in our firm so I had no clue how it was setup. Importantly, I didn’t know it used the Windows Internal Database (WID) which you can see in the error messages above. It is possible to have ADFS work with SQL for a larger setup but that wasn’t the case here. Following some blog posts on the Internet (this and this) I downloaded SQL Server Management Studio (SSMS) and tried connecting to the WID at the path given in the error (\\.\pipe\microsoft##wid\tsql\query). That didn’t work for me – it just gave me some errors that the SQL server was unreachable. 

BTW, according to one of the blog posts it is better to launch SSMS as the user who has rights to connect to the WID database (the service account under which your ADFS service runs for instance). That didn’t help in my case (not saying the advice is incorrect, my issue was something else). Found a Microsoft blog post too that confirmed I was connecting to the correct server name – \\.\pipe\microsoft##wid\tsql\query for Windows 2012 and above; \\.\pipe\MSSQL$MICROSOFT##SSEE\sql\query for Windows 2003 & 2008 – but no go. 

That’s when I realized the WID has its own service. I had missed this initially. Trying to start that gave an error that it couldn’t start due to a login failure. This service runs under an account NT SERVICE\MSSQL$MICROSOFT##WID and looks like it didn’t have logon as service rights. It looks like someone had played around with our GPOs (or moved this server to a different OU) and this account had lost its rights. 

The fix is simple – just give this account rights via GPO (or exclude the server from whatever GPO is fiddling with logon as a service rights; or move this server to some other OU). Since the NT SERVICE\MSSQL$MICROSOFT##WID is not a regular account, you can’t add it to GPO from any server (because the account is local and will only exist if WID is installed). So I opened GPMC on my ADFS server and modified the GPO to give this account logon as a service rights. 

[Aside] SPNs

Trying to get people at work to clean up duplicate SPNs, and came across some links while reading about this topic. 

From the official MSDN article: A service principal name (SPN) is a unique identifier of a service instance. SPNs are used by Kerberos authentication to associate a service instance with a service logon account. This allows a client application to request that the service authenticate an account even if the client does not have the account name.

Basically when a client application tries to authenticate with a service instance and the domain controller needs to issues it Kerberos tickets, the domain controller needs to know whose password to use for the service instance – is it that of the server where this instance runs, or any service account responsible for it. This mapping of service -> service account/ computer account is an SPN. It’s of the format service/host:port and is associated with the AD account of the service account or computer account (stored in the servicePrincipalName attribute actually).

That’s all!

[Aside] DFRS links

Just putting these here as bookmarks to myself.

One of our DCs at work had the following DFSR warnings in the DFS Replication logs:

The DFS Replication service stopped replication on volume C:. This occurs when a DFSR JET database is not shut down cleanly and Auto Recovery is disabled. To resolve this issue, back up the files in the affected replicated folders, and then use the ResumeReplication WMI method to resume replication.

Additional Information:
Volume: C:
GUID: 56234A2C-C156-11E2-93E8-806E6F6E6111

Recovery Steps
1. Back up the files in all replicated folders on the volume. Failure to do so may result in data loss due to unexpected conflict resolution during the recovery of the replicated folders.
2. To resume the replication for this volume, use the WMI method ResumeReplication of the DfsrVolumeConfig class. For example, from an elevated command prompt, type the following command:
wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig where volumeGuid=”56234A2C-C156-11E2-93E8-806E6F6E6111″ call ResumeReplication

For more information, see http://support.microsoft.com/kb/2663685.

Sounded like an easy fix, so I went ahead and tried resuming replication as directed. That didn’t work though. Got the following:

The DFS Replication service stopped replication on the folder with the following local path: D:\SYSVOL_DFSR\domain. This server has been disconnected from other partners for 154 days, which is longer than the time allowed by the MaxOfflineTimeInDays parameter (60). DFS Replication considers the data in this folder to be stale, and this server will not replicate the folder until this error is corrected.

To resume replication of this folder, use the DFS Management snap-in to remove this server from the replication group, and then add it back to the group. This causes the server to perform an initial synchronization task, which replaces the stale data with fresh data from other members of the replication group.

Additional Information:
Error: 9061 (The replicated folder has been offline for too long.)
Replicated Folder Name: SYSVOL Share
Replicated Folder ID: xxxxx
Replication Group Name: Domain System Volume
Replication Group ID: xxxxxx
Member ID: xxxxx

Yeah, bummer!

Check out this Microsoft blog post for content freshness and the MaxOfflineTimeInDays parameter. You can’t simple remove SYSVOL from DFSR replication groups via the GUI as it is a special folder, so you have to work around. I found some forum posts and blog posts that suggested simply raising this parameter for the broken server to a number larger than the number of days its currently been offline (154 in the above case) and then resuming replication. I wasn’t too comfortable with that. What if any older changes from this server now replicate to the other servers? That could cause more damage than it’s worth. I don’t think this will happen, but why take a risk. What I really want is to force a replication onto this server from some other server. Do a non-authoritative replication basically. So I followed the steps in this article and that worked.

A non-authoritative sync is like a regular sync, just that it is rigged to let the source win. :p So all the existing files on the destination server are preserved. The event log gets filled with entries like these:

The DFS Replication service detected that a file was changed on multiple servers. A conflict resolution algorithm was used to determine the winning file. The losing file was moved to the Conflict and Deleted folder.

Additional Information:
Original File Path: D:\SYSVOL_DFSR\domain\Policies\{F4A04331-3C62-474A-A1CE-517F17914111}\GPT.INI
New Name in Conflict Folder: GPT-{4B7F4510-3C34-4A8A-A397-BC736AE5D9B6}-v55459.INI
Replicated Folder Root: D:\SYSVOL_DFSR\domain
File ID: {8189DA2F-DCBE-4755-A042-72154B648111}-v949
Replicated Folder Name: SYSVOL Share
Replicated Folder ID: 05ECDBE0-A0E6-4A9A-B5BE-7C404E323600
Replication Group Name: Domain System Volume
Replication Group ID: 46C8DB83-463F-459F-8785-DCB19231C52B
Member ID: 7A0E2DA6-4841-40A1-B02D-F5F341345B98
Partner Member ID: 7851F335-6824-4B0D-9978-5A5520ECD547

If you want to see where these files are now moved to check out this blog post. That post has a lot more useful info.

GPO audit policies not applying

I didn’t realize my last post was the 500th one. Yay to me! :)

Had an issue at work today wherein someone had modified a server GPO to enable auditing but nothing was happening.

The GPO had the following.

And it looked like it was applying (output from gpresult /scope computer /h blah.html).

But checking the local policies showed that it wasn’t being applied.

Similarly the output of auditpol /get /category:* showed that nothing was happening.

This is because starting with Server 2008/ Vista Microsoft split the above audit categories to sub-categories, and starting with Server 2008 R2/ 7 allowed one to set these via GPO

My understanding from the above links is that both these sort of policies can mix (especially if the newer ones are not defined), so not entirely sure why the older audit policies were being ignored in my case. There’s even a GPO setting that explicitly let’s one choose either set over the other, but that didn’t have any effect in my case. (The policy is “Audit: Force audit policy subcategory settings (Windows Vista or later) to override audit policy category settings” and setting it to DISABLED gives the original policy categories precedence; by default this is ENABLED).

The newer audit policy categories & sub-categories can be found under the “Advanced Audit Policy Configuration” section in a GPO. In my case I defined the required audit policies here and they took effect.

Something else before I conclude (learnt from this official blog post).

By default GPOs applied to a computer can be found at %systemroot%\System32\GroupPolicy. Local audit policies are stored/ defined at %systemroot%\system32\GroupPolicy\machine\microsoft\windows nt\audit\audit.csv and then copied over to %systemroot%\security\audit\audit.csv. However, audit policies from domain GPOs are not stored there. This point is important to remember coz occasionally you might found forum posts that suggest checking the permissions of these files. They don’t matter for audit policies from domain GPOs.

In general it is better to use auditpol.exe /get /category:* to find the audit policy settings rather than an group policy tools.

Certificates, Subject Alternative Names, etc.

I had encountered this in my testlab but never bothered much coz it was just my testlab after all. But now I am dabbling with certificates at work and hit upon the same issue. 

The issue is that if I create a certificate for mymachine.fqdn but I visit the machine at just mymachine, then I get an error. So how can I tell the certificate that the shorter name (and any other aliases I may have) are also valid? Turns out you need to use the Subject Alternative Name (SAN) field for that!

You can’t add a SAN field to an existing certificate. Got to create a new one. In my case I had simply requested a domain certificate from my IIS server and that doesn’t give any option to specify the SAN.

Instructions for creating a new certificate with SAN field are here and here. The latter has screenshots, so check that out first. In my case, at the step where I select “Web Server” I wasn’t getting “Web Server” as an option. I was only getting “Computer”. Looking into this, I realized it’s coz of the permissions difference. The “Web Server” template only has Domain Admins and Enterprise Admins in its ACLs, while the “Computer” template had Domain Computers too with “Enrol” rights. The fix is simple – go the Manage Templates and change the ACL of “Web Server” accordingly. (You could also use ADSI Edit and edit the ACL in the Configuration section). 

[Aside] Useful CA/ Certificates info

[Aside] How to Secure an ARM-based Windows Virtual Machine RDP access in Azure

Just putting this here as a bookmark to myself for later. A good post. 

Scanning for MS17-010

Was reading about the WannaCrypt attacks. If you have the MS17-010 bulletin patches installed in your estate, you are safe. I wanted to quickly scan our estate to see if the servers are patches with this. Not my job really, but I wanted to do it anyways. 

The security bulletin page lists the actual patch numbers for each version of Windows. We only have Server 2008 – 2016 so that’s all I was interested in. 

Here’s a list of the Server name, internal version, and the patch they should have.

  • Server 2008 | 6.0.6002 | KB4012598
  • Server 2008 R2 | 6.1.7600 | KB4012215 or KB4012212
  • Server 2012 | 6.2.9200 | KB4012214 or KB4012217
  • Server 2012 R2 | 6.3.9600 | KB4012213 or KB4012216
  • Server 2016 | 10.0.14393 | KB4013429

One thing to bear in mind is that it’s possible a server doesn’t have the exact patch installed, but is still not at any risk. That is because since October 2016 Windows patches are cumulative. So if you don’t have the particular March 2017 patch installed, but do have the April 2017 one, you are good to go. The numbers above are from March 2017 – so you will have to update them with patch numbers of subsequent months too to be thorough. 

Another thing – I had one server in my entire estate where the patch above was actually installed but turned up as a false positive in my script. Not sure why. I know it isn’t a script issue. For some reason that patch wasn’t being returned as part of the “Win32_QuickFixEngineering” output. Am assuming it wasn’t installed that way on this particular server.

Without further ado, here’s the script I wrote:

That’s all. Nothing fancy. 

Quickly move DHCP scopes between servers

Wish I had known this earlier. If you want to migrate DHCP servers and have got the two servers setup, run the following on the source server:

And the following on the destination server (the first command is to just connect to the C: drive of the source):

Amazing!

This only does IPv4 scopes, so replace v4 with v6 for IPv6. And this exports all scopes, so replace “all” with the scopes you want if you want a selected few. 

Get-WindowsUpdateLog SymSrv.dll error

Starting with Windows 10 and Server 2016 Microsoft isn’t providing a WindowsUpdate.log file any more. Instead we have to run a cmdlet to generate it manually.

On my Server 2016 the cmdlet gave the following error:

I checked the path “C:\Program Files\Windows Defender” and there’s no “SymSrv.dll” file present there. That’s because I had removed the Windows Defender feature from my Server 2016 install (we use Sophos, so no need really for Windows Defender). Who thought removing Windows Defender would break this cmdlet?

If I check the module “C:\Windows\system32\WindowsPowerShell\v1.0\Modules\WindowsUpdate\WindowsUpdateLog.psm1” sure enough it looks for this DLL and tries to copy it:

Bugger!

Well, workaround is simple. Copy this DLL from any other 2016 machine (or search through the “C:\Windows\WinSxS\” folder on the same machine and copy it from there) to “C:\Program Files\Windows Defender” and the cmdlet will work. 

Or … enable the Windows Defender feature and disable it from Settings. 

Bug in Server 2016 and DNS zone delegation with CNAME records

A colleague at work mentioned a Server 2016 DNS zone delegation bug he had found. I found just one post on the Internet when I searched for this. According to my colleague Microsoft has now confirmed this as a bug in the support call he raised. 

DNS being an area of interest I wanted to replicate the issue and post it – so here goes. Hopefully it makes sense. :)

Imagine a split-DNS scenario for a zone rockylabs.zero

  • This zone is hosted externally by a DNS server (doesn’t matter what OS/ software) called data01
  • This zone is hosted internally by two DNS servers: a Server 2012R2 (called DC2012-01), and a Server 2016 (called DC2016-01). 

Now say there’s a record rakhesh.rockylabs.zero that is the same both internally and externally. As in, we want both internal and external users to get the same (external) IP address for this record. 

What you would typically do is add this record to your external DNS server and create a delegation from your two internal DNS servers, for this record, to the external DNS server. Here’s some screenshots:

The zone on my external DNS server. Notice I have an A record for rakhesh.rockylabs.zero.  

Ignore the rakhesh2.rockylabs.zero record for now. That comes in later. :)

Here’s a look at the delegation from my Server 2012R2 internal DNS server to the external DNS server for the rakhesh.rockylabs.zero record. Basically I create a delegation within the rockylabs.zero zone on the internal server, for the rakhesh domain, and point it to the external DNS server. On the external DNS server rakhesh.rockylabs.zero is defined as an A record so that will be returned as an answer when this delegation chain is followed. 

In my case both the internal DNS servers are also DCs, and the rockylabs.zero zone is AD integrated, so a similar delegation is automatically created on the other DNS server too. 

As would be expected, I am able to resolve this A record correctly from both internal DNS servers.

Now for the fun part!

Notice the rakhesh2.rockylabs.zero record on my external DNS server? Unlike rakhesh.rockylabs.zero this one is a CNAME record. This too should be common for both internal and external users. Shouldn’t be a problem really as it should work similarly to the A record. Following the chain of delegation when I resolve rakhesh2.rockylabs.zero to a CNAME record called rakhesh.com, my DNS server should automatically resolve the A record for rakhesh.com and give me its address as the answer for rakhesh2.rockylabs.zero.  It works with the Server 2012R2 internal DNS server as expected – 

But breaks for the 2016 internal DNS server!

And that’s it! That’s the bug basically. 

Here’s the odd bit though. If I were to query rakhesh.com (the domain to which the CNAME record points to), and then try to resolve the delegated record, it works!

If I go ahead and clear the cache on that 2016 internal server and try the name resolution again, it’s broken as before.

So the issue is that the 2016 DNS Server is able to follow the delegation for rakhesh2.rockylabs.zero to the external DNS server and resolve it to rakhesh.com, but it is doesn’t then go ahead and lookup rakhesh.com to get its A record. But if the A record for rakhesh.com is already cached with it, it is sensible enough to return that address. 

I dug a bit more into this by enabling debug logging on the 2016 server. Here’s what I found.

The 2016 server receives my query:

It passes this on to the external server (10.10.1.11 is data01 – external DNS server where rakhesh2.rockylabs.zero is delegated to). FYI, I am truncating the output here:

It gets a reply with the CNAME record. So far so good. 

Now it queries the external DNS server (data01 – 10.10.1.11) asking for the A record of rakhesh.com! That’s two wrong things: 1) Why ask the external DNS server (who as far as the internal DNS server knows is only delegated the rakhesh2.rockylabs.zero zone and has nothing to do with rakhesh.com) and 2) why ask it for the A record instead of the NS record so it can find the name servers for rakhesh.com and ask those for the IP address of rakhesh.com

It’s pretty much downhill from there, coz as expected the external DNS server replies saying “doh I don’t know” and gives it a list of root servers to contact. FYI, I truncated the output:

The 2016 internal DNS server now replies to the client with a fail message. As expected. 

So now we know why the 2016 server is failing. 

Until this is fixed one workaround would be to create a CNAME record directly in the internal DNS server to whatever the external DNS server points to. That is, don’t delegate to external – just create the same record internally too. Only for CNAME records; A is fine. Here’s an example of it working with a record called rakhesh3.rockylabs.zero where I simply made a CNAME on the internal 2016 DNS server. 

That’s all for now!

[Aside] Windows Update tools

Wanted to link to these as I came across them while searching for something Windows Updates related.

  • ABC-Update – didn’t try it out but looks useful from a client side point of view. Free.
  • WuInstall – seems to be a client and server thing. Putting it here so I find it if ever needed in future. Paid.
  • Windows Update PowerShell Module – you had me at PowerShell! :0)
    • A blog post explaining this module. Just in case.

Add-DnsServerZoneDelegation with multiple nameservers

Only reason I am creating this post is coz I Googled for the above and didn’t find any relevant hits

I know I can use the Add-DnsServerZoneDelegation cmdlet to create a new delegated zone (basically a sub-domain of a zone hosted by our DNS server, wherein some other DNS server hosts that sub-domain and we merely delegate any requests for that sub-domain to this other DNS server). But I wasn’t sure how I’d add multiple name servers. The switches give an option to give an array of IP addresses, but that’s just any array of IP addresses for a single name server. What I wanted was to have an array of name servers each with their own IP.

Anyways, turns out all I had to do was run the command for each name server. 

Above I create a delegation from my zone “parentzone.com” to 3 DNS servers “DNS0[1-3].somedomain” (also specified by their respective IPs) for the sub-domain “subzone.parentzone.com”.

OU delegation not working (contd.) – finding protected groups

Turns out I was mistaken in my previous post. A few minutes after enabling inheritance, I noticed it was disabled again. So that means the groups must be protected by AD.

I knew of the AdminSDHolder object and how it provides a template set of permissions that are applied to protected accounts (i.e. members of groups that are protected). I also knew that there were some groups that are protected by default. What I didn’t know, however, what that the defaults can be changed. 

Initially I did a Compare-Object -ReferenceObject (Get-ADPrincipalGroupMembership User1) -DifferenceObject (Get-ADPrincipalGroupMembership User2) -IncludeEqual to compare the memberships of two random accounts that seemed to be protected. These were accounts with totally different roles & group memberships so the idea was to see if they had any common groups (none!) and failing that to see if the groups they were in had any common ancestors (none again!)

Then I Googled a bit :o) and came across a solution. 

Before moving on to that though, as a note to myself: 

  • The AdminSDHolder object is at CN=AdminSDHolder,CN=System,DC=domain,DC=com. Find that via ADSI Edit (replace the domain part accordingly). 
  • Right click the object and its Security tab lists the template permissions that will be applied to members of protected groups. You can make changes here. 
  • SDProp is a process that runs every 60 minutes on the DC holding the PDC Emulator role. The period can be changed via the registry key HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters\AdminSDProtectFrequency. (If it doesn’t exist, add it. DWORD). 
  • SDProp can be run manually if required. 

So back to my issue. Turns out if a group as its adminCount attribute set to 1 then it will be protected. So I ran the following against the OU containing my admin  account groups:

Bingo! Most of my admin groups were protected, so most admin accounts were protected. All I have to do now is either un-protect these groups (my preferred solution), or change the template to delegate permissions there. 

Update: Simply un-protecting a group does not un-protect all its members (this is by design). The member objects too have their adminCount attribute set to 1, so apart from fun-protecting the groups we must un-protect the members too. 

Update 2: Found this good post with lots more details. How to run the process manually, what are the default protected groups, etc. Read that post in conjunction with this one and you are set!

Update 3: You can unprotect the following default groups via dsHeuristics: 1) Account Operators, 2) Backup Operators, 3) Server Operators, 4) Print Operators. But that still leaves groups such as Administrators (built-in), Domain Admins, Enterprise Admins, Domain Controllers, Schema Admins, Read-Only Domain Controllers, and the user Administrator (built-in). There’s no way to un-protect members of these.

Something I hadn’t realized about adminCount. This attribute does not mean a group/ user will be protected. Instead, what it means is that if a group/ user is protected, and its ACLs have changed and are now reset to default, then the adminCount attribute will be set. So yes, adminCount will let you find groups/ users that are protected; but merely setting adminCount on a group/ user does not protect it. I learnt this the hard way while I was testing my changes. Set adminCount to 1 for a group and saw that nothing was happening.

Also, it is possible that a protected user/ group does not have adminCount set. This is because adminCount is only set if there is a difference in the ACLs between the user/ group and the AdminSDHolder object. If there’s no difference, a protected object will not have the adminCount attribute set. :)

OU delegation not working

Today I cracked a problem which had troubled us for a while but which I never really sat down and actually tried to troubleshoot. We had an OU with 3rd level admin accounts that no one else had rights to but wanted to delegate certain password related tasks to our Service Desk admins. Basically let them reset password, unlock the account, and enable/ disable. 

Here’s some screenshots for the delegation wizard. Password reset is a common task and can be seen in the screenshot itself. Enable/ Disable can be delegated by giving rights to the userAccountControl attribute. Only force password change rights (i.e. no reset password) can be given via the pwdLastSet attribute. And unlock can be given via the lockoutTime attribute

Problem was that in my case in spite of doing all this the delegated accounts had no rights!

Snooping around a bit I realized that all the admin accounts within the OU had inheritance disabled and so weren’t getting the delegated permissions from the OU (not sure why; and no these weren’t protected group members). 

Of course, enabling is easy. But I wanted to see if I could get a list of all the accounts in there with their inheritance status. Time for PowerShell. :)

The Get-ACL cmdlet can list access control lists. It can work with AD objects via the AD: drive. Needs a distinguished name, that’s all. So all you have to do is (Get-ADUser <accountname>).DistinguishedName) – prefix an AD: to this, and pass it to Get-ACL. Something like this:

The default result is useless. If you pipe and expand the Access property you will get a list of ACLs. 

The result is a series of entries like these:

The attribute names referred to by the GUIDs can be found in the AD Technical Specs

Of interest to us is the AreAccessRulesProtected property. If this is True then inheritance is disabled; if False inheritance is enabled. So it’s straight forward to make a list of accounts and their inheritance status:

So that’s it. Next step would be to enable inheritance on the accounts. I won’t be doing this now (as it’s bed time!) but one can do it manually or script it via the SetAccessRuleProtection method. This method takes two parameters (enable/ disable inheritance; and if disable then should we add/ remove existing ACEs). Only the first parameter is of significance in my case, but I have to pass the second parameter too anyways – SetAccessRuleProtection($False,$True).

Update: Here’s what I rolled out at work today to make the change.

Update 2: Didn’t realize I had many users in the built-in protected groups (these are protected even though their adminCount is 0 – I hadn’t realized that). To unprotect these one must set the dsHeuristics flag. The built-in protected groups are 1) Account Operators, 2) Server Operators, 3) Print Operators, and 4) Backup Operators. See this post on instructions (actually, see the post below for even better instructions).

Update 3: Found this amazing page that goes into a hell of details on this topic. Be sure to read this before modifying dsHeuristics.