DFSR misconception: the hub server does not mean it is the master

Came across this Microsoft blog post by chance. To quote:

If the topology is set up for hub and spoke, and the spoke were to accidentally delete an item, this should not reflect back to the hub, correct? This should be a one way transfer. What we are seeing is our hub replicates out to the spokes perfectly, but if the spoke deletes an item, the item is then deleted from our hub share. It seems to be acting like a full mesh topology, but it was originally set up at as hub and spoke.

The behavior the customer describes is by design. Because DFS Replication is a multimaster replication engine, any change made on any spoke is replicated back to the hub and to the other spokes. To prevent changes from occurring on spokes, we recommend using shared folder permissions.

I too had always thought a hub-spoke design means the hub is the master server. But now I realize how wrong I was. Basically a hub-spoke or full mesh topology only determines the sync path – it does not denote which server is the master and which servers are slaves. DFSR, like AD, has no master or slave.

In a hub-and-spoke replication topology, two spoke servers simply sync with each other via the hub server – that’s all! Neither spoke is “inferior” to the hub in any way.

Notes on DFS referrals

Was brushing up on DFS referrals today as I had a doubt about something at work. Thought I’d put a shout out to this interesting link that I came across.

A DFS namespace (e.g. \\contoso\pub) has links (e.g. \\contoso\pub\documents). These links can point to multiple targets (e.g. \\server1\documents, \\server2\documents, and so on). When a client visits the link, the target that’s chosen is the one in the same AD site as the client. If there is no target in the same site as the client then one of three situations can happen (you choose what happens per namespace, but can override it per link):

  • A list of targets from all sites is returned, in random order.
  • A list of targets is returned based on cost.
    • All sites in the domain will have a cost from the site the client is in. This cost is defined in “AD Sites & Services” and is cumulative (i.e. if Site A to Site B has cost 10, and Site B to Site C has cost 10, and there’s no explicit cost defined between Site A and Site C, then the cost from Site A to Site C is taken as 10+10 = 20).
    • Targets from sites closest to the client site are listed in random order, followed by targets from sites further away from the client site, and so on.
  • No targets are returned (this is also called in-site only).
    • So if there are no targets in the same site as the client, then the path fails.

[screenshot: Ordering]

Apart from these three possibilities, there’s also a failback option (which is hidden behind the drop-down in the screenshot above).

[screenshot: Failback]

So if a server has no targets to offer a client, it will fail back to whatever targets are set as preferred for a link. I’ll show what preferred targets are in a bit. 

The above settings can be defined on the namespace itself or on each DFS link.

[screenshot: Link Ordering]

Now on to preferred targets. If you go to the Properties > Advanced tab of each target, you can set its priority. That is to say, if a target is on the same preference level as a bunch of other targets (because they are all in the same site, or all outside it) then you can set it to have a higher or lower priority.

[screenshot: Preferred Target]

By default there are no preferred targets.

The cool thing I learnt from that post is that if the referral order is set to in-site (i.e. exclude targets from outside the client site) and fail back to preferred targets is enabled (the default) and a target outside the site is set as preferred, then it too will be returned in the list of targets along with the ones in site. This way you can limit referrals to be in-site but have a few selected targets out of site as a fail-back.

One thing to keep in mind though is that since you want the out-of-site target to have lower priority than the in-site one, you must set its priority as “Last among all targets”. If it were set as “First among all targets” it would take precedence over the in-site target too – which is not what we want. Lastly, there’s no point setting the priority to “First among targets of equal cost” (or “Last”) in the case of in-site referrals as it will have no effect (the cost of the in-site target and the external targets differ, so equal-cost ordering doesn’t apply).

AppV – Empty package map for package content root

Had an interesting problem at work yesterday about which I wish I could write a long and interesting blog post, but truthfully it was such a simple thing once I identified the cause.

We use AppV for streaming applications. We have many branch offices so there’s a DFS share which points to targets in each office. AppV installations in each office point to this DFS share and, thanks to the magic of DFS referrals, correctly pick up the local Content folder. Since the day before, however, one of our offices started getting errors with AppV apps (same as in this post), and when I checked the AppV server I found errors similar to this in the Event Logs:

The DFS share seemed to be working OK. I could open it via File Explorer and its contents seemed correct. I checked the number of files and the size of the share and they matched across offices. If I pointed the DFS share to use a different target (open the share in File Explorer, right click, Properties, go to the DFS tab and select a different location target) AppV works. So the problem definitely looked like something to do with the local target, but what was wrong?

I tried forcing a replication, checked permissions, and used tools like dfsrdiag to confirm things were alright. No issues anywhere. Restarting the DFS Replication service on the server threw up some errors in the Event Logs about some AD objects, so I spent some time chasing up that tree (looks like older replication groups that were still hanging around in AD with missing info but not present in the DFS Management console any more) until I realized all the replication servers were throwing similar errors. Moreover, adding a test folder to the source DFS share resulted in it appearing on the local target immediately – so obviously replication was working correctly.

I also used robocopy to compare the local target with another one and saw that they were identical.
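A compare along these lines does the trick (not my exact invocation; the /L switch makes robocopy list what it would copy without actually copying anything, and the server/share names here are just placeholders):

    robocopy \\localtarget\Content \\othertarget\Content /E /L /NP /NJH /LOG:C:\Temp\compare.txt

If the log shows nothing to copy, the destination already has everything the source has (run it the other way round as well to catch extra files on the destination).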

Bummer. Looked like a dead end and I left it for a while.

Later, while sitting through a boring conference call, I had a brainwave: maybe the AppV service runs in a different user context, and that context may not be seeing the DFS share? As in, maybe the error message above is literally what is happening – AppV really is seeing an empty content root, and it’s not a case of a corrupt content root or just some missing files?

So I checked the AppV service and saw that it runs as NT AUTHORITY\NETWORK SERVICE. Ah ha! That means it authenticates with the remote server using the machine account of the server AppV is running on. I thought I’d verify what happens by launching File Explorer or a Command Prompt as NT AUTHORITY\NETWORK SERVICE, but this was a Server 2003 and apparently there’s no straightforward way to do that. (You can use psexec to launch something as .\LOCALSYSTEM, and starting from Server 2008 you can create a scheduled task that runs as NT AUTHORITY\NETWORK SERVICE and launch that to get what you want, but I couldn’t use that here; also, I think you need to first run as the .\LOCALSYSTEM account and then run as the NT AUTHORITY\NETWORK SERVICE account.) So I checked the Audit logs of the server hosting the DFS target and sure enough found errors showing that the machine account of the AppV server was indeed being denied login:

Awesome! Now we are getting somewhere.

I fired up the Local Security Policy console on the server hosting the DFS target (it’s under the Administrative Tools folder, or just type secpol.msc). Then went down to “Local Policies” > “User Rights Assignment” > “Access this computer from the Network”:

[screenshot: secpol.msc]

Sure enough this was limited to a set of computers which didn’t include the AppV server. When I compared this with our DFS servers I saw that they were still on the default values (which include “Everyone”, as in the screenshot above) and that’s why those targets worked.

To dig further I used gpresult and compared the GPOs that affected the above policy between both servers. The server that was affected had this policy modified via a GPO while the server that wasn’t affected showed the GPO as inaccessible. Both servers were in the same OU, but upon examining the GPO I saw that it was filtered to a certain group only. Nice! And when I checked that group, our problem server was a member of it while the rest weren’t! :)
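For reference, a verbose report along these lines on each server shows which GPOs were applied to the computer and which were filtered out (not my exact invocation, just the general idea):

    gpresult /scope computer /v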

Turns out the server was added to the group by mistake two days ago. Removed the server from this group, waited a while for the change to replicate across the domain, did a gpupdate on the server, and tada! now the AppV server is able to access the DFS share on this local target again. Yay!

Moral of the story: if one of your services is unable to access a shared folder, check what user account the service runs as.

DFS Namespaces missing on some servers / sites

This is something I had sorted out for one of our offices earlier. Came across it for another office today and thought I should make a blog post of it. I don’t remember if I made a blog post the last time I fixed it (and although it’s far easier to just search my blog and then decide whether to make a blog post or not, I am lazy that way :)).

Here’s the problem. We use AppV to stream some applications to our users. One of our offices started complaining that AppV was no longer working for them. Here’s the error they got:

[screenshot of the error]

I logged on to the user machine to find out where the streaming server is:

Checked the Event Logs on that server and there were errors from the “Application Virtualization” source along these lines:

So I checked the DFS path and it was empty. In fact, there was no folder called “Content” under “\\domain.com\dfs” – which was odd!

I switched DFS servers to that of another location and all the folders appeared. So the problem definitely was with the DFS server of this site.

From the DFS Management tool I found the namespace server for this site (it was the DC) and the folder target (it was one of the data servers). Checked the folder target share and that was accessible, so no issues there. It was looking to be a DFS Namespace issue.

Hopped on to the DFS namespace server and checked “C:\DFSRoots\DFS”. It didn’t have the “Content” folder above – so definitely a DFS Namespace issue!

I ran dfsdiag and it gave no errors:

So I checked the Event Logs on the DC. No errors there. Next I restarted the “DFS Namespace” service and voila! all the namespaces appeared correctly.

Ok, so that fixed the problem but why did it happen in the first place? And what’s there to ensure it doesn’t happen again? The site was restarted yesterday as part of some upgrade work so did that cause the namespaces to fail?

I checked the timestamps of the DFS Namespace entries (these are from source “DfsSvc” in “Custom Views” > “Server Roles” > “File Server”). Once the namespaces were ready there was an entry along the following lines at 08:06:43 (when the DC came back up from the reboot):

No errors there. But where does the DC get its namespaces from? This was a domain DFS so it would be getting them from Active Directory. So let’s look at the AD logs under “Applications and Services Logs” > “Directory Service”. When the domain services are up and running there’s an entry from the “ActiveDirectory_DomainService” source along these lines:

On this server the entry was made at 08:06:47. Notice the timestamp – it occurs after the “DFS Namespace” service has finished building the namespaces! And therein lies the problem. Because the Directory Service took a while to be ready, DFS went ahead and initialized with stale data. When I restarted the service later, it picked up the correct data and sorted the namespaces.

I rebooted the server a couple more times but couldn’t replicate the problem as the Directory Service always initialized before DFS Namespace. To be on the safe side though I set the startup type of “DFS Namespace” to “Automatic (Delayed Start)” so it always starts after the Directory Service.
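If you’d rather script that than click through the Services console, something like this should do it (assuming the service short name is Dfs; note that sc requires the space after start=):

    sc config Dfs start= delayed-auto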

 

Active Directory: Domain Controller critical services

The first of my (hopefully!) many posts on Active Directory, based on the WorkshopPLUS sessions I attended last month. Progress is slow as I don’t have much time, plus I am going through the slides and my notes and adding more information from the Internet and such. 

This one’s on the services that are critical for Domain Controllers to function properly. 

DHCP Client

  • In Server 2003 and before the DHCP Client service registers A, AAAA, and PTR records for the DC with DNS
  • In Server 2008 and above this is done by the DNS Client
  • Note that only the host (A/AAAA) and PTR records are registered. Other records are registered by the Netlogon service.

File Replication Service (FRS)

  • Replicates SYSVOL amongst DCs.
  • Starting with Server 2008 it is now in maintenance mode. DFSR replaces it.
    • To check whether your domain is still using FRS for SYSVOL replication, open the DFS Management console and see whether the “Domain System Volume” entry is present under “Replication” (if it is not, see whether it is available for adding to the display). If it is present then your domain is using DFSR for SYSVOL replication.
    • Alternatively, run the dfsrmig command on your DC (shown after this list). If the output says “Eliminated”, your domain is using DFSR for SYSVOL. (Note this only works with domain functional level 2008 and above).
  • Stopping FRS for long periods can result in Group Policy distribution errors as SYSVOL isn’t replicated. Event ID 13568 in FRS log.
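The dfsrmig command referred to in the list above is this one; once SYSVOL replication has been fully migrated to DFSR, the global state it reports is “Eliminated”:

    dfsrmig /getglobalstate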

Distributed File System Replication (DFSR)

  • Replicates SYSVOL amongst DCs. Replaced functionality previously provided by FRS. 
  • DFSR was introduced with Server 2003 R2.
  • If the domain was born functional level 2008 – meaning all DCs are Server 2008 or higher – DFSR is used for SYSVOL replication.
    •  Once all pre-Server 2008 DCs are removed FRS can be migrated to DFSR. 
    • Apart from the dfsrmig command mentioned in the FRS section, the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\DFSR\Parameters\SysVols\Migrating Sysvols\LocalState registry key can also be checked to see if DFSR is in use (a value of 3 means it is in use). 
  • If a DC is offline/ disconnected from its peers for a long time and Content Freshness Protection is turned on, when the DC is back online/ reconnected DFSR might block SYSVOL replication to & from this DC – resulting in Group Policy distribution errors.
    • Content Freshness Protection is off by default. It needs to be manually turned on for each server.
    • Run a wmic command on each server to turn it on; similar commands turn it off and show the current setting (all three are collected after this list). Replace 60 with the maximum number of days it is acceptable for a DC or DFSR member to be offline; the recommended value is 60 days.

    • Content Freshness Protection exists because of the way deletions work.
      • DFSR is multi-master, like AD, which means changes can be made on any server.
      • When you delete an item on one server, it can’t simply vanish everywhere: if it just disappeared, other servers would have no way of knowing whether the item was deleted or whether it simply never replicated to them in the first place.
      • So what happens is that a deleted item is “tombstoned”. The item is removed from disk but a record for it remains in the DFSR database for 60 days (this period is called the “tombstone lifetime”) marking the item as deleted.
      • During these 60 days other DFSR servers can learn that the item is marked as deleted and thus act upon their copy of the item. After 60 days the record is removed from the database too.
      • In such a context, say we have a DC that is offline for more than 60 days, and say files were removed from SYSVOL (replicated via DFSR) on the other DCs. All the other DCs no longer have a copy of the file nor a record that it was deleted, as 60 days have passed and the file is gone for good.
      • When the previously offline DC replicates, it still has a copy of the file and it will pass this on to the other DCs. The other DCs don’t remember that this file was deleted (they no longer have a record of its deletion, as 60 days have passed) and so will happily replicate this file to their folders – resulting in a deleted file reappearing and causing corruption.
      • It is to avoid such situations that Content Freshness Protection was invented and is recommended to be turned on.
    • Here’s a good blog post from the Directory Services team explaining Content Freshness Protection.

DNS Client

  • For Server 2008 and above registers the A, AAAA, and PTR records for the DC with DNS (notice that when you change the DC IP address you do not have to update DNS manually – it is updated automatically. This is because of the DNS Client service).
  • Note that only the A, AAAA, and PTR records are registered. Other records are registered by the Netlogon service.

DNS Server

  •  The glue for Active Directory. DNS is what domain controllers use to locate each other. DNS is what client computers use to find domain controllers. If this service is down both these functions fail.  

Kerberos Key Distribution Center (KDC)

  • Required for Kerberos 5.0 authentication. AD domains use Kerberos for authentication. If the KDC service is stopped Kerberos authentication fails. 
  • NTLM is not affected by this service. 

Netlogon

  • Maintains the secure channel between DCs and domain members (including other DCs). This secure channel is used for authentication (NTLM and Kerberos) and DC replication.
  • Writes the SRV and other records to DNS. These records are what domain members use to find DCs.
    • The records are also written to a file %systemroot%\system32\config\Netlogon.DNS. If the DNS server doesn’t support dynamic updates then the records in this text file must be manually created on the DNS server. 

Windows Time

  • Acts as an NTP client and server to keep time in sync across the domain. If this service is down and time is not in sync then Kerberos authentication and AD replication will fail (see resolution #5 in this link).
    • Kerberos authentication may not necessarily break for newer versions of Windows. But AD replication is still sensitive to time.  
  • The PDC of the forest root domain is the master time keeper of the forest. All other DCs in the forest will sync time from it.
    • The Windows Time service on every domain member looks to the DC that authenticates it for time updates.
    • DCs in the domain look to the domain PDC for time updates.
    • Domain PDCs look to the domain PDC of the domain above/ sibling to them – except the forest root domain PDC, which gets time from an external source (hardware source, Internet, etc).
  • From this link: there are two registry keys HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\Config\MaxPosPhaseCorrection and HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\Config\MaxNegPhaseCorrection that restrict the time updates accepted by the Windows Time service to the number of seconds defined by these values (the maximum and minimum range). This can be set directly in the registry or via a GPO. The recommended value is 172800 (i.e. 48 hours).
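A sketch of setting these directly via reg.exe with the recommended value (172800 seconds is 48 hours) – the GPO route mentioned above achieves the same thing:

    reg add HKLM\SYSTEM\CurrentControlSet\Services\W32Time\Config /v MaxPosPhaseCorrection /t REG_DWORD /d 172800 /f
    reg add HKLM\SYSTEM\CurrentControlSet\Services\W32Time\Config /v MaxNegPhaseCorrection /t REG_DWORD /d 172800 /f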

w32tm

The w32tm command can be used to manage time. For instance (the corresponding invocations are collected after this list):

  • To get an idea of the time situation in the domain (who is the master time keeper, what is the offset of each of the DCs from this time keeper):
  • To ask the Windows Time service to resync as soon as possible (the command can target a remote computer too via the /computer: switch)

    • Same as above but before resyncing redetect any network configuration changes and rediscover the sources:
  • To get the status of the local computer (use the /computer: switch to target a different computer)
  • To show what time sources are being used:
  • To show who the peers are:
  • To show the current time zone:

    • You can’t change the time zone using this command; you have to do:
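From memory, the invocations the list above refers to are along these lines (WIN-DC02 and the time zone name are just placeholder examples):

    w32tm /monitor
    w32tm /resync
    w32tm /resync /computer:WIN-DC02
    w32tm /resync /rediscover
    w32tm /query /status
    w32tm /query /source
    w32tm /query /peers
    w32tm /tz
    tzutil /s "Arabian Standard Time"

The last one (tzutil) is what you use to change the time zone, since w32tm /tz only displays it.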

On the PDC in the forest root domain you would typically run a command like this if you want it to get time from an NTP pool on the Internet:
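Something along these lines (the exact peer list is up to you; the NTP Pool servers here are just the example discussed below):

    w32tm /config /manualpeerlist:"0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org" /syncfromflags:MANUAL /reliable:YES /update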

Here’s what the command does:

  • the /manualpeerlist switch specifies a list of peers to sync time from (in this example the NTP Pool servers on the Internet);
  • the /update switch tells w32tm to update the Windows Time service with this configuration change;
  • the /syncfromflags:MANUAL tells the Windows Time service that it must only sync from these sources (other options such as “DOMHIER” tell it to sync from the domain peers only, “NO” tells it to sync from none, “ALL” tells it to sync from both the domain peers and this manual list);
  • the /reliable:YES switch marks this machine as special in that it is a reliable source of time for the domain (read this link on what gets set when you set a machine as RELIABLE).

Note: You must manually configure the time source on the PDC in the forest root domain and mark it as reliable. If that server were to fail and you transfer the role to another DC, be sure to repeat the step. 

On other machines in the domain you would run a command like this:
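Likely something along these lines:

    w32tm /config /syncfromflags:DOMHIER /reliable:NO /update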

This tells those DCs to follow the domain hierarchy (and only the domain hierarchy) and that they are not reliable time sources (this switch is not really needed if these other DCs are not PDCs).

Active Directory Domain Services (AD DS)

  • Provides the DC services. If this service is stopped the DC stops acting as a DC. 
  • Pre-Server 2008 this service could not be stopped while the OS was online. But since Server 2008 it can be stopped and started. 

Active Directory Web Services (AD WS)

  • Introduced in Windows Server 2008 R2 to provide a web service interface to Active Directory Domain Services (AD DS), Active Directory Lightweight Directory Services (AD LDS), and Active Directory Database Mounting Tool instances running on the DC.
    • The Active Directory Database Mounting Tool was new to me so here’s a link to what it does. It’s a pretty cool tool. Starting from Server 2008 you can take AD DS and AD LDS snapshots via the Volume Shadow Copy Service (VSS) (I am writing a post on VSS side by side so expect to see one soon). This makes use of the NTDS VSS writer which ensures that consistent snapshots of the AD databases can be taken. The AD snapshots can be taken manually via the ntdsutil snapshot command, or via backup software, or even via images of the whole system. Either way, once you have such snapshots you can mount the one(s) you want via ntdsutil and point the Active Directory Database Mounting Tool at it. As the name says, it “mounts” the AD database in the snapshot and exposes it as an LDAP server. You can then use tools such as ldp.exe or AD Users and Computers to go through this instance of the AD database. More info on this tool can be found at this and this link.
  • AD WS is what the PowerShell Active Directory module connects to. 
  • It is also what the new Active Directory Administrative Center (which in turn uses PowerShell) connects to.
  • AD WS is installed automatically when the AD DS or AD LDS roles are installed. It is only activated once the server is promoted to a DC or an AD LDS instance is created on it.

GPO errors due to SYSVOL replication issues

So the other day at my test lab I noticed many of my GPOs were giving errors. Ran gpupdate /force and it gave the following message:

User policy could not be updated successfully. The following errors were encountered:

The processing of Group Policy failed. Windows attempted to read the file \\rakhesh.local\SysVol\rakhesh.local\Policies\{F28486EC-7C9D-40D6-A243-F1F733979D5C}\gpt.ini from a domain controller and was not successful. Group Policy settings may not be applied until this event is resolved. This issue may be transient and could be caused by one or more of the following:
a) Name Resolution/Network Connectivity to the current domain controller.
b) File Replication Service Latency (a file created on another domain controller has not replicated to the current domain controller).
c) The Distributed File System (DFS) client has been disabled.
Computer Policy update has completed successfully.

To diagnose the failure, review the event log or run GPRESULT /H GPReport.html from the command line to access information about Group Policy results.

Looked like a temporary thing so I didn’t give it much thought at first. But when policies were still not being read after half an hour or so I figured something was broken.

Running gpresult /h didn’t have much to add. I noticed there were errors similar to the one above. I also noticed that another policy had a version mismatch with the one on the server (this wasn’t highlighted anywhere, I just happened to notice).

I went to the path in the error message on each of my DCs (replacing rakhesh.local with each DC’s name) to find out which DC had a missing file. As the message indicated, it was a replication issue. I had updated the GPO on one of the DCs but it looks like it didn’t replicate to some of the other DCs; and my clients were trying to find the policy files on a DC it hadn’t replicated to, thus throwing errors.

Event Logs on the clients didn’t have much to add except for error messages similar to the above.

Event Logs on the DCs were more useful. “Applications and Services Logs” > “DFS Replication” is the place to look here. Interestingly, the DC where I couldn’t find the GPO above reported no errors. It had an entry like this:

The DFS Replication service successfully established an inbound connection with partner WIN-DC01 for replication group Domain System Volume.

Additional Information:
Connection Address Used: WIN-DC01.rakhesh.local
Connection ID: 11C8A7A0-F8A5-4867-B277-78DDC66E3ED3
Replication Group ID: 7C0BF99B-677B-4EDA-9B47-944D532DF7CB

This gives you the impression that all is fine. The other DC though had many errors. One of them was event ID 4012:

The DFS Replication service stopped replication on the folder with the following local path: C:\Windows\SYSVOL\domain. This server has been disconnected from other partners for 118 days, which is longer than the time allowed by the MaxOfflineTimeInDays parameter (60). DFS Replication considers the data in this folder to be stale, and this server will not replicate the folder until this error is corrected.

To resume replication of this folder, use the DFS Management snap-in to remove this server from the replication group, and then add it back to the group. This causes the server to perform an initial synchronization task, which replaces the stale data with fresh data from other members of the replication group.

Additional Information:
Error: 9061 (The replicated folder has been offline for too long.)
Replicated Folder Name: SYSVOL Share
Replicated Folder ID: 0546D0D8-E779-4384-87CA-3D4ABCF1FA56
Replication Group Name: Domain System Volume
Replication Group ID: 7C0BF99B-677B-4EDA-9B47-944D532DF7CB
Member ID: 93D960C2-DE50-443F-B8A1-20B04C714E16

Good! So that explains it. Being a test lab, all my DCs aren’t usually online. In this case, WIN-DC01 is what I always have online but WIN-DC02 is only powered on when I want to test anything specific. Since the two weren’t in touch for more than 60 days, WIN-DC01 had stopped replicating with WIN-DC02. Makes sense – you wouldn’t want stale data from WIN-DC02 to overwrite fresh data on WIN-DC01 after all.

The steps in the Event Log entry didn’t work though. Being a SYSVOL folder there’s no option to remove a server from the replication group in DFS Management.

I searched online for any suggestions related to event ID 4012 and SYSVOL, and found a Microsoft KB article. This article suggested running two commands which I’d like to post here as a reminder to myself as they are generally useful for quickly pinpointing the source of an error. Being a test lab I have just 2-3 DCs so I can go through them manually; but if I had many DCs it is useful to know how to find the faulty ones fast.

The first command simply checks whether the SYSVOL folder is shared on each DC:
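From memory it was along these lines (dsquery enumerates the DCs, and net view checks each one for a SYSVOL share):

    for /f %i IN ('dsquery server -o rdn') do @echo %i && @(net view \\%i | find "SYSVOL")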

The next command uses WMIC to get the replication state of this folder on each DC.
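Again from memory, something along these lines:

    for /f %i IN ('dsquery server -o rdn') do @echo %i && @wmic /node:"%i" /namespace:\\root\microsoftdfs path dfsrreplicatedfolderinfo WHERE replicatedfoldername='SYSVOL share' get replicationgroupname,replicatedfoldername,state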

A state of 5 indicates the group is in error. Interestingly, the “Access denied” message is one I had noticed earlier too when running dcdiag on the faulty DC.

I didn’t think much of it at the time as there were no errors when running dcdiag from the working DC, and also because I ran this command while I still thought the error was a temporary thing. I still don’t think this is a permissions issue. My guess is that it is to do with the content freshness error I found in the Event Logs earlier. Possibly because of that, the second DC is being denied access to replicate and that’s why I get the “access denied” error above.

From the KB article above I came across another one that has steps on how to force an authoritative synchronization of SYSVOL from one server to all the others. (In the linked article, the section ‘How to perform an authoritative synchronization of DFSR-replicated SYSVOL (like “D4” for FRS)’ is what I am talking about). Here’s what I did:

  1. I launched adsiedit.msc and connected with the default settings
  2. Then I opened CN=SYSVOL Subscription,CN=Domain System Volume,CN=DFSR-LocalSettings,CN=WIN-DC01,OU=Domain Controllers,DC=rakhesh,DC=local (replace the DC name and domain name with yours; in my case WIN-DC01 is the DC I want to set as authoritative) and …
  3. … I set msDFSR-Enabled=FALSE and msDFSR-options=1

    [screenshot: ADSI Edit]

  4. Next, I opened CN=SYSVOL Subscription,CN=Domain System Volume,CN=DFSR-LocalSettings,CN=WIN-DC02,OU=Domain Controllers,DC=rakhesh,DC=local – WIN-DC02 being the DC that is out of sync – and set msDFSR-Enabled=FALSE.
  5. For AD replication throughout the domain:

  6. The result of the above steps will be that SYSVOL replication is disabled throughout the domain. I confirmed that by going to the “DFS Replication” logs and noting an event with ID 4114:

    The replicated folder at local path C:\Windows\SYSVOL\domain has been disabled. The replicated folder will not participate in replication until it is enabled. All data in the replicated folder will be treated as pre-existing data when this replicated folder is enabled.

    Additional Information:
    Replicated Folder Name: SYSVOL Share
    Replicated Folder ID: 0546D0D8-E779-4384-87CA-3D4ABCF1FA56
    Replication Group Name: Domain System Volume
    Replication Group ID: 7C0BF99B-677B-4EDA-9B47-944D532DF7CB
    Member ID: 09E14860-47F6-49EE-820E-43F7ED015FEB

  7. Then I went back to CN=SYSVOL Subscription,CN=Domain System Volume,CN=DFSR-LocalSettings,CN=WIN-DC01,OU=Domain Controllers,DC=rakhesh,DC=local but this time I set msDFSR-Enabled=TRUE.
  8. Again, I forced AD replication throughout the domain.
  9. Next, I ran dfsrdiag pollad on WIN-DC01 (the DC with the correct copy) to perform an initial sync with AD. (Note: My DC was a Server 2012 Core install. This didn’t have dfsrdiag by default. I installed it via Install-WindowsFeature RSAT-DFS-Mgmt-Con in PowerShell).
  10. I checked the Event Logs of WIN-DC01 to confirm SYSVOL is now replicating (event ID 4602 should be present):

    The DFS Replication service successfully initialized the SYSVOL replicated folder at local path C:\Windows\SYSVOL\domain. This member is the designated primary member for this replicated folder. No user action is required. To check for the presence of the SYSVOL share, open a command prompt window and then type “net share”.

    Additional Information:
    Replicated Folder Name: SYSVOL Share
    Replicated Folder ID: 0546D0D8-E779-4384-87CA-3D4ABCF1FA56
    Replication Group Name: Domain System Volume
    Replication Group ID: 7C0BF99B-677B-4EDA-9B47-944D532DF7CB
    Member ID: 93D960C2-DE50-443F-B8A1-20B04C714E16
    Read-Only: 0

  11. Finally, I went back to CN=SYSVOL Subscription,CN=Domain System Volume,CN=DFSR-LocalSettings,CN=WIN-DC02,OU=Domain Controllers,DC=rakhesh,DC=local and set msDFSR-Enabled=TRUE. You can see what’s happening, don’t you? First we disabled SYSVOL replication on all the DCs. Then we enabled it on the authoritative DC and got it to sync to AD. And now we are re-enabling it on the other DCs so they get a fresh copy from the authoritative DC.
  12. Lastly, I did a dfsrdiag pollad on WIN-DC02 … and while it should have worked without any issues, I got the following error:

    C:\Users>dfsrdiag pollad
    [ERROR] Access is denied when connecting to WMI services on computer: WIN-DC02.r
    akhesh.local

    Operation Failed

However, Event Logs on WIN-DC02 showed that SYSVOL was now replicating successfully and clients were now able to download GPOs successfully. So it looks like the “access denied” error is something else altogether (as I suspected initially, yaay!). I’ll have to look into that later – I know in the past I’ve fiddled with the WMI settings on this machine so maybe I made some change that I forgot to reset.

Good stuff!

Clients picking up the wrong DFSN target, dfsutil shows AccessStatus 0xc000003a

We use DFS at work and I noticed one of our laptops was always connecting to a target in a faraway site. My own desktop was connecting to the wrong target too, but after a reboot it connected to the correct one. There weren’t any user complaints about slowness, so it could be that other machines were OK and/ or users just weren’t noticing the slowness.

DFS seemed to be set up correctly, and the laptop was picking up an IP in the correct site etc so it wasn’t any obvious misconfiguration.

To investigate further I used the dfsutil tool. I got the idea for this from an old KB article. The dfsutil tool is a part of the DFS Management tools and is present in %WINDIR%\System32. It can be used to administer DFS-N and is more powerful than the GUI. Since the laptop didn’t have this tool I copied it from my desktop (which had the DFS Management tools installed) to the laptop.

First I checked whether the laptop was able to reach the correct DC or DFSN server. I did this by examining the domain cache on the client.

In the past the command used to be dfsutil /SpcInfo but now it’s the more intuitive dfsutil cache domain. The older switch still works though. (The domain cache is also called SPC cache, hence the switch is SpcInfo).

In the output, the entries with a * are those which were obtained through the Workstation service – like the domain name, the NetBIOS domain name, and the DC which the laptop is connected to.

All other entries (both – and +) are obtained through referrals by the DFSN client. Of these, entries with a + are currently used by the client. So in the case above, the laptop had got a referral to the rakhesh.local domain and was correctly latching on to the WIN-DC01 server which is the DC and DFSN server for its site.

Next I examined the referral cache of the client to see if there were any errors that point to why it is not using the correct referral link.

DFSN stores information on the root name, the root servers, links, and target servers in a Partition Knowledge Table (PKT). This is stored in AD (in the case of a domain-based DFS) or the Registry (in the case of a standalone DFS). Clients refer to this and cache it with them (so they don’t have to look it up every time), and this cache is what we need to examine. Previously the command for this used to be dfsutil /PktInfo but now it’s replaced by the more intuitive dfsutil cache referral (notice how everything comes under the dfsutil cache sub-command).

And therein lies the problem! The laptop is correctly connecting to the WIN-DC01 DFSN server, this server is correctly returning the namespace (“pub”) and link (“downloads”) and referrals to the two targets for this link (on win-data01 and win-data02). But whereas we want the laptop to use win-data01, that is not set to active and has an error status returned. Instead win-data02 is set as the active target.

There doesn’t seem to be a definite cause behind the “0xc000003a” error code. What helped me finally was this forum post. I didn’t try the solution that helped the OP (though our office environment has a similar setup) as one of the suggested solutions did the trick for me.

It appears that if the link targets have different offline caching settings, the client considers the targets with offline caching enabled first, and only if that fails does it consider the preferred target. So I checked my targets to see if they had different caching settings and sure enough the preferred target didn’t have caching enabled. Turned on caching for it, rebooted the laptop, and now it picks up the correct target!

An easy way to check whether all your targets have similar ACLs and offline caching settings is to use the dfsdiag command on the DC/ DFSN server. I ran the command thus:
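Roughly along these lines (the path here reuses the lab names from the example above as a stand-in; the /full switch is what makes dfsdiag compare the share/NTFS ACLs and the offline caching settings of the link targets):

    dfsdiag /testreferral /dfspath:\\rakhesh.local\pub\downloads /full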

More about the dfsdiag command in this TechNet post.

Hopefully this helps someone.

DFS namespace empty after reboot

One of our sites had the following issue. It has a Server 2008 R2 DC which also hosts DFS, and whenever the server is rebooted, sometimes the DFS namespace appears empty until the DFS Namespace service is restarted.

I tracked it down to a timing issue.

The DFS namespace was a domain based one. Which means even though the server hosts the namespace root, it doesn’t have any information on the root until AD is up and running. This is because domain based DFS information is stored in AD (unlike stand-alone DFS where the information is stored in the server registry) and so the DFS Namespace service looks to AD for details. Had the DFS server been a separate one it would have contacted the local DC and got this, but in this case the DC and DFS Namespace are on the same server and so it all came down to which service was up first. If AD initializes before DFS, when the Namespace service runs it is able to contact AD and get the info. But if the Namespace initializes before AD, when it contacts AD for info it gets nothing and so assumes there are no namespaces present. The result is an empty namespace, and the c:\DFSRoots folder (or whatever you have chosen) is empty. If you restart the DFS Namespace service a while later, it is able to contact AD and present the namespace.

The easiest way to see if AD is initialized is via Event Viewer. Go to Custom Views > Server Roles > Active Directory Domain Service. When AD is ready an event ID 1000 is generated from source ActiveDirectory_DomainService that tells you AD startup is complete.

[screenshot: AD initialized event]

Similarly, to check when DFS initialized go to the Event Viewer, Windows Logs > System. Check for events from source DfsSvc – when DFS has completed initializing an event with ID 14531 is generated.

[screenshot: DFS initialized event]
When the namespace is empty you’ll see that the time-stamp of the DFS event is before the AD event. In my testing as long as DFS initializes a second before AD is ready the namespaces are fine, but anything longer than that and the namespaces are empty.

There doesn’t seem to be any other fix for this than to delay the startup of the DFS Namespace service. Again, only the DFS Namespace service is what matters, not the DFS Replication service. There are two parts to DFS – one is the namespace, which provides the unified namespace and the folders beneath it; the other is the replication bit, which ensures that if a target points to multiple servers each of those servers is kept in sync. You don’t necessarily have to use the DFS Replication service to keep these replicated – use whatever tool you prefer, DFS Replication is just what’s provided in the box for you. If your namespace is missing, the DFS Replication service has nothing to do with it. (On the other hand, if your namespace was fine but some of the targets had incorrect data, then DFS Replication is what you should focus troubleshooting on.)

What I did in this case was set the startup type of DFS Namespace to “Automatic (Delayed Start)”. In my testing, on this particular server, this delayed the startup of the DFS Namespace service by about 3 minutes. AD initializes in about 50-90 seconds, so there’s ample time after AD is ready (and for other services to launch) before DFS Namespace runs. There is no impact for users either, as there are multiple namespace servers – all of them in different sites in this situation – and so if users were to connect to DFS in the 3 minute period after a server reboot they would connect to one of these other namespace servers. Yes, there will be a bit of a delay as they are connecting to an off-site server, but when they access any target after that they will be sent back to the local site (provided there are target servers there) and so there’s no performance impact.

If even the 3 minute window is unacceptable, I suppose one could set the DFS Namespace service to manual start and use the Task Scheduler to start it a few seconds after reboot.

Hope this helps anyone else.