Active Directory: Troubleshooting with DcDiag (part 2)

Continuing from part 1 of this series.

LocatorCheck

  • Checks whether DCs have certain required knowledge/ ability. Specifically, whether the DC that’s tested knows of, or can itself act as, each of the following:
    • Global Catalog (GC)
    • Primary Domain Controller (PDC)
    • Kerberos Key Distribution Centre (KDC)
    • Time Server
    • Preferred Time Server
  • By itself the test doesn’t output much info:

    To get more details one has to use the /v switch. Then output similar to the following will be returned:

    Note that the DC itself needn’t be offering any of these services. But it must know which servers do offer them and be able to refer to them. For instance, in my domain WIN-DC03 (the server I am testing against) isn’t a GC or PDC, so it returns WIN-DC01 for these. It is a time server, but not a preferred time server (that’s the forest root domain PDC), and the output reflects that.
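
For scripted runs, the command line for a single test can be assembled programmatically. Below is a minimal Python sketch of that idea. The helper name dcdiag_args is made up for illustration; /test:, /s: and /v are the standard DcDiag switches, and WIN-DC03 is the server used in this post.

```python
from typing import List, Optional


def dcdiag_args(test: str, server: Optional[str] = None, verbose: bool = False) -> List[str]:
    """Build the argument list for one DcDiag test (hypothetical helper)."""
    args = ["dcdiag", f"/test:{test}"]
    if server:
        args.append(f"/s:{server}")  # DC to run the test against
    if verbose:
        args.append("/v")            # verbose output
    return args


# Verbose LocatorCheck against WIN-DC03
print(" ".join(dcdiag_args("LocatorCheck", server="WIN-DC03", verbose=True)))
```

On a machine that has the AD tools installed, the returned list could be handed to subprocess.run as-is.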

Intersite

  • Checks for failures that could affect Intersite replication.
  • Warning: By default the test silently skips doing anything and simply returns a success! Note the output below:

    As you can see from the verbose output the test actually does nothing.

  • To make the test do something one must specify the /a or /e switches (all DCs in the site or all DCs in the enterprise, respectively).

    Now WIN-DC02 is flagged as having issues. The /e will throw even more light:

    (In this case the router between the two sites was shut down and so Intersite replication was failing. Hence the errors above.)

  • This test doesn’t seem to force an Intersite replication. It only connects to the servers and checks for errors, I think. For instance, I turned on the router above, verified the two DCs could see each other, forced an enterprise-wide replication (repadmin /syncall win-dc01 /e /A, which tells WIN-DC01 to ask all its partners to replicate, enterprise-wide, all NCs), and double checked the replication status (repadmin.exe /showrepl WIN-DC01). Everything was working fine, but the Intersite test still complained. The errors were not the same as above: the test passed, but with warnings that each site didn’t have a Bridgehead yet because of errors. After about 15 mins the errors cleared.
  • Intersite replication, Bridgeheads, and InterSite Topology Generators (ISTG) are part of later posts.
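
Since the Intersite test silently does nothing without a scope switch, a wrapper script can refuse to build the command unless one is given. A small Python sketch, assuming you drive DcDiag from a script; the names SCOPE_SWITCHES and intersite_args are hypothetical, while /a and /e are the switches described above.

```python
# Hypothetical mapping from a scope name to the DcDiag switch that widens the test.
SCOPE_SWITCHES = {"site": "/a", "enterprise": "/e"}


def intersite_args(server, scope):
    """Build an Intersite command line; insist on /a or /e so the test does real work."""
    if scope not in SCOPE_SWITCHES:
        raise ValueError("Intersite needs a scope: 'site' (/a) or 'enterprise' (/e)")
    return ["dcdiag", "/test:Intersite", f"/s:{server}", SCOPE_SWITCHES[scope], "/v"]


for scope in ("site", "enterprise"):
    print(" ".join(intersite_args("WIN-DC01", scope)))
```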

KccEvent

  • Checks whether the Knowledge Consistency Checker (KCC) has any errors. 
  • This test only checks the “Directory Services” event log of the specified server for any errors in the last 15 mins. (If you run the test with the /v switch it even says so). 
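
The practical consequence of the 15 minute window is that an older KCC error will not show up in this test. The Python sketch below only illustrates that filtering idea; it is not how DcDiag itself is implemented, and the event records and the errors_in_window helper are made up for the example.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)  # the look-back period mentioned above


def errors_in_window(events, now, window=WINDOW):
    """Return the error events logged within `window` before `now`."""
    cutoff = now - window
    return [e for e in events if e["level"] == "Error" and cutoff <= e["time"] <= now]


# Made-up sample records, for illustration only.
now = datetime(2024, 1, 1, 12, 0)
events = [
    {"time": datetime(2024, 1, 1, 11, 50), "level": "Error", "source": "KCC"},
    {"time": datetime(2024, 1, 1, 11, 30), "level": "Error", "source": "KCC"},
    {"time": datetime(2024, 1, 1, 11, 55), "level": "Information", "source": "KCC"},
]

# Only the 11:50 error falls inside the window; the 11:30 one is too old.
for event in errors_in_window(events, now):
    print(event["time"], event["source"])
```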

KnowsOfRoleHolders

  • Checks whether the DC knows of various Flexible Single Master Operations (FSMO) role holders in the domain. (FSMO is part of a later post so I won’t elaborate on it here).
  • By default the answer is just a pass or fail. 
  • Use with the /v switch to know what the DC thinks it knows: 
  • Good test to run after a role change to see whether all DCs in the domain/ enterprise know of the new role holder.

MachineAccount

  • Checks whether the DC’s machine account exists, is in the Domain Controllers OU, and Service Principal Names (SPNs) are correctly registered.
  • This is yet another test that only returns a pass or fail by default. Use with the /v switch to get a list of the registered SPNs.
  • Notice that the CheckSecurityError test also checks SPNs. CheckSecurityError is only run on demand, however.
  • Add the /RecreateMachineAccount switch to recreate the machine account if missing. Note: this does not recreate missing SPNs.
  • Add the /FixMachineAccount switch to fix the machine account flags if they are incorrect (am not sure what flags these are …).
  • SPNs can be added/ modified/ deleted using the Setspn command.

NCSecDesc

  • Checks whether all the Naming Contexts on the DC have correct security permissions for replication.

NetLogons

  • Checks whether the Netlogon and SYSVOL shares are available and can be accessed.
  • I pointed out this test previously under the SysVolCheck test. The latter gives the impression it actually checks the SYSVOL shares, but it doesn’t. NetLogons is the one that checks.

ObjectsReplicated

  • Checks whether the DC’s machine account and DSA objects have replicated. The DC machine account object is CN=<DCName>,OU=Domain Controllers,... in the domain NC; the DSA object is CN=NTDS Settings,CN=<DCName>,CN=Servers,CN=<SiteName>,... in the configuration NC.
  • This test is better run with the /a or /e switches. Without these switches it only checks the DC you test against to see whether it has its own objects. With the switches it checks all the objects for all DCs in the site/ enterprise on all DCs in the site/ enterprise. Which is what you really want.
  • It is also possible to check a specific object via the /objectdn: switch, or to limit the test to DCs holding a specific NC via the /n: switch.

    For example:

    Check all DCs holding the default naming context (rakhesh.local) across all sites:

    Check all DCs holding a specified application NC across all sites:

    I had created the SomeApp2 NC previously. It is only replicated to the WIN-DC01 and WIN-DC03 servers so the test above will only check those servers. (To recap: you can find the DCs an NC is replicated to from the ms-DS-NC-Replica-Locations attribute of its object in the Partitions container). Note that I had to specify a server above. That’s because without specifying a DC name there’s no way to identify which DCs know of this NC (Note: “know of”; it’s not necessary that they hold the NC, they only need to know where to point to). Unlike a domain NC, which has DNS entries to help identify the DCs holding it, other NCs have no such mechanism. Below is the error you get if you don’t specify a DC name as above:

    Lastly, it’s also possible to check for the replication status of a specific object. Very useful for testing purposes. Make a test object on one DC, force a replication, wait some time, then test whether that object has replicated to all DCs in your site/ enterprise. (Sure you could connect to each DC via ADUC or ADSIEdit, but this is way more convenient!)

    The command below checks whether the specified user account has replicated to all DCs in the domain:

    I specify an NC above (the /n: switch) because I am running DCDiag from a client, so I must specify either a server to use (the /s: switch) or an NC from which a DC can be found. If run from a DC, the NC can be omitted.
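
The three variants described above (default NC, application NC, single object) differ only in which switches are combined. The Python sketch below assembles plausible command lines for each; these are reconstructions, not the exact commands used in this post. The helper name objects_replicated_args is hypothetical, and the application NC DN and the user DN are assumed values chosen purely for illustration.

```python
def objects_replicated_args(nc=None, server=None, object_dn=None, enterprise=True):
    """Build an ObjectsReplicated command line (hypothetical helper)."""
    args = ["dcdiag", "/test:ObjectsReplicated"]
    if server:
        args.append(f"/s:{server}")            # DC to start from
    if nc:
        args.append(f"/n:{nc}")                # naming context to limit the test to
    if object_dn:
        args.append(f"/objectdn:{object_dn}")  # one specific object to check
    if enterprise:
        args.append("/e")                      # all DCs in the enterprise
    return args


# 1. All DCs holding the default naming context, across all sites.
print(" ".join(objects_replicated_args(nc="rakhesh.local")))

# 2. An application NC. A server must be named; the DN here is an assumption.
print(" ".join(objects_replicated_args(nc="DC=SomeApp2,DC=rakhesh,DC=local", server="WIN-DC01")))

# 3. A single object. The user DN is hypothetical.
print(" ".join(objects_replicated_args(
    nc="rakhesh.local",
    object_dn="CN=TestUser,CN=Users,DC=rakhesh,DC=local",
)))
```

A DN that contains spaces would need quoting on a real command line; passing the list straight to subprocess.run sidesteps that.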

OutboundSecureChannels

  • Checks whether all DCs in the domain (by default only those in the current site) have a secure channel to DCs in the trusted domain specified by the /testdomain: switch.
  • There seems to be a misunderstanding that this test checks secure channels between DCs of the same domain. That’s not the case; it checks secure channels between DCs of two trusted domains.
  • Use the /nositerestriction switch to stop limiting the test to DCs in the same site.
  • This test is not run by default. It must be explicitly specified.

RegisterInDNS

  • Checks whether the server being tested can register “A” DNS records. The DNS domain name must be specified via the /DnsDomain: switch.
  • This test is similar to the DcPromo test mentioned previously.
  • This test isn’t run by default.

Replications

  • Checks whether all of the DC’s replication partners are able to replicate to it. By default only those in the same site are tested.
  • It contacts each of the partners to get a status update from them. The test also checks whether there’s a replication latency of more than 12 hours.
  • Output from WIN-DC01 in my domain when I disconnected its partner WIN-DC03. WIN-DC02 is not checked as it’s in a different site.
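
Separately from the output above, the 12 hour latency check can be pictured as a simple threshold comparison against each partner’s last successful replication. The Python sketch below illustrates only that idea and is not DcDiag’s actual implementation; the stale_partners helper, the timestamps, and the WIN-DC04 partner are all made up.

```python
from datetime import datetime, timedelta

MAX_LATENCY = timedelta(hours=12)  # the threshold mentioned above


def stale_partners(last_success, now, limit=MAX_LATENCY):
    """Return partners whose last successful replication is older than `limit`."""
    return sorted(name for name, when in last_success.items() if now - when > limit)


# Made-up timestamps, for illustration only.
now = datetime(2024, 1, 2, 12, 0)
last_success = {
    "WIN-DC03": datetime(2024, 1, 1, 20, 0),   # 16 hours ago
    "WIN-DC04": datetime(2024, 1, 2, 11, 45),  # 15 minutes ago
}
print(stale_partners(last_success, now))
```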

RidManager

  • Checks whether the DC with the RID Master FSMO role is accessible and contains proper information. Use with the /v switch to get more details on the findings (allocation pool, next available, etc).
  • Example output:

Services

  • Checks whether various AD required services are running on the DC.
  • Following services are tested:

    This list is similar (not the same!) to the DC critical services list. Notably it doesn’t check if the “DNS Server” and “AD WS” services are running.

SystemLog

  • Checks the System Log for any errors in the last 60 mins (or less if the server uptime is less than 60 mins).
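
Put differently, the look-back window is the smaller of 60 minutes and the server’s uptime. A tiny Python sketch of that rule, with a hypothetical helper name:

```python
from datetime import timedelta


def system_log_window(uptime, maximum=timedelta(minutes=60)):
    """Look-back window: 60 minutes, or the uptime if the server booted more recently."""
    return min(uptime, maximum)


print(system_log_window(timedelta(minutes=20)))  # server booted 20 minutes ago
print(system_log_window(timedelta(days=3)))      # long-running server
```

So on a freshly rebooted DC, errors logged before the reboot are outside the window.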

Topology

  • Checks whether the server has a fully connected topology for replication of each of its NCs.
  • Note that the test does not actually check if the servers in the topology are online/ connected. For that use the Replications and CutOffServers tests. This test only checks if the topology is logically fully connected.
  • This test is not run by default. It must be explicitly specified.

VerifyEnterpriseReferences

and

VerifyReferences

  • Checks whether system references required for the FRS and replication infrastructure are present on each DC. The “Enterprise” variant tests whether references for replication to all DCs in the enterprise are present.
  • Note: I am not very clear what this test does (but feel free to look at Ned’s blog post for more info) and I have been writing this post over many days so I am too lazy to research further either. :) I’ll update this post later if I find more info on the test.
  • This test is not run by default. It must be explicitly specified.

VerifyReplicas

  • Checks whether all the application NCs have replicated to the DCs that should contain a copy.
  • Seems to be similar to the CheckSDRefDom test but more concerned with whether the DCs host a copy or not.
  • This test is not run by default. It must be explicitly specified.

That’s all! Phew! :)