
Quickly get the last boot up time of a remote Windows machine

PowerShell:
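Something along these lines should do it (a sketch, PowerShell 3 or later – REMOTEPC is a placeholder for the remote machine's name):

  Get-CimInstance -ClassName Win32_OperatingSystem -ComputerName REMOTEPC | Select-Object CSName, LastBootUpTime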

Command Prompt/ WMI:
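Again a sketch, with REMOTEPC standing in for the machine name:

  wmic /node:"REMOTEPC" os get lastbootuptime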

Double quotes are important for the WMI method.

WMI Access Denied for remote machine etc

This isn’t going to be a very coherent post really (unlike my usual posts, which I hope are more coherent!). I came across a bunch of new stuff as I was troubleshooting this WMI issue and thought I should put it all somewhere.

The issue is that we are trying to get Solarwinds to monitor one of our DMZ servers via WMI but it keeps failing.

[screenshot: solarwinds]

Other servers in the DMZ work; it’s just this one that fails. WMI ports aren’t fixed, as I had mentioned earlier, but I don’t think that matters coz they aren’t fixed for the other servers either. The firewall configuration is the same on both servers.

I thought of running Microsoft Network Monitor but realized that it’s been replaced with Microsoft Message Analyzer. That looks nice and seems to do a lot more than just network monitoring – I must explore it sometime. For now I ran it on the DMZ server and applied a filter to see traffic from our Solarwinds server to it. The results showed that access was being denied, so maybe not a port issue after all.

[screenshot: analyzer message]

Reading up more on this pointed me to a couple of suggestions. None of them helped but I’d like to mention them here for future reference.

First up is the command wbemtest. Run this from the local machine (the DMZ server in my case), click “Connect”, and then “Connect” again.

[screenshot: wbemtest1]

If all is well with WMI you should get no error.

[screenshot: wbemtest2]

Now try the same from the Solarwinds server, but this time try connecting to the DMZ server and enter credentials if any.

[screenshot: wbemtest3]

That worked with the local administrator account but failed with the account I was using from Solarwinds to monitor the server. Error 0x80070005.

[screenshot: wbemtest4]

So now I know the issue is one of permissions and not ports.

Another tool that can be used is WmiMgmt. Just type wmimgmt.msc somewhere to launch it. I tried this on the DMZ machine to confirm WMI works fine. I also tried it from the SolarWinds machine to the DMZ machine. (Right-click to connect to a remote computer.)

[screenshot: wmimgmt]

Problem with WmiMgmt is that unlike wbemtest you can’t specify a different account to use. If the two machines are domain joined or have the same local accounts, then it’s fine – you can run it as that account from one machine and connect to the other – but otherwise there’s not much you can do. WmiMgmt is good for double-checking the permissions though. Click “Properties” in the screenshot above, go to the “Security” tab, select the “Root” namespace, and click the “Security” button. The resulting window should show the permissions.

[screenshot: wmimgmt2]

In my case the Administrators group members had full permissions as expected. The account I was using from the Solarwinds server was a member of this group too yet had access denied.

Another place to look at is Component Services > Computers > My Computer > DCOM Config > “Windows Management and Instrumentation” – right click and “Properties”.

[screenshot: componentservices]

Make sure “Authentication Level” is “Default”. Then go to the “Security” tab and make sure the account/ group you want has permissions.

[screenshot: componentservices2]

 

Also right click on “My Computer” and go to “Properties”.

[screenshot: componentservices3]

Under the “COM Security” tab go to “Edit Limits” for both the access and the launch and activation permissions, and ensure the permissions are correct. My understanding is that the limits you specify here override everything else.

[screenshot: componentservices4]

In my case none of the above helped as they were all identical between both servers and at the correct setting. What finally helped was this Serverfault post.

Being DMZ servers these were on a workgroup and the Solarwinds server was connecting via a local account. Turns out that:

In a workgroup, the account connecting to the remote computer is a local user on that computer. Even if the account is in the Administrators group, UAC filtering means that a script runs as a standard user.

That is a good link to refer to. It is about Remote UAC and WMI. The solution is to go to the following registry key – HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System – and create a new value called LocalAccountTokenFilterPolicy (of type REG_DWORD) with a value of 1. (Check the Serverfault post for one more solution if you are using Server 2012.)
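Something like this from an elevated prompt on the DMZ server should do it (a sketch):

  reg add "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System" /v LocalAccountTokenFilterPolicy /t REG_DWORD /d 1 /f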

Remote UAC. Ah! I’m surprised I didn’t think of that in the beginning itself. If the LocalAccountTokenFilterPolicy has a value of 0 (the default) then whenever a member of the Administrators group connects remotely, the security tokens of that account are filtered to remove admin access. This is only for local admin accounts, not domain admin accounts – mind you. If LocalAccountTokenFilterPolicy has a value of 1, then no filtering happens and the account connects with full admin rights. I have encountered Remote UAC in the past when accessing admin shares (e.g. \\computer\drive$) in my home workgroup, never thought it would be playing a role here with WMI!

Hope this helps someone. :)

Notes on WMI ports & monitoring

Trying to set up monitoring for some of our Windows DMZ servers via SolarWinds and came across a few interesting links. At the same time I noticed that my carefully organized bookmarks folders seem to be corrupt. Many folders are empty. This happened a few days ago too, but that time it was just one folder (well, one folder that I knew of, could be more who knows) and so I was able to view an older copy of my bookmarks via Xmarks and add the missing entries back.

But this time it’s a whole bunch of folders and the only option Xmarks has is to either export the older copy or overwrite your current copy with this older set. I don’t want the latter as that would mean losing all my newer bookmarks. Wish there was some way of merging the current and older copies! Anyhow, what’s happened has happened, I think I’ll stick to using this blog for bookmarks. I keep referring to this blog over my bookmarks anyway, so this is a sign to stop with the unnecessary filing.

To start off, this is a must read on WMI ports and how to allow firewall exceptions for WMI. Gist of the matter is that WMI uses dynamic ports via the RPC Portmapper. When the Solarwinds server (for example) wants to talk to WMI on a target server, it contacts the RPC Portmapper service on the target server on port 135 (which is the standard port for the Portmapper service) and gets a dynamic port to use for WMI. This port can be anywhere between 1024 – 65535.

The fix for this is to give the Portmapper service a specific set of ports to use. One method is to use the registry (see the previous link or this KB article). Add a key called Internet under HKEY_LOCAL_MACHINE\Software\Microsoft\Rpc. To this add the values Ports (REG_MULTI_SZ), PortsInternetAvailable (REG_SZ), and UseInternetPorts (REG_SZ). Set a value of Y for the latter two, and a range like 5000-5100 for the former. Restart the server after this.
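For example (a sketch – adjust the port range to suit):

  reg add "HKLM\SOFTWARE\Microsoft\Rpc\Internet" /v Ports /t REG_MULTI_SZ /d 5000-5100 /f
  reg add "HKLM\SOFTWARE\Microsoft\Rpc\Internet" /v PortsInternetAvailable /t REG_SZ /d Y /f
  reg add "HKLM\SOFTWARE\Microsoft\Rpc\Internet" /v UseInternetPorts /t REG_SZ /d Y /f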

Although I haven’t tried it, I think a similar effect to the above can be achieved via Component Services (type dcomcnfg.exe in a command prompt). Expand the “Computers” folder here, right-click on “My Computer”, go to “Default Protocols”, click “Properties” of “Connection-oriented TCP/IP”, and add a port range.

[screenshot: dcomcnfg]

Another method is to use Group Policies.

Yet another method seems to be to get WMI to not use the RPC Portmapper for dynamic ports. By default WMI runs as a shared service, which is why it uses the RPC Portmapper. It is possible to make it run as a standalone service so it doesn’t use the Portmapper and instead defaults to port 24158. (This port number too can be changed via dcomcnfg.exe but I am not sure how).
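The switch for that appears to be the following (a sketch – I haven’t tried it; restart the Windows Management Instrumentation service afterwards, and winmgmt /sharedhost reverts it):

  winmgmt /standalonehost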


These two links didn’t make much sense to me, but I know they are of use so linking them here as a reference to myself for later:

 

Automatic Metric and Windows routing

IP routing involves metrics – the cost of each route. If there are multiple routes to a destination then the route with the lowest metric/ cost is chosen.

In the context of Windows OS there are two metrics that come into play.

[screenshot: automatic metric]

One is the metric of the interface/ NIC itself (that’s the “Automatic metric” checkbox above). By default it’s set to automatic, and this determines the cost of using that interface itself. For example if both your wireless and wired connection can access the Internet, which one should the machine choose? The interface metric is used to make this decision. You can assign a value to this metric if you want to force a decision.

Each interface can have multiple gateways to various networks it knows of. Could be that it has more than one gateway to the same network – say, your wired connection can connect to the Internet from two different routers on your network, which one should it choose? Here’s where the gateway metric comes into play (circled in the screenshot above). By default when you add a gateway its metric is set to automatic, but here too you can assign a value.

[screenshot: gateway metric]

So far so good. Now how does all this come into play together?

The first thing to know is that gateway metrics have a value of 256 by default (when set to “Automatic metric”). So if you have more than one gateway to a particular destination, and the metric is set to automatic, then by default both gateways have a metric value of 256 and hence equal preference. Remember that.

The next thing to know is that interface metrics have a value ranging from 5 to 50 (when set to “Automatic metric”) based on the speed of the interface. Lower numbers are better than higher numbers. See this KB article for the numbers; here’s a screenshot from that article.

[screenshot: interface metrics]

So if you have two wired connections, for instance, one of speed 1 Gbps and the other of speed 10 Gbps, then the 1 Gbps interface has a metric of 10 and the 10 Gbps interface has a metric of 5 – thus making the latter preferred.

To view the interface & gateway metrics assigned to your interfaces use the netsh interface ip show address command:
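  netsh interface ip show address

On newer versions of Windows something along these lines (a sketch) shows the interface metrics too:

  Get-NetIPInterface | Sort-Object InterfaceMetric | Format-Table InterfaceAlias, AddressFamily, InterfaceMetric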

 

Notes on .NET (copy paste from other places)

Yes, just copy paste from other places so I can quickly refer to this post later rather than all those other posts. I don’t know much about .NET but had to read a bit about it today, so figured I might as well put some snippets here.

.NET Framework has two components:

  1. Common Language Runtime (CLR)
  2. .NET Framework Class Library

The CLR is like the foundation/ core of the .NET Framework. It “manages memory, thread execution, code execution, code safety verification, compilation, and other system services. These features are intrinsic to the managed code that runs on the common language runtime. Code that targets the runtime is known as managed code, while code that does not target the runtime is known as unmanaged code. The managed environment of the runtime eliminates many common software issues. For example, the runtime automatically handles object layout and manages references to objects, releasing them when they are no longer being used. This automatic memory management resolves the two most common application errors, memory leaks and invalid memory references. The runtime also accelerates developer productivity. For example, programmers can write applications in their development language of choice, yet take full advantage of the runtime, the class library, and components written in other languages by other developers. Any compiler vendor who chooses to target the runtime can do so. Language compilers that target the .NET Framework make the features of the .NET Framework available to existing code written in that language, greatly easing the migration process for existing applications.” (source)

The .NET Framework Class Library is “a collection of reusable types that tightly integrate with the common language runtime. The class library is object oriented, providing types from which your own managed code can derive functionality. For example, the .NET Framework collection classes implement a set of interfaces that you can use to develop your own collection classes. Your collection classes will blend seamlessly with the classes in the .NET Framework.” (source)

“Each version of the .NET Framework contains the common language runtime (CLR), the base class libraries, and other managed libraries. Each new version of the .NET Framework retains features from the previous versions and adds new features. The CLR is identified by its own version number. The .NET Framework version number is incremented at each release, although the CLR version is not always incremented. For example, the .NET Framework 4, 4.5, and later releases include CLR 4, but the .NET Framework 2.0, 3.0, and 3.5 include CLR 2.0. (There was no version 3 of the CLR.)” (source)

“In general, you should not uninstall any versions of the .NET Framework that are installed on your computer, because an application you use may depend on a specific version and may break if that version is removed. You can load multiple versions of the .NET Framework on a single computer at the same time. This means that you can install the .NET Framework without having to uninstall previous versions.” (source)

“The .NET Framework 4.5 is an in-place update that replaces the .NET Framework 4 on your computer, and similarly, the .NET Framework 4.5.1, 4.5.2, 4.6, 4.6.1, and 4.6.2 are in-place updates to the .NET Framework 4.5, which means that they use the same runtime version, but the assembly versions are updated and include new types and members. After you install one of these updates, your .NET Framework 4, .NET Framework 4.5, or .NET Framework 4.6 apps should continue to run without requiring recompilation. However, the reverse is not true. We do not recommend running apps that target a later version of the .NET Framework on an earlier version of the .NET Framework. For example, we do not recommend that you run an app that targets the .NET Framework 4.6 on the .NET Framework 4.5.” (source)

How to determine which .NET Framework versions are installed – see here. Basically, check the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP registry subkey. The versions are listed as subkeys under this. In each of those subkeys an entry called Version has the version number.

[screenshot: NET Versions]

If you have .NET 4.5 and above installed there will be an additional key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full. The Release entry in this key indicates the version of .NET Framework.

[screenshot: NET 45]

In the screenshot above Release 379893 corresponds to .NET Framework 4.5.2.
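If you’d rather query this with PowerShell, something along these lines (a sketch) pulls out the same information:

  Get-ChildItem 'HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP' -Recurse |
    Get-ItemProperty -Name Version, Release -ErrorAction SilentlyContinue |
    Select-Object PSChildName, Version, Release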

Lastly, what is the .NET Multi-Targeting Pack? It was to learn more about it that I started reading about .NET today. Found this post about it but it mostly went over my head. :) Best I could understand is that it is used as part of compiling programs and installed as part of Visual Studio, so it doesn’t matter much from a Sys Admin point of view.

Notes on DFS referrals

Was brushing up on DFS referrals today as I had a doubt about something at work. Thought I’d put a shout out to this interesting link that I came across.

A DFS namespace (e.g. \\contoso\pub) has links (e.g. \\contoso\pub\documents). These links can point to multiple targets (e.g. \\server1\documents, \\server2\documents, and so on). When a client visits the link the target that’s chosen is the one in the same AD site as the client. If there is no target in the same site as the client then one of three situations can happen (you have to choose what happens per namespace, but can override it per link):

  • A list of targets from all sites is returned at random.
  • A list of targets is returned based on cost.
    • All sites in the domain will have a cost from the site the client is in. This cost is defined in “AD Sites & Services” and is cumulative (i.e. if Site A to Site B has cost 10, and Site B to Site C has cost 10, and there’s no explicit cost defined between Site A and Site C then the cost from Site A to Site C is taken as 10+10 = 20).
    • Targets from sites closest to the client site are listed in random order, followed by targets from sites further away from the client site, and so on.
  • No targets are returned (this is also called in-site only).
    • So if there are no targets in the same site as the client, then the path fails.

[screenshot: Ordering]

Apart from these three possibilities, there’s also a fail back (which is hidden behind the drop-down in the screenshot above).

[screenshot: Failback]

So if a server has no targets to offer a client, it will fail back to whatever targets are set as preferred for a link. I’ll show what preferred targets are in a bit. 

The above settings can be defined on the namespace itself or on each DFS link.

[screenshot: Link Ordering]

Now on to preferred targets. If you go to the Properties > Advanced tab of each target, you can set its priority. That is to say, if a target is on the same preference level as a bunch of other targets (because they are all in the same site, or all outside it) then you can set it to have a higher or lower priority.

[screenshot: Preferred Target]

By default there are no preferred targets.

The cool thing I learnt from that post is that if the referral order is set to in-site (i.e. exclude targets from outside the client site) and fail back to preferred targets is enabled (the default) and a target outside the site is set as preferred, then it too will be returned in the list of targets along with the ones in site. This way you can limit referrals to be in-site but have a few selected targets out of site as a fail-back.

One thing to keep in mind though is that since you want the out-of-site target to be set to a lower priority than the in-site one, you must specify its priority as “Last among all targets”. Because if it were set as “First among all targets” then it would take precedence over the in-site target too – which is not what we want. Lastly, there’s no point setting the priority to “First among targets of equal cost” (or “Last”) in the case of in-site referrals as it will have no effect (because the cost of the in-site target and the external targets are different so it doesn’t apply).

Use SetACL if you want to overcome the 260 character limit when setting ACLs

I had to set folder & file permissions (basically, take ownership and enable inheritance) for a bunch of Windows folders the other day. Thing is the folders had levels and levels of sub-folders so Windows Explorer kept failing when applying permissions. I tried to use takeown and icacls via the command prompt but these too kept failing.

One workaround I had in mind was to use subst or make junctions, but these didn’t work either. When I mapped part of the folder path to a drive letter using subst, the command line tools kept complaining that it wasn’t a file system that supported ACLs. Junctions didn’t do that well either. Mainly coz once you map part of a folder to a path/ drive letter, there’s no way to select the multiple sub-folders in there and assign permissions – when you select multiple folders, the security tab is missing.

Anyhoo, long story short, I came across this Server Fault thread from where I learnt the following –

In the Windows API, there is an infamous constant known as MAX_PATH. MAX_PATH is 260 characters. The NTFS file system actually supports file paths of up to 32,767 characters. And you can still use 32,767 character long path names by accessing the Unicode (or “wide”) versions of the Windows API functions, and also by prefixing the path with \\?\.

MAX_PATH was set in stone a very long time ago in the Windows world. I think it has something to do with ANSI standards at the time… but it’s one of those things that’s very difficult for Microsoft to change now, as now we have thousands of programs and applications, including some written by Microsoft themselves, that use MAX_PATH and would fail in strange new ways if the constant were suddenly changed. (Buffer overflows, heap corruption, etc.)

See this MSDN page too.

So what I needed was a tool that made use of the Unicode versions of the Windows API functions. A quick Google search brought me to SetACL – an amazing command-line tool that is able to set ACLs without the path limitations and that also has a nice syntax (I don’t know why, but icacls and even PowerShell have such obscure syntax for setting file ACLs). Check out this example page to get started. In my case all I really needed to do was run a command like this to (1) enable permissions inheritance and (2) set the ownership, and the command would do it recursively for all the files and folders in that path. Amazing!

The only gotcha I encountered was that I got the following error message after a while with the above command:

SetACL error message: The call to SetNamedSecurityInfo() failed
Operating system error message: Access is denied.

Thankfully a forum post from the SetACL forums sorted that out for me. Trick is to do the take ownership first, and then the permission inheritance – apparently doing both together causes the above error.

So I did this first:
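Something along these lines (a sketch – D:\Data is a placeholder path; check the SetACL documentation for the exact syntax):

  SetACL.exe -on "D:\Data" -ot file -actn setowner -ownr "n:Administrators" -rec cont_obj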

Followed by this:
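And then (same placeholder path, again a sketch – np here means “not protected”, i.e. inheritance enabled):

  SetACL.exe -on "D:\Data" -ot file -actn setprot -op "dacl:np;sacl:np" -rec cont_obj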

SetACL is free but command-line oriented. If you want a GUI version there’s SetACL Studio. That’s a paid product with a 30-day trial. I haven’t tried it yet. There is a SetACL t-shirt I might buy coz I was quite pleased with this tool yesterday. :)

Get a list of OUs with inheritance blocked & GPOs not applied

To get a list of OUs and the status of GPO inheritance:
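Something along these lines, using the ActiveDirectory and GroupPolicy PowerShell modules (a sketch):

  Get-ADOrganizationalUnit -Filter * |
    ForEach-Object { Get-GPInheritance -Target $_.DistinguishedName } |
    Select-Object Path, GpoInheritanceBlocked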

To get a list of OUs that have GPO inheritance blocked:
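Same as above, but filtered on the GpoInheritanceBlocked property (a sketch):

  Get-ADOrganizationalUnit -Filter * |
    ForEach-Object { Get-GPInheritance -Target $_.DistinguishedName } |
    Where-Object { $_.GpoInheritanceBlocked } |
    Select-Object Path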

To get a list of OUs that have GPO inheritance blocked and don’t have a particular GPO applied to them directly:
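And with an extra check on the directly linked GPOs (a sketch – “Some GPO” is a placeholder for the GPO’s display name):

  Get-ADOrganizationalUnit -Filter * |
    ForEach-Object { Get-GPInheritance -Target $_.DistinguishedName } |
    Where-Object { $_.GpoInheritanceBlocked -and ($_.GpoLinks.DisplayName -notcontains "Some GPO") } |
    Select-Object Path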

There’s probably a better way to do this, but this is the best I could come up with …

How to get the service name for sc

I needed to enable/ disable the Windows Firewall on a Server 2008 R2 core box but didn’t know what the Windows Firewall service name was for use with the sc command. Then I learnt that it has a sub-command called GetKeyName (and a corresponding GetDisplayName, for the reverse operation) to get the service name from the display name.
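For example, for the Windows Firewall (quotes needed since the display name has spaces):

  sc GetKeyName "Windows Firewall"

which returns the service key name, MpsSvc.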

Nice!

Also, as a reminder to myself: the sc config command is what you use to change the configuration of a service (make it disabled, manual, etc.). When giving the options though, be sure to include a space after the equals sign of each option. That is to say, the following works (using MpsSvc, the Windows Firewall service from above, as an example) –
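  sc config MpsSvc start= disabled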

But the following won’t –
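  sc config MpsSvc start=disabled

(No space after “start=”, so sc doesn’t recognise the option and the command fails.)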

 

How to undo changes made by winrm quickconfig

Here’s what happens when you do a winrm quickconfig:

In my case the Windows Remote Management (WS-Management) service was already running, so its startup type was merely changed to “Automatic (Delayed)”, but if it wasn’t already running then it would have been started too.

So what all happens here?

  1. The service is started and type changed to “Automatic (Delayed)”.
  2. Starting the service by itself does not do much, as the service isn’t listening for anything yet. So a listener is created. This listener listens for messages sent via HTTP on all IP addresses of the machine.
  3. A firewall exception is created for Windows Remote Management.
  4. A configuration change is made such that when a remote user connects with admin rights to this machine, the admin rights are not stripped via User Account Control (UAC). (See this & this blog post for what this means). Basically, this configuration change involves modifying a registry entry.

Thus, to undo the effect of winrm quickconfig one must undo each of these changes.

1. Disabling the service

Either go via the Services MMC console and (1) stop the service and (2) change its type to disabled; or use PowerShell (running as administrator of course):
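A sketch of what that looks like:

  Stop-Service WinRM
  Set-Service WinRM -StartupType Disabled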

That’s disabled.

2. Delete the listener

You can see the listener thus:
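  winrm enumerate winrm/config/listener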

And delete it thus:
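  winrm delete winrm/config/listener?Address=*+Transport=HTTP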

The command has no output, so enumerate the listeners again if you want to confirm.

3. Delete the firewall exceptions

Either go via the GUI and disable the highlighted rule:

[screenshot: winrm-firewall]

Or use PowerShell:
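A sketch, using the NetSecurity cmdlets available on Windows 8/ Server 2012 and later:

  Disable-NetFirewallRule -DisplayName "Windows Remote Management (HTTP-In)"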

That’s disabled.

4. Disable Remote UAC

Either open the Registry Editor and navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System, then set the value of LocalAccountTokenFilterPolicy to 0 (zero).

Or via PowerShell:
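A sketch:

  Set-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System' -Name LocalAccountTokenFilterPolicy -Value 0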

That’s it!

Hyper-V between Windows 10 & Windows 8.1 in a workgroup

My laptop’s running Windows 10, desktop’s running Windows 8.1. Since both have client Hyper-V I thought it would be cool to install Hyper-V manager on the laptop and use it to manage Hyper-V running on the desktop. Did that and came across the following error –

[screenshot: Hyper-V error]

DOGBERT is the Windows 8.1 desktop. The error is from my Windows 10 laptop.

First I followed the steps in this blog post. Actually, I didn’t have to do much as the account I was using on the desktop was already in the local Administrators group and so I didn’t have to do anything in terms of COM (step 3) & WMI (step 4) permissions. But I did enable the firewall rules for the Windows Management Instrumentation (WMI) group (step 2).

Additionally, I noticed that the Windows Remote Management (WS-Man) service was not running on the desktop so I enabled that. For this I used the winrm command.
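Probably something along these lines, from an elevated prompt on the desktop:

  winrm quickconfig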

 

Then I had to enable the Windows Remote Management (WS-Man) service on the laptop and add the desktop as a trusted host. Remember the error message above? It said that either I must use HTTPS or add the remote computer to the TrustedHosts list. I added that thus (from my laptop):
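A sketch, run from an elevated PowerShell prompt on the laptop:

  Set-Item WSMan:\localhost\Client\TrustedHosts -Value "DOGBERT"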

Probably a good idea to see what your existing trusted hosts are before you run this command (so you can append to the list instead of removing existing entries). You can do that thus:
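  Get-Item WSMan:\localhost\Client\TrustedHosts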

After this Hyper-V manager from the laptop was able to connect to the desktop, but in the Virtual Machines section I had the following error:

Access denied. Unable to establish communication between ‘Hyper-V Server’ and ‘Hyper-V Manager’

The solution for that (thanks to this blog post) is to open “Component Services” on the laptop. Alternatively open a run window/ command prompt and type dcomcnfg.

In the window that opens expand to Component Services > Computers > My Computer, right-click and go to Properties, then the COM Security tab, and click “Edit Limits” under Access Permissions. Select the ANONYMOUS LOGON username here and tick the box to allow Remote Access.

[screenshot: Component Services]

That’s it! After this Hyper-V on my laptop was able to talk to the desktop.

Notes on NLB, VMware, etc

Just some notes to myself so I am clear about it while reading about it. In the context of this VMware KB article – Microsoft NLB not working properly in Unicast mode.

Before I get to the article I better talk about a regular scenario. Say you have a switch and it’s got a couple of devices connected to it. A switch is a layer 2 device – meaning, it has no knowledge of IP addresses and networks etc. All devices connected to a switch are in the same network. The devices on a switch use MAC addresses to communicate with each other. Yes, the devices have IPv4 (or IPv6) addresses but how they communicate to each other is via MAC addresses.

Say Server A (IPv4 address 10.136.21.12) wants to communicate with Server B (IPv4 address 10.136.21.22). Both are connected to the same switch, hence on the same LAN. Communication between them happens in layer 2. Here the machines identify each other via MAC addresses, so first Server A checks whether it knows the MAC address of Server B. If it knows (usually coz Server A has communicated with Server B recently and the MAC address is cached in its ARP table) then there’s nothing to do; but if it does not, then Server A finds the MAC address via something called ARP (Address Resolution Protocol). The way this works is that Server A broadcasts to the whole network that it wants the MAC address of the machine with IPv4 address 10.136.21.22 (the address of Server B). This message goes to the switch, the switch sends it to all the devices connected to it, Server B replies with its MAC address and that is sent to Server A. The two now communicate – I’ll come to that in a moment.

When it comes to communication from devices in a different network to Server A or Server B, the idea is similar except that you have a router connected to the switch. The router receives traffic for a device on this network – it knows the IPv4 address – so it finds the MAC address similarly to the above and passes the traffic to that device. Simple.

Now, how does the switch know which port a particular device is connected to? Say the switch gets traffic addressed to MAC address 00:eb:24:b2:05:ac – how does the switch know which port that is on? Here’s how that happens –

  • First the switch checks if it already has this information cached. Switches have a table called the CAM (Content Addressable Memory) table which holds this cached info.
  • Assuming the CAM table doesn’t have this info the switch will send the frame (containing the packets for the destination device) to all ports. Note, this is not like ARP where a question is sent asking for the device to respond; instead the frame is simply sent to all ports. It is broadcast to the whole network.
  • When a switch receives frames from a port it notes the source MAC address and port and that’s how it keeps the CAM table up to date. Thus when Server A sends data to Server B, the MAC address and switch port of Server A are stored in the switch’s CAM table.  This entry is only stored for a brief period.

Now let’s talk about NLB (Network Load Balancing).

Consider two machines – 10.136.21.11 with MAC address 00:eb:24:b2:05:ac and 10.136.21.12 with MAC address 00:eb:24:b2:05:ad. NLB is a form of load balancing wherein you create a Virtual IP (VIP) such as 10.136.21.10 such that any traffic to 10.136.21.10 is sent to either of 10.136.21.11 or 10.136.21.12. Thus you have the traffic being load balanced between the two machines; and not only that, if any one of the machines goes down, nothing is affected because the other machine can continue handling the traffic.

But now we have a problem. If we want a VIP 10.136.21.10 that should send traffic to either host, how will this work when it comes to MAC addresses? That depends on the type of NLB. There are two sorts – Unicast and Multicast.

In Unicast the NIC that is used for clustering on each server has its MAC address changed to a new Unicast MAC address that’s the same for all hosts. Thus for example, the NIC that holds the NLB IP address 10.136.21.10 in the scenario above will have its MAC address changed from 00:eb:24:b2:05:ac and 00:eb:24:b2:05:ad respectively to (say) 00:eb:24:b2:05:af. Note that the MAC address is a Unicast MAC (which basically means the MAC address looks like a regular MAC address, such as that assigned to a single machine). Since this is a Unicast MAC address, and by definition it can only be assigned to one machine/ switch port, the NLB driver on each machine cheats a bit and changes the source MAC address to whatever the original NIC MAC address was. That is to say –

  • Server IP 10.136.21.11
    • Has MAC address 00:eb:24:b2:05:ac
    • Which is changed to a MAC address of 00:eb:24:b2:05:af as part of the Unicast IP/ enabling NLB
    • However when traffic is sent out from this machine the MAC address is changed back to 00:eb:24:b2:05:ac
  • Same for Server 10.136.21.12

Why does this happen? This is because –

  • When a device wants to send data to the VIP address, it will try to find the MAC address using ARP. That is, it sends a broadcast over the network asking for the device with this IP address to respond. Since both servers now have the same MAC address for their NLB NIC, either server will respond with this common MAC address.
  • Now the switch receives frames for this MAC address. The switch does not have this in its CAM table so it will broadcast the frame to all ports – reaching either of the servers.
  • But why does either server change the MAC address of its outgoing traffic? That’s because if outgoing frames had the common MAC address, then the switch would associate this common MAC address with that port – resulting in all future traffic to the common MAC address only going to one of the servers. By changing the outgoing frame MAC address back to the server’s original MAC address, the switch never gets to store the common MAC address in its CAM table and all frames for the common MAC address are always broadcast.

In the context of VMware what this means is that (a) the port group to which the NLB NICs connect must allow changes to the MAC address and allow forged transmits; and (b) since by default the port group notifies the physical switch of a VM’s MAC address when the VM is powered on – which would expose the cluster MAC address to the switch – this notification too must be disabled. Without these changes NLB will not work in Unicast mode with VMware.

(This is a good post to read more about NLB).

Apart from Unicast NLB there’s also Multicast NLB. In this form the NLB NIC’s MAC address is not changed. Instead, a new Multicast MAC address is assigned to the NLB NIC. This is in addition to the regular MAC address of the NIC. The advantage of this method is that since each host retains its existing MAC address the communication between hosts is unaffected. However, since the new MAC address is a Multicast MAC address – and switches by default are set to ignore such addresses – some changes need to be made on the switch side to get Multicast NLB working.

One thing to keep in mind is that it’s important to add a default gateway address to your NLB NIC. At work, for instance, the NLB IPv4 address was reachable within the network but from across networks it wasn’t. Turns out that’s coz Windows 2008 onwards have a strong host behavior – traffic coming in via one NIC does not go out via a different NIC, even if both are in the same subnet and the second NIC has a default gateway set. In our case I added the same default gateway to the NLB NIC too and it was then reachable across networks. 

Windows DNS server subnet prioritization and round-robin

Consider the following multiple A records for a DNS record proxy.mydomain.com:

  • proxy.mydomain.com IN A 192.168.10.5
  • proxy.mydomain.com IN A 10.136.53.5
  • proxy.mydomain.com IN A 10.136.52.5
  • proxy.mydomain.com IN A 10.136.33.5
  • proxy.mydomain.com IN A 192.168.15.5

These records are defined on a DNS server. When a client queries the DNS server for the address to proxy.mydomain.com, the DNS server returns all the addresses above. However, the order of answers returned keeps varying. The first client asking for answers could get them in the following order for instance:

  • proxy.mydomain.com IN A 192.168.10.5
  • proxy.mydomain.com IN A 10.136.53.5
  • proxy.mydomain.com IN A 10.136.52.5
  • proxy.mydomain.com IN A 10.136.33.5
  • proxy.mydomain.com IN A 192.168.15.5

The second client could get them in the following order:

  • proxy.mydomain.com IN A 10.136.53.5
  • proxy.mydomain.com IN A 10.136.52.5
  • proxy.mydomain.com IN A 10.136.33.5
  • proxy.mydomain.com IN A 192.168.15.5
  • proxy.mydomain.com IN A 192.168.10.5

The third client could get:

  • proxy.mydomain.com IN A 10.136.52.5
  • proxy.mydomain.com IN A 10.136.33.5
  • proxy.mydomain.com IN A 192.168.15.5
  • proxy.mydomain.com IN A 192.168.10.5
  • proxy.mydomain.com IN A 10.136.53.5

This is called round-robin. Basically it rotates between the various IP addresses. All IP addresses are offered as answers, but the order is rotated so that as long as clients choose the first answer in the list every client chooses a different IP address.

Notice I said clients choose the first answer in the list. This needn’t always be the case though. When I said clients above, I meant the client computer that is querying the DNS server for an answer. But that’s not really who’s querying the server. Instead, an application on the client (e.g. Chrome, Internet Explorer) or the client OS itself is the one looking for an answer. These ask the DNS resolver which is usually a part of the OS for an answer, and it’s the resolver that actually queries the server and gets the list of answers above.

The DNS resolver can then return the list as it is to the requesting application, or it can apply a re-ordering of its own. For instance, if the client is from the 192.168.10.0 network, the resolver may re-order the answers such that the 192.168.10.5 answer is always first. This is called Subnet prioritization. Basically, the resolver prioritizes answers that are from the same subnet as the client. The idea being that client applications would prefer reaching out to a server in their same subnet (it’s closer to them, no need to go over the WAN link for instance) than one on a different subnet.

Subnet prioritization can be disabled on the resolver side by adding a registry value PrioritizeRecordData (link) with value 0 (REG_DWORD) at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DnsCache\Parameters. By default this value does not exist, and the behaviour is as if it were set to 1 (subnet prioritization enabled).
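For instance (a sketch; the value doesn’t exist by default so this creates it):

  reg add "HKLM\SYSTEM\CurrentControlSet\Services\DnsCache\Parameters" /v PrioritizeRecordData /t REG_DWORD /d 0 /f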

Subnet prioritization can also be set on the server side so it orders the responses based on the client network. This is controlled by the registry key LocalNetPriority (link) under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DNS\Parameters\ on the DNS server. By default this is 0, so the server doesn’t do any subnet prioritization. Change this to 1 and the server will order its responses according to the client subnet.

By default the server also does round-robin for the results it returns. This can be turned off via the DNS Management tool (under server properties > advanced tab). If round-robin is off the server returns records in the order they were added.

More on subnet prioritization at this link.

That’s not the end though. :)

Consider a server who has round-robin and subnet prioritization enabled. Now consider the DNS records from above:

  • proxy.mydomain.com IN A 192.168.10.5
  • proxy.mydomain.com IN A 10.136.53.5
  • proxy.mydomain.com IN A 10.136.52.5
  • proxy.mydomain.com IN A 10.136.33.5
  • proxy.mydomain.com IN A 192.168.15.5

The first and last records are from class C networks. The other three are from class A networks. In reality though, thanks to CIDR, they are all treated as /24 (class C sized) networks.

Now say there’s a client with IP address 10.136.50.2/24 asking the server for answers. On the face of it the client network does not match any of the answer record networks so the server will simply return answers as per round-robin, without any re-ordering. But in reality though the client 10.136.50.2/24 is in the same network as 10.136.52.5/24 and both are part of a larger 10.136.48.0/20 network that’s simply been broken into multiple /24 networks (to denote clients, servers, etc). What can we do so the server correctly identifies the proxy record for this client?

This is where the LocalNetPriorityNetMask registry key under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DNS\Parameters\ on the DNS server comes into play. This key – which does not exist by default – tells the server what subnet mask to assume when it’s trying to subnet prioritize. By default the server assumes a /24 subnet, but by tweaking this key we can tell the server to use a different subnet in its calculations and thus correctly return an answer.

The LocalNetPriorityNetMask key takes a REG_DWORD value in a hex format. Check out this KB article for more info, but a quick run through:

A netmask can be written as xxx.xxx.xxx.xxx. 4 pairs of numbers. The LocalNetPriorityNetMask key is of format 0xaabbccdd – again, 4 pairs of hex numbers. This is a mask that’s applied on the mask of 255.255.255.255 so to calculate this number you subtract the mask you want from 255.255.255.255 and convert the resulting numbers into hex.

For example: you want a /8 netmask. That is 255.0.0.0. Subtracting this from 255.255.255.255 leaves you with 0.255.255.255. What’s that in hex? 00ffffff. So LocalNetPriorityNetMask will be 0x00ffffff. Easy?

So in the example above I want a /20 netmask. That is, I am telling the server to assume the clients and the record IPs it has to be in a /20 network, so subnet prioritize accordingly. A /20 netmask is 255.255.240.0. Subtract from 255.255.255.255 to get 0.0.15.255. Which in hex is 00000fff (15 decimal is F hex). So all I have to do is put this value as LocalNetPriorityNetMask on the DNS server, restart the service, and now the server will correctly return subnet prioritized answers for my /20 network.
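Putting that into practice on the DNS server would be something along these lines (a sketch – run from an elevated prompt, then restart the DNS Server service):

  reg add "HKLM\SYSTEM\CurrentControlSet\Services\DNS\Parameters" /v LocalNetPriority /t REG_DWORD /d 1 /f
  reg add "HKLM\SYSTEM\CurrentControlSet\Services\DNS\Parameters" /v LocalNetPriorityNetMask /t REG_DWORD /d 0x00000fff /f
  net stop dns && net start dns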

Update: Some more links as I did some more reading on this topic later.

  • Ace Fekay’s post – a must read!
  • A subnet calculator (also gives you the wildcard, which you can use for calculating the LocalNetPriorityNetMask key)
  • I am not very clear on what happens if you disable RoundRobin but there are multiple entries from the same subnet. What order are they returned in? Here’s a link to the RoundRobin setting, doesn’t explain much but just linking it in case it helps in the future.
  • More as a note to myself (and any others if they wonder the same) – the subnet mask you specify is applied on the client. That is to say if you client has an IP address of say 10.136.20.10, by default the DNS server will assume a subnet mask of /24 (Class C is the default) and assume the client is in a 10.136.20.0/24 network. So any records from that range are prioritized. If you want to include other records, you specify a larger subnet mask. Thus, for example, if you specify a /20 then the client is assumed to have an IP address 10.136.20.10/20, so its network range is considered to be 10.136.16.1 – 10.136.31.254 (don’t wrack your brain – use the subnet calculator for this). So any record in this range is prioritized over records not in this range.
  • The Windows calculator can be used to find the LocalNetPriorityNetMask key value. Say you want a subnet mask of /19. The subnet calculator will tell you this has a wildcard of 0.0.31.255 – i.e. 00011111.11111111. Put this (13 1’s) into the Windows calculator to get the hex value 1FFF.

AppV – Empty package map for package content root

Had an interesting problem at work yesterday about which I wish I could write a long and interesting blog post, but truthfully it was such a simple thing once I identified the cause.

We use AppV for streaming applications. We have many branch offices so there’s a DFS share which points to targets in each office. AppV installations in each office point to this DFS share and thanks to the magic of DFS referrals correctly pick up the local Content folder. From the day before yesterday, however, one of our offices started getting errors with AppV apps (same as in this post), and when I checked the AppV server I found errors similar to this in the Event Logs:

The DFS share seemed to be working OK. I could open it via File Explorer and its contents seemed correct. I checked the number of files and the size of the share and they matched across offices. If I pointed the DFS share to use a different target (open the share in File Explorer, right-click, Properties, go to the DFS tab and select a different location target) AppV worked. So the problem definitely looked like something to do with the local target, but what was wrong?

I tried forcing a replication, checked permissions, and used tools like dfsrdiag to confirm things were alright. No issues anywhere. Restarting the DFS Replication service on the server threw up some errors in the Event Logs about some AD objects, so I spent some time chasing up that tree (looks like older replication groups that were still hanging around in AD with missing info but not present in the DFS Management console any more) until I realized all the replication servers were throwing similar errors. Moreover, adding a test folder to the source DFS share correctly resulted in it appearing on the local target immediately – so obviously replication was working correctly.

I also used robocopy to compare the local target and another one and saw that they were identical.

Bummer. Looked like a dead end and I left it for a while.

Later, while sitting through a boring conference call I had a brainwave: maybe the AppV service runs in a different user context and that context may not be seeing the DFS share? As in, maybe the error message above is literally what is happening. AppV is really seeing an empty content root and it’s not a case of a corrupt content root or just some missing files?

So I checked the AppV service and saw that it runs as NT AUTHORITY\NETWORK SERVICE. Ah ha! That means it authenticates with the remote server with the machine account of the server AppV is running on. I thought I’d verify what happens by launching File Explorer or a Command Prompt as NT AUTHORITY\NETWORK SERVICE but this was a Server 2003 and apparently there’s no straightforward way to do that. (You can use psexec to launch something as .\LOCALSYSTEM and starting from Server 2008 you can create a scheduled task that runs as NT AUTHORITY\NETWORK SERVICE and launch that to get what you want but I couldn’t use that here; also, I think you need to first run as the .\LOCALSYSTEM account and then run as the NT AUTHORITY\NETWORK SERVICE account). So I checked the Audit logs of the server hosting the DFS target and sure enough found errors that the machine account of the AppV server was indeed being denied login:

Awesome! Now we are getting somewhere.

I fired up the Local Security Policy console on the server hosting the DFS target (it’s under the Administrative Tools folder, or just type secpol.msc). Then went down to “Local Policies” > “User Rights Assignment” > “Access this computer from the Network”:

[screenshot: secpol]

Sure enough this was limited to a set of computers which didn’t include the AppV server. When I compared this with our DFS servers I saw that they were still on the default values (which includes “Everyone” as in the screenshot above) and that’s why those targets worked.

To dig further I used gpresult and compared the GPOs that affected the above policy between both servers. The server that was affected had this policy modified via a GPO while the server that wasn’t affected showed the GPO as inaccessible. Both servers were in the same OU but upon examining the GPO I saw that it was limited to a certain group only. Nice! And when I checked that group our problem server was a member of it while the rest weren’t! :)

Turns out the server was added to the group in error two days ago. Removed the server from this group, waited a while for the change to replicate across the domain, did a gpupdate on the server, and tada! Now the AppV server is able to access the DFS share on this local target again. Yay!

Moral of the story: if one of your services is unable to access a shared folder, check what user account the service runs as.
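A quick way to see what account each service runs as, by the way (a sketch):

  Get-WmiObject Win32_Service | Select-Object Name, StartName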

Brief notes on Windows Time

The w32time service provides time for Windows. Since Windows XP, NTP (Network Time Protocol) has been supported; prior to that it was only SNTP (Simple NTP).

Non domain joined computers (including servers) use SNTP.

This is a good article that explains the Windows Time service and its configurations. Covers both registry keys and GPOs. This is another good article that goes into even more detail.

Any Windows machine can be set up to sync time in one of four ways:

  1. No syncing!
  2. Sync from specified NTP servers.
  3. Sync via domain hierarchy (i.e. members sync from a DC in the domain; DCs sync from the PDC of the parent domain/ forest root domain).
  4. Use either of the above (i.e. NTP and domain hierarchy).

The default mechanism on domain joined computers is domain hierarchy (the setting is called NT5DS). Stand-alone machines default to NTP servers (the setting is called NTP; the default server is time.windows.com though you can change it – and it’s probably recommended that you do).

For machines that are off and on the domain network – e.g. laptops – it is better to set their time sync mechanism to the last option (use either). They needn’t always have contact with a DC to sync time.

When specifying NTP time servers you also specify flags. Check this post for an explanation of the flags. There are four possible flags: 0x01 SpecialInterval; 0x02 UseAsFallbackOnly; 0x04 SymmetricActive; 0x08 Client.

  • Flag UseAsFallbackOnly means the server is only used if the others are unavailable. Check out this post for an example of this.
  • Flag SpecialInterval lets you change how often the NTP server is polled. By default the interval is determined by Windows based on the quality of time samples, but you can use the above flag and set a registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\TimeProviders\NtpClient\SpecialPollInterval to change the polling interval.
  • I am not sure what the other two flags do. The Client flag seems to be a commonly used one. Some posts/ articles use it, others don’t. The default time.windows.com setting uses this flag as well as the SpecialInterval.
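For reference, pointing a machine at an NTP server with flags looks something like this (a sketch – 0x9 combines the Client and SpecialInterval flags):

  w32tm /config /manualpeerlist:"time.windows.com,0x9" /syncfromflags:manual /update
  w32tm /resync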

p.s. To turn on w32tm debugging check out this link.