Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

How to get the service name for sc

I needed to enable/ disable the Windows Firewall on a Server 2008 R2 Core box but didn’t know what the Windows Firewall service name was for use with the sc command. Then I learnt that sc has a sub-command called GetKeyName (and a corresponding GetDisplayName, for the reverse operation) to get the service name from the display name.

Nice!

Also, as a reminder to myself: the sc config command is what you use to change the configuration of a service (make it disabled, manual, etc.). When giving the options, though, be sure to include a space after the equals sign of each option. That is to say, sc config MpsSvc start= disabled works (MpsSvc being the Windows Firewall service name), but sc config MpsSvc start=disabled won’t.

 

How to undo changes made by winrm quickconfig

Here’s what happens when you do a winrm quickconfig:

In my case the Windows Remote Management (WS-Management) service was already running, so its startup type was merely changed to “Automatic (Delayed)”, but if it wasn’t already running then it would have been started too.

So what all happens here?

  1. The service is started and type changed to “Automatic (Delayed)”.
  2. Starting the service in itself does not do anything as it does not listen for anything. So a listener is created. This listener listens for messages sent via HTTP on all IP addresses of the machine.
  3. A firewall exception is created for Windows Remote Management.
  4. A configuration change is made such that when a remote user connects with admin rights to this machine, the admin rights are not stripped via User Account Control (UAC). (See this & this blog post for what this means). Basically, this configuration change involves modifying a registry entry.

Thus, to undo the effect of winrm quickconfig one must undo each of these changes.

1. Disabling the service

Either go via the Services MMC console and (1) stop the service and (2) change its type to disabled; or use PowerShell (running as administrator of course):

That’s disabled.

2. Delete the listener

You can see the listener thus:

And delete it thus:

The command has no output, so enumerate the listeners again if you want to confirm.

3. Delete the firewall exceptions

Either go via the GUI and disable the highlighted rule:

[Screenshot: winrm-firewall]

Or use PowerShell:

That’s disabled.

4. Re-enable Remote UAC filtering

Either open the Registry Editor and navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System, then set the value of LocalAccountTokenFilterPolicy to 0 (zero).

Or via PowerShell:

That’s it!

Hyper-V between Windows 10 & Windows 8.1 in a workgroup

My laptop’s running Windows 10, desktop’s running Windows 8.1. Since both have client Hyper-V I thought it would be cool to install Hyper-V manager on the laptop and use it to manage Hyper-V running on the desktop. Did that and came across the following error –

[Screenshot: Hyper-V error]

DOGBERT is the Windows 8.1 desktop. The error is from my Windows 10 laptop.

First I followed the steps in this blog post. Actually, I didn’t have to do much as the account I was using on the desktop was already in the local Administrators group, so I didn’t have to do anything in terms of COM (step 3) & WMI (step 4) permissions. But I did enable the firewall rules for the Windows Management Instrumentation (WMI) group (step 2).

Additionally, I noticed that the Windows Remote Management (WS-Man) service was not running on the desktop so I enabled that. For this I used the winrm command.

 

Then I had to enable the Windows Remote Management (WS-Man) service on the laptop and add the desktop as a trusted host. Remember the error message above? It said that either I must use HTTPS or add the remote computer to the TrustedHosts list. I added it thus (from my laptop):

Probably a good idea to see what your existing trusted hosts are before you run this command (so you can append to the list instead of removing existing entries). You can do that thus:

After this Hyper-V manager from the laptop was able to connect to the desktop, but in the Virtual Machines section I had the following error:

Access denied. Unable to establish communication between ‘Hyper-V Server’ and ‘Hyper-V Manager’

The solution for that (thanks to this blog post) is to open “Component Services” on the laptop. Alternatively open a run window/ command prompt and type dcomcnfg.

In the window that opens expand to Component Services > Computers > My Computer, right click and go to Properties, then the COM Security tab, and click “Edit Limits” under Access Permissions. Select the ANONYMOUS LOGON username here and tick the box to allow Remote Access.

[Screenshot: Component Services]

That’s it! After this Hyper-V on my laptop was able to talk to the desktop.

Hello again!

Been a while since I blogged here. Nearly 3 months … phooey!

I’ve been lazy. Plus busy at work. And doing less fooling around with stuff than I used to do before … all that led to a lack of posts here. Hopefully I get to posting with more regularity again.

Logged in today after a long while and updated WordPress to the latest version along with all its plugins.

Notes on NLB, VMware, etc

Just some notes to myself so I am clear about it while reading about it. In the context of this VMware KB article – Microsoft NLB not working properly in Unicast mode.

Before I get to the article I better talk about a regular scenario. Say you have a switch and it’s got a couple of devices connected to it. A switch is a layer 2 device – meaning, it has no knowledge of IP addresses and networks etc. All devices connected to a switch are in the same network. The devices on a switch use MAC addresses to communicate with each other. Yes, the devices have IPv4 (or IPv6) addresses but how they communicate to each other is via MAC addresses.

Say Server A (IPv4 address 10.136.21.12) wants to communicate with Server B (IPv4 address 10.136.21.22). Both are connected to the same switch, hence on the same LAN. Communication between them happens in layer 2. Here the machines identify each other via MAC addresses, so first Server A checks whether it knows the MAC address of Server B. If it knows (usually coz Server A has communicated with Server B recently and the MAC address is cached in its ARP table) then there’s nothing to do; but if it does not, then Server A finds the MAC address via something called ARP (Address Resolution Protocol). The way this works is that Server A broadcasts to the whole network that it wants the MAC address of the machine with IPv4 address 10.136.21.22 (the address of Server B). This message goes to the switch, the switch sends it to all the devices connected to it, Server B replies with its MAC address and that is sent to Server A. The two now communicate – I’ll come to that in a moment.

When it’s communication from devices in a different network to Server A or Server B, the idea is similar except that you have a router connected to the switch. The router receives traffic for a device on this network – it knows the IPv4 address – so it finds the MAC address similar to above and passes it to that device. Simple.

Now, how does the switch know which port a particular device is connected to? Say the switch gets traffic addressed to MAC address 00:eb:24:b2:05:ac – how does the switch know which port that is on? Here’s how that happens –

  • First the switch checks if it already has this information cached. Switches have a table called the CAM (Content Addressable Memory) table which holds this cached info.
  • Assuming the CAM table doesn’t have this info the switch will send the frame (containing the packets for the destination device) to all ports. Note, this is not like ARP where a question is sent asking for the device to respond; instead the frame is simply flooded out of every port other than the one it arrived on, so it reaches the whole network.
  • When a switch receives frames from a port it notes the source MAC address and port and that’s how it keeps the CAM table up to date. Thus when Server A sends data to Server B, the MAC address and switch port of Server A are stored in the switch’s CAM table. This entry is only stored for a brief period.
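The three steps above can be sketched in a few lines of Python. This is only a toy model to make the learn-then-forward logic concrete: the LearningSwitch class, the port numbers and the shortened MAC strings are all made up for illustration, and the brief lifetime of CAM entries is deliberately left out.

```python
class LearningSwitch:
    """Toy layer 2 switch: learns source MACs, floods unknown destinations."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.cam = {}  # MAC address -> port it was last seen on (no ageing here)

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        # Step 3 in the list above: note the source MAC and the port it came from.
        self.cam[src_mac] = in_port
        # Step 1: is the destination already cached in the CAM table?
        if dst_mac in self.cam:
            out_port = self.cam[dst_mac]
            return [] if out_port == in_port else [out_port]
        # Step 2: unknown destination, so send the frame to every other port.
        return [port for port in self.ports if port != in_port]


switch = LearningSwitch(ports=[1, 2, 3])
print(switch.receive(1, "mac-A", "mac-B"))  # mac-B unknown, frame is flooded
print(switch.receive(2, "mac-B", "mac-A"))  # mac-A was learnt on port 1
print(switch.receive(1, "mac-A", "mac-B"))  # mac-B was learnt on port 2
```

The first frame goes out of ports 2 and 3 because nothing is known yet; after both servers have sent a frame each, traffic goes only to the one port that matters.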

Now let’s talk about NLB (Network Load Balancing).

Consider two machines – 10.136.21.11 with MAC address 00:eb:24:b2:05:ac and 10.136.21.12 with MAC address 00:eb:24:b2:05:ad. NLB is a form of load balancing wherein you create a Virtual IP (VIP) such as 10.136.21.10 such that any traffic to 10.136.21.10 is sent to either of 10.136.21.11 or 10.136.21.12. Thus you have the traffic being load balanced between the two machines; and not only that, if any one of the machines goes down, nothing is affected because the other machine can continue handling the traffic.

But now we have a problem. If we want a VIP 10.136.21.10 that should send traffic to either host, how will this work when it comes to MAC addresses? That depends on the type of NLB. There’s two sorts – Unicast and Multicast.

In Unicast the NIC that is used for clustering on each server has its MAC address changed to a new Unicast MAC address that’s the same for all hosts. Thus for example, the NICs that hold the NLB IP address 10.136.21.10 in the scenario above will have their MAC addresses changed from 00:eb:24:b2:05:ac and 00:eb:24:b2:05:ad respectively to (say) 00:eb:24:b2:05:af. Note that the MAC address is a Unicast MAC (which basically means the MAC address looks like a regular MAC address, such as that assigned to a single machine). Since this is a Unicast MAC address, and by definition it can only be assigned to one machine/ switch port, the NLB driver on each machine cheats a bit and changes the source MAC address of outgoing frames to whatever the original NIC MAC address was. That is to say –

  • Server IP 10.136.21.11
    • Has MAC address 00:eb:24:b2:05:ac
    • Which is changed to a MAC address of 00:eb:24:b2:05:af as part of the Unicast IP/ enabling NLB
    • However when traffic is sent out from this machine the MAC address is changed back to 00:eb:24:b2:05:ac
  • Same for Server 10.136.21.12

Why does this happen? This is because –

  • When a device wants to send data to the VIP address, it will try to find the MAC address using ARP. That is, it sends a broadcast over the network asking for the device with this IP address to respond. Since both servers now have the same MAC address for their NLB NIC, whichever server responds will reply with this common MAC address.
  • Now the switch receives frames for this MAC address. The switch does not have this in its CAM table so it will flood the frame to all ports – reaching both of the servers.
  • But why does outgoing traffic from either server change the MAC address of outgoing traffic? That’s because if outgoing frames have the common MAC address, then the switch will associate this common MAC address with that port – resulting in all future traffic to the common MAC address only going to one of the servers. By changing the outgoing frame MAC address back to the server’s original MAC address, the switch never gets to store the common MAC address in its CAM table and all frames for the common MAC address are always broadcast.
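To convince myself of the last point, here is a small Python simulation of a switch’s CAM table with and without the source MAC trick. It is a sketch only: the port numbers and the client MAC address are made up, the deliver function is a stand-in for a real switch, and CAM entry ageing is ignored.

```python
CLUSTER_MAC = "00:eb:24:b2:05:af"
SERVER_A = ("00:eb:24:b2:05:ac", 1)   # (original NIC MAC, switch port)
SERVER_B = ("00:eb:24:b2:05:ad", 2)
CLIENT = ("00:eb:24:b2:05:01", 3)     # hypothetical client MAC and port
PORTS = [1, 2, 3]


def deliver(cam, in_port, src_mac, dst_mac):
    """Learn the source MAC, then forward to the known port or flood."""
    cam[src_mac] = in_port
    if dst_mac in cam:
        return [cam[dst_mac]]
    return [port for port in PORTS if port != in_port]


def client_reaches(mask_source):
    """Ports that receive a client frame sent to the cluster MAC."""
    cam = {}
    # Both servers send some traffic first, so the switch learns from it.
    for original_mac, port in (SERVER_A, SERVER_B):
        src_mac = original_mac if mask_source else CLUSTER_MAC
        deliver(cam, port, src_mac, CLIENT[0])
    return deliver(cam, CLIENT[1], CLIENT[0], CLUSTER_MAC)


print(client_reaches(mask_source=True))   # [1, 2]: flooded to both servers
print(client_reaches(mask_source=False))  # [2]: pinned to whichever server spoke last
```

With the original MAC addresses on outgoing frames the switch never learns the cluster MAC, so frames for it keep reaching both servers. Without the trick the switch pins the cluster MAC to a single port.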

In the context of VMware what this means is that (a) the port group to which the NLB NICs connect must allow changes to the MAC address and allow forged transmits; and (b) the notification that the port group sends to the physical switch by default when a VM is powered on, announcing the VM’s MAC address, must be disabled too, because it would expose the cluster MAC address to the switch. Without these changes NLB will not work in Unicast mode with VMware.

(This is a good post to read more about NLB).

Apart from Unicast NLB there’s also Multicast NLB. In this form the NLB NIC’s MAC address is not changed. Instead, a new Multicast MAC address is assigned to the NLB NIC. This is in addition to the regular MAC address of the NIC. The advantage of this method is that since each host retains its existing MAC address the communication between hosts is unaffected. However, since the new MAC address is a Multicast MAC address – and switches by default are set to ignore such addresses – some changes need to be made on the switch side to get Multicast NLB working.

One thing to keep in mind is that it’s important to add a default gateway address to your NLB NIC. At work, for instance, the NLB IPv4 address was reachable within the network but from across networks it wasn’t. Turns out that’s coz Windows 2008 onwards have a strong host behavior – traffic coming in via one NIC does not go out via a different NIC, even if both are in the same subnet and the second NIC has a default gateway set. In our case I added the same default gateway to the NLB NIC too and it was then reachable across networks. 

Use PowerShell/ PowerCLI to get VM space usage

Wanted to get the space used by all VMs across a bunch of our newer hosts –

There’s probably a way to show the total too but I used a separate pipeline for that –

 

HP DL360 Gen9 with HP FlexFabric 534 adapter and HP Ethernet 530 adapter and ESXi

That’s a very vague subject line, I know, but I couldn’t think of anything concise. Just wanted to put some keywords so that if anyone else comes across the same problem and types something similar into Google hopefully they stumble upon this post.

At work we got some HP DL360 Gen9s to use as ESXi hosts. To these servers we added additional network cards –

  • HP FlexFabric 10Gb 2-port 534FLR-SFP+ Adapter; and
  • HP Ethernet 10Gb 2-port 530SFP+ Adapter.

Each of these adapters has two NICs. Here’s a picture of the adapters in the server and the vmnic numbers ESXi assigns to them.

[Screenshot: server]

In this picture –

  • vmnic5 & vmnic4 are the HP FlexFabric 10Gb 2-port 534FLR-SFP+ Adapter;
  • vmnic6 & vmnic7 are the HP Ethernet 10Gb 2-port 530SFP+ Adapter; and
  • vmnic0 – vmnic3 are HP Ethernet 1Gb 4-port 331i Adapter (which come in-built into the server);
  • iLO is the iLO port (which I’ll ignore for now).

We didn’t want to use vmnic0 – vmnic3 as they are only 1Gb. So the idea was to use vmnic4 – vmnic7. Two NICs would be for Management+vMotion (connecting to two different switches); two NICs would be for iSCSI (again connecting to different switches).

We came across two issues. First was that the FlexFabric NICs didn’t seem to support iSCSI. ESXi showed two iSCSI adapters but the NICs mapped to them were the regular Ethernet 10Gb ones, not the FlexFabric 10Gb ones. Second issue was that we wanted to use vmnic4 and vmnic6 for Management+vMotion and vmnic5 and vmnic7 for iSCSI – basically a NIC from each adapter such that even if an adapter were to fail there’s a NIC from another adapter for resiliency. This didn’t work. The Ethernet 10Gb NICs weren’t “connecting” to the network switch: they would connect in the sense that the link status appeared as connected and the LEDs on the switch and NICs blinked, but there was no real connectivity.

Here’s what we did to fix these.

But first, for both these fixes you have to reboot the server and go into the System Utilities menu.

[Screenshot: f9 system utils]

Change 1: Enable iSCSI on the FlexFabric adapter (vmnic4 and vmnic5)

Once in the System Utilities menu select “System Configuration”.

[Screenshot: system configuration]

Select the first FlexFabric NIC (port 1).

[Screenshot: select flexfabric]

Then select the Device Hardware Configuration menu.

[Screenshot: select device hardware]

You will see that the storage personality is FCoE.

[Screenshot: current flex personality]

That’s the problem. This is why the FlexFabric adapters don’t show up as iSCSI adapters. Select the FCoE entry and change it to iSCSI.

[Screenshot: new flex personality]

Now press Esc to go back to the previous menus (you will be prompted to save the changes – do so). Then repeat the above steps for the second FlexFabric NIC (port 2).

With this change the FlexFabric NICs will appear as iSCSI adapters. Now for the second change.

Change 2: Enable DCB for the Ethernet adapters

From the System Configuration menu now select the first Ethernet NIC (port 1).

[Screenshot: select ethernet]

Then select its Device Hardware Configuration menu.

[Screenshot: select device hardware (ethernet)]

Notice the entry for “DCB Protocol”. Most likely it is “Disabled” (which is why the NICs don’t work for you).

[Screenshot: current DCB]

Change that to “Enabled” and now the NICs will work.

[Screenshot: new DCB]

That’s it. Once again press Esc (choosing to save the changes when prompted) and then reboot the system. Now all the NICs will work as expected and appear as iSCSI adapters too.

[Screenshot: reboot]

I have no idea what DCB does. From what I can glean via Google it seems to be a set of extensions to Ethernet that provide “hardware-based bandwidth allocation to a specific type of traffic and enhances Ethernet transport reliability with the use of priority-based flow control” (via TechNet) (also check out this Cisco whitepaper for more info). I didn’t read much into it because I couldn’t find anything that mentioned why DCB mattered in this case – as in why were the NICs not working when DCB was disabled? The NICs are connected to an HP 5920AF switch but I couldn’t find anything that suggested the switch requires DCB enabled for the ports to work. This switch supports DCB but that doesn’t imply it requires DCB.

Anyhow, the FlexFabric adapters have DCB enabled by default which is probably why they worked. That’s how I got the idea to enable DCB on the Ethernet adapters to see if it makes a difference – and it did! The only thing I can think of is that DCB also seems to include a DCBX (Data Centre Bridging Exchange) protocol which is about discovering peers, discovering mismatched configuration etc – so maybe the fact that DCB was disabled on these adapters made the switch not “see” these NICs and soft-disable them somehow. That’s my guess at least.

Windows DNS server subnet prioritization and round-robin

Consider the following multiple A records for the DNS name proxy.mydomain.com:

  • proxy.mydomain.com IN A 192.168.10.5
  • proxy.mydomain.com IN A 10.136.53.5
  • proxy.mydomain.com IN A 10.136.52.5
  • proxy.mydomain.com IN A 10.136.33.5
  • proxy.mydomain.com IN A 192.168.15.5

These records are defined on a DNS server. When a client queries the DNS server for the address to proxy.mydomain.com, the DNS server returns all the addresses above. However, the order of answers returned keeps varying. The first client asking for answers could get them in the following order for instance:

  • proxy.mydomain.com IN A 192.168.10.5
  • proxy.mydomain.com IN A 10.136.53.5
  • proxy.mydomain.com IN A 10.136.52.5
  • proxy.mydomain.com IN A 10.136.33.5
  • proxy.mydomain.com IN A 192.168.15.5

The second client could get them in the following order:

  • proxy.mydomain.com IN A 10.136.53.5
  • proxy.mydomain.com IN A 10.136.52.5
  • proxy.mydomain.com IN A 10.136.33.5
  • proxy.mydomain.com IN A 192.168.15.5
  • proxy.mydomain.com IN A 192.168.10.5

The third client could get:

  • proxy.mydomain.com IN A 10.136.52.5
  • proxy.mydomain.com IN A 10.136.33.5
  • proxy.mydomain.com IN A 192.168.15.5
  • proxy.mydomain.com IN A 192.168.10.5
  • proxy.mydomain.com IN A 10.136.53.5

This is called round-robin. Basically it rotates between the various IP addresses. All IP addresses are offered as answers, but the order is rotated so that as long as clients choose the first answer in the list every client chooses a different IP address.
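The rotation is easy to mimic in Python. This is just a sketch of the ordering behaviour described above, not of how the Windows DNS server implements it; the round_robin_answers name is my own.

```python
from collections import deque

RECORDS = [
    "192.168.10.5",
    "10.136.53.5",
    "10.136.52.5",
    "10.136.33.5",
    "192.168.15.5",
]


def round_robin_answers(records, queries):
    """Yield the full answer list once per query, rotated by one each time."""
    ring = deque(records)
    for _ in range(queries):
        yield list(ring)
        ring.rotate(-1)  # the first answer moves to the end for the next client


for number, answers in enumerate(round_robin_answers(RECORDS, 3), start=1):
    print(f"client {number}: first answer is {answers[0]}")
```

Each client gets all five addresses, but the first one in the list differs per client, matching the three orderings listed above.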

Notice I said clients choose the first answer in the list. This needn’t always be the case though. When I said clients above, I meant the client computer that is querying the DNS server for an answer. But that’s not really who’s querying the server. Instead, an application on the client (e.g. Chrome, Internet Explorer) or the client OS itself is the one looking for an answer. These ask the DNS resolver which is usually a part of the OS for an answer, and it’s the resolver that actually queries the server and gets the list of answers above.

The DNS resolver can then return the list as it is to the requesting application, or it can apply a re-ordering of its own. For instance, if the client is from the 192.168.10.0 network, the resolver may re-order the answers such that the 192.168.10.5 answer is always first. This is called Subnet prioritization. Basically, the resolver prioritizes answers that are from the same subnet as the client. The idea being that client applications would prefer reaching out to a server in their same subnet (it’s closer to them, no need to go over the WAN link for instance) than one on a different subnet.

Subnet prioritization can be disabled on the resolver side by adding a registry key PrioritizeRecordData (link) with value 0 (REG_DWORD) at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DnsCache\Parameters. By default this key does not exist, which is treated as a value of 1 (subnet prioritization enabled).

Subnet prioritization can also be set on the server side so it orders the responses based on the client network. This is controlled by the registry key LocalNetPriority (link) under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DNS\Parameters\ on the DNS server. By default this is 0, so the server doesn’t do any subnet prioritization. Change this to 1 and the server will order its responses according to the client subnet.

By default the server also does round-robin for the results it returns. This can be turned off via the DNS Management tool (under server properties > advanced tab). If round-robin is off the server returns records in the order they were added.

More on subnet prioritization at this link.

That’s not the end though. :)

Consider a server that has round-robin and subnet prioritization enabled. Now consider the DNS records from above:

  • proxy.mydomain.com IN A 192.168.10.5
  • proxy.mydomain.com IN A 10.136.53.5
  • proxy.mydomain.com IN A 10.136.52.5
  • proxy.mydomain.com IN A 10.136.33.5
  • proxy.mydomain.com IN A 192.168.15.5

The first and last records are from class C networks. The other three are from class A networks. In reality though, thanks to CIDR, these are all treated as /24 networks (the size of a class C network).

Now say there’s a client with IP address 10.136.50.2/24 asking the server for answers. On the face of it the client network does not match any of the answer record networks so the server will simply return answers as per round-robin, without any re-ordering. But in reality the client 10.136.50.2/24 and the record 10.136.52.5/24 (as well as 10.136.53.5/24) are all part of a larger 10.136.48.0/20 network that’s simply been broken into multiple /24 networks (to denote clients, servers, etc.). What can we do so the server correctly identifies the proxy record for this client?

This is where the LocalNetPriorityNetMask registry key under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DNS\Parameters\ on the DNS server comes into play. This key – which does not exist by default – tells the server what subnet mask to assume when it’s trying to subnet prioritize. By default the server assumes a /24 subnet, but by tweaking this key we can tell the server to use a different subnet in its calculations and thus correctly return an answer.
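Here is a rough Python sketch of why the assumed mask matters. It is a simplification and not the DNS server’s actual algorithm: the prioritize function is my own, and all it does is move records that fall in the client’s network (as computed with the assumed prefix length) to the front.

```python
import ipaddress

RECORDS = [
    "192.168.10.5",
    "10.136.53.5",
    "10.136.52.5",
    "10.136.33.5",
    "192.168.15.5",
]


def prioritize(records, client_ip, prefix_len=24):
    """Put records in the client's assumed network first, keep the rest in order."""
    client_net = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    local = [r for r in records if ipaddress.ip_address(r) in client_net]
    remote = [r for r in records if ipaddress.ip_address(r) not in client_net]
    return local + remote


# With the default /24 assumption nothing matches 10.136.50.0/24.
print(prioritize(RECORDS, "10.136.50.2", prefix_len=24))
# With a /20 assumption the client is in 10.136.48.0/20, as are two of the records.
print(prioritize(RECORDS, "10.136.50.2", prefix_len=20))
```

With /24 the list comes back unchanged. With /20 the 10.136.53.5 and 10.136.52.5 records move to the front, while 10.136.33.5 stays behind because it is outside 10.136.48.0/20.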

The LocalNetPriorityNetMask key takes a REG_DWORD value in a hex format. Check out this KB article for more info, but a quick run through:

A netmask can be written as xxx.xxx.xxx.xxx – four decimal numbers (octets). The LocalNetPriorityNetMask key is of format 0xaabbccdd – again four numbers, each a pair of hex digits. This is a mask that’s applied on the mask of 255.255.255.255, so to calculate this number you subtract the mask you want from 255.255.255.255 and convert the resulting numbers into hex.

For example: you want a /8 netmask. That is 255.0.0.0. Subtracting this from 255.255.255.255 leaves you with 0.255.255.255. What’s that in hex? 00ffffff. So LocalNetPriorityNetMask will be 0x00ffffff. Easy?

So in the example above I want a /20 netmask. That is, I am telling the server to assume the clients and the record IPs it has to be in a /20 network, so subnet prioritize accordingly. A /20 netmask is 255.255.240.0. Subtract from 255.255.255.255 to get 0.0.15.255. Which in hex is 00000fff (15 decimal is F hex). So all I have to do is put this value as LocalNetPriorityNetMask on the DNS server, restart the service, and now the server will correctly return subnet prioritized answers for my /20 network.
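The subtraction can be checked with a couple of lines of Python. What the post calls “255.255.255.255 minus the netmask” is what Python’s ipaddress module calls the host mask, so this sketch simply formats that as eight hex digits. The function name is my own.

```python
import ipaddress


def local_net_priority_netmask(prefix_len):
    """Return the LocalNetPriorityNetMask value (as hex text) for a prefix length."""
    network = ipaddress.ip_network(f"0.0.0.0/{prefix_len}")
    # hostmask is 255.255.255.255 minus the netmask, e.g. 0.0.15.255 for /20
    return f"0x{int(network.hostmask):08x}"


print(local_net_priority_netmask(8))   # 0x00ffffff
print(local_net_priority_netmask(20))  # 0x00000fff
print(local_net_priority_netmask(24))  # 0x000000ff
```

The /8 and /20 results match the two worked examples above, and /24 gives the mask the server assumes by default.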

It is possible to vMotion VMs across ESX hosts without shared storage

Today (well actually, a few days ago; but today is when I read more about it) I learnt that you can vMotion VMs across hosts without shared storage.

This is only for vSphere 5.1 and above. That’s a pretty cool feature, especially because at work we are migrating all our VMs to new hosts & storage and one of the things we were wondering about was how to move the VMs across. The new hosts have 3Par storage while the old hosts have StoreVirtual storage, so the thinking was that we’d probably have to give the new hosts access to the StoreVirtual storage and then do a vMotion. Now we won’t have to!

There’s no separate name for this sort of vMotion and it seems to be a not quite hyped feature. For anyone interested here’s some screenshots on how to do such a vMotion.

For starters here’s my testlab setup:

[Screenshot: setup]

One datacenter. Two clusters. Cluster one has two hosts with shared storage. Cluster two has a single host with no shared storage. UBUNTU1 is a VM I would like to migrate over.

Note that host esx03 has no connectivity to the shared storage either. I have removed the iSCSI VMkernel mappings from it so there’s no confusion.

[Screenshot: esx03 shared storage]

ESX01 and ESX02 have access to shared storage.

[Screenshot: esx01 shared storage]

Migration is quite simple. Right click the VM and select Migrate. Choose the option to migrate both host and datastore. If the VM is powered on (which it would be as we are doing vMotion instead of a cold migration) you will see the option is grayed out in the older/ C# vSphere client.

[Screenshot: migrate host and datastore - 1]

That’s because the newer features of vSphere 5.1 are only available in the web client so you’ll have to use that instead (thanks to this blog post for pointing me to that).

[Screenshot: migrate host and datastore - 2]

Select the destination host. Note that vMotion is only within a datacenter so you can only choose a host in the same datacenter (as opposed to cold migration which can happen between datacenters).

[Screenshot: select destination – Select Datacenter]

[Screenshot: select destination host – Select Host]

[Screenshot: Select Datastore]

Notice that any datastore accessible from the destination host can be selected.

And that’s it. vMotion begins and I have easily live migrated a VM from one host to another without any shared storage. Cool! :)

[Screenshot: setup2]

Get a list of users in an OU along with last logged on date

Trivial stuff. Wanted to note it down someplace for future reference –

 

Start-BitsTransfer does nothing

I had to copy some VMware templates from our head office to the branch offices. Thought I’d copy them out of the datastores manually, then do a BITS transfer to the remote offices. This way I can do the transfer during normal hours but with minimal user impact.

PowerShell 2.0 onwards has the Start-BitsTransfer cmdlet to do BITS transfers.

Oddly however, the command would just exit without any error for me. And it didn’t seem to be doing anything. Then I realized I was pointing the cmdlet to my source folder and that’s why it was failing! Start-BitsTransfer only takes files. You can specify wildcards to select multiple files but you can’t point it to a folder.

So the following doesn’t work:

But this works:

Since Start-BitsTransfer supports wildcards it’s mostly fine unless your folder contains sub-folders and you want to copy these and/ or preserve the structure. Fortunately this is PowerShell so it’s just a matter of creating some wrappers around the cmdlet to support folders too. Like this one for instance. Or use CSV files like in this MSDN article.
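As a rough illustration of the CSV route, here is a Python sketch that walks a folder tree and writes one Source/Destination row per file, keeping the sub-folder structure in the destination path. The function name and the example paths are made up, and I am assuming the resulting file would be fed to the cmdlet with something like Import-Csv files.csv | Start-BitsTransfer; the destination sub-folders may also need to exist before the transfer starts.

```python
import csv
import os


def build_bits_csv(source_root, destination_root, csv_path):
    """Write a Source,Destination CSV covering every file under source_root."""
    rows = []
    for folder, _subfolders, files in os.walk(source_root):
        # Path of this folder relative to the root ("." for the root itself).
        relative = os.path.relpath(folder, source_root)
        for name in sorted(files):
            source = os.path.join(folder, name)
            destination = os.path.normpath(
                os.path.join(destination_root, relative, name))
            rows.append({"Source": source, "Destination": destination})
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["Source", "Destination"])
        writer.writeheader()
        writer.writerows(rows)
    return rows


if __name__ == "__main__":
    # Hypothetical paths, purely for illustration.
    for row in build_bits_csv("templates", "remote-templates", "files.csv"):
        print(row["Source"], "->", row["Destination"])
```

The point is only that each file gets its own row, which sidesteps the cmdlet’s refusal to take a folder as its source.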

One thing to keep in mind – Start-BitsTransfer has a default priority (specified via the -Priority switch) of Foreground. This competes with other applications so is probably not what you want. Your alternatives are High, Normal, or Low – each of which does the transfer in the background and uses the idle bandwidth of the client for transfer (the priority determines which transfer job gets priority over other similar BITS transfers).

Another thing to keep in mind is that BITS only really looks at the bandwidth availability of the client (when I say “client” I am not sure if it’s the sender or receiver – I didn’t read much into this). In a LAN environment it could be the case that the WAN side is saturated but the particular client you are targeting is idle – in this case BITS will use the full client bandwidth available to it even though the network itself doesn’t have any spare bandwidth (this was the case prior to BITS 2.0 but since then BITS can use an Internet Gateway Device to try and assess the bandwidth availability on the WAN side – this requires an IGD be find-able via UPnP and also that the Internet Gateway Device support such reporting, so I am not sure how well it works in practice). You can also use GPOs to control BITS bandwidth usage.

Anyhoo, this is not a BITS intro so I’ll leave it at that. :)

Once you start a BITS transfer you can also pause it via Suspend-BitsTransfer or cancel it via Remove-BitsTransfer. The latter will also delete any files that are already transferred, so if you just want to cancel but leave the transferred files as they are use Complete-BitsTransfer instead.

That’s all for now!

p.s. Almost forgot. You can also use BITS to download HTTP files. Like in this post for instance.

vMotion NIC load balancing fails even though there is an active link

The other day I blogged about how I had a host whose vMotion VMkernel interface seemed to be broken. Any vMotion attempts to it would hang at 14%.

At that time I logged on to the destination host, then used vmkping with the -I switch (to explicitly specify the vMotion VMkernel interface of the destination host), and found that I couldn’t ping the VMkernel interface of the other hosts. These hosts could ping each other but couldn’t ping the destination host.

The VMkernel interface is backed by two physical NICs. I found that if I removed one of the physical NICs from the VMkernel interface, it worked. Interestingly this link wasn’t showing any CDP info either, so it looked like something was wrong with it (the physical NIC shows as unclaimed coz the screenshot was taken after I moved it to unclaimed).

[Screenshot: Missing CDP info]

So the first question is why did the VMkernel interface fail when only one of the physical NICs failed? Since the other physical NIC backing the VMkernel NIC was still active shouldn’t it have continued working?

The reason why it failed is that by default network failover detection is via “Link status only”. This only detects failures to the link – like say the cable is broken, the switch is down, or the NIC has failed – while failures such as the link being connected but blocked by switch are not detected. In my case as you can see from the screenshot above the link status is connected – so the host doesn’t consider the link failed even though it isn’t actually working, thus continues to use it.

Next I discovered that other hosts too similarly had their second vMotion physical NIC in a failed state as above yet they weren’t failing like this host. The simple explanation for this is that the host above somehow selected the faulty physical NIC as the one to use, didn’t detect it as failed and so continued to use it; whereas other hosts were luckier and chose the physical NIC that works alright, so didn’t have any issues.

I am not sure that’s the entire answer though. For one, the host that failed was ESXi 5.5 and using a distributed switch, while the other two hosts were ESXi 4.0 and using standard switches. Did that make a difference?

The default load balancing method for both standard and distributed switches is the same. (For a standard switch you check this under the vSwitch properties on the host. For a distributed switch you check this under the portgroup in the Networking section of vSphere (web) client).

[Screenshot: default load balancing]

Load balancing is what I am concerned about here because that’s what the hosts should be using to balance between both the NICs. That’s what the host will be using to select the physical NIC to use for a particular traffic flow. The load balancing method is the same between standard and distributed switches, so why were the distributed switch/ ESXi 5.5 hosts behaving differently?

I am still not sure of an answer but I have a theory. My theory is that since a distributed switch spans multiple hosts, the load balancing method (above) of choosing a route based on virtual port ID comes into play. Here are screenshots from two of my hosts connected to the same distributed switch port group for instance:

[Screenshot: port number]

As you can see the virtual port number is different for the VMkernel NIC of each host. So each host could potentially use a different underlying physical NIC depending on how the load balancing algorithm maps it.

But what about a standard switch? Since the standard switch is only on the host, and the only VMkernel NIC connected to it (in the case of vMotion) is the single VMkernel NIC I have assigned for vMotion, there is no load balancing algorithm coming into play! If, instead of a VMkernel NIC, I had a Virtual Machine network, then the virtual port number would matter because there are multiple VMs connecting to the various port numbers; but that doesn’t matter for VMkernel NICs as there is only one of them.

And so my theory is that for a VMkernel NIC (such as vMotion) backed by multiple physical NICs and using the default load balancing algorithm of virtual port ID – all traffic by default goes into one of the physical NICs and the other physical NIC is never used unless the chosen one fails. And that is why my hosts using the standard switches were always using the same physical NIC (am guessing the lower numbered one as that’s what both hosts chose) while hosts using distributed switches would have chosen different physical NICs per host.
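Here is a rough sketch of that theory. Everything in it is an assumption made for illustration: the modulo mapping stands in for whatever ESXi really does with the virtual port ID, and the port numbers, host names and NIC names are made up.

```python
from typing import Sequence


def pick_uplink(virtual_port_id: int, uplinks: Sequence[str]) -> str:
    """Map a virtual port ID to one uplink.

    ASSUMPTION: plain modulo is used purely for illustration; the real
    mapping used by ESXi may well be different.
    """
    return uplinks[virtual_port_id % len(uplinks)]


UPLINKS = ["vmnic2", "vmnic3"]  # hypothetical names for the two vMotion NICs

# Standard switches: each host has its own switch, so the lone vMotion
# VMkernel NIC is assumed to sit on the same port ID on every host.
standard_hosts = {"hostA": 8, "hostB": 8}       # hypothetical port IDs

# Distributed switch: ports are numbered across the whole switch, so each
# host's VMkernel NIC gets its own port ID.
distributed_hosts = {"hostC": 10, "hostD": 11}  # hypothetical port IDs

standard_choice = {h: pick_uplink(p, UPLINKS) for h, p in standard_hosts.items()}
distributed_choice = {h: pick_uplink(p, UPLINKS) for h, p in distributed_hosts.items()}
```

With these made-up numbers both standard switch hosts land on `vmnic2`, while the two distributed switch hosts split between `vmnic2` and `vmnic3`. If the second NIC is the faulty one, only the host that got mapped to it would see vMotion fail.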

That’s all! Just thought I’d put this out there in case anyone else has the same question.

AppV – Empty package map for package content root

Had an interesting problem at work yesterday about which I wish I could write a long and interesting blog post, but truthfully it was such a simple thing once I identified the cause.

We use AppV for streaming applications. We have many branch offices so there’s a DFS share which points to targets in each office. AppV installations in each office point to this DFS share and thanks to the magic of DFS referrals correctly pick up the local Content folder. Since the day before yesterday, however, one of our offices started getting errors with AppV apps (same as in this post), and when I checked the AppV server I found errors similar to this in the Event Logs:

The DFS share seemed to be working OK. I could open it via File Explorer and its contents seemed correct. I checked the number of files and the size of the share and they matched across offices. If I pointed the DFS share to use a different target (open the share in File Explorer, right click, Properties, go to the DFS tab and select a different location target) AppV works. So the problem definitely looked like something to do with the local target, but what was wrong?

I tried forcing a replication, checked permissions, and used tools like dfsrdiag to confirm things were alright. No issues anywhere. Restarting the DFS Replication service on the server threw up some errors in the Event Logs about some AD objects, so I spent some time chasing up that tree (looks like older replication groups that were still hanging around in AD with missing info but not present in the DFS Management console any more) until I realized all the replication servers were throwing similar errors. Moreover, adding a test folder to the source DFS share correctly resulted in it appearing on the local target immediately – so obviously replication was working correctly.

I also used robocopy to compare the local target and another one and saw that they were identical.

Bummer. Looked like a dead end and I left it for a while.

Later, while sitting through a boring conference call I had a brainwave: maybe the AppV service runs in a different user context, and that context isn’t seeing the DFS share? As in, maybe the error message above is literally what is happening. AppV is really seeing an empty content root and it’s not a case of a corrupt content root or just some missing files?

So I checked the AppV service and saw that it runs as NT AUTHORITY\NETWORK SERVICE. Ah ha! That means it authenticates with the remote server with the machine account of the server AppV is running on. I thought I’d verify what happens by launching File Explorer or a Command Prompt as NT AUTHORITY\NETWORK SERVICE but this was a Server 2003 and apparently there’s no straightforward way to do that. (You can use psexec to launch something as .\LOCALSYSTEM and starting from Server 2008 you can create a scheduled task that runs as NT AUTHORITY\NETWORK SERVICE and launch that to get what you want but I couldn’t use that here; also, I think you need to first run as the .\LOCALSYSTEM account and then run as the NT AUTHORITY\NETWORK SERVICE account). So I checked the Audit logs of the server hosting the DFS target and sure enough found errors that the machine account of the AppV server was indeed being denied login:

Awesome! Now we are getting somewhere.

I fired up the Local Security Policy console on the server hosting the DFS target (it’s under the Administrative Tools folder, or just type secpol.msc). Then went down to “Local Policies” > “User Rights Assignment” > “Access this computer from the Network”:

[Screenshot: secpol]

Sure enough this was limited to a set of computers which didn’t include the AppV server. When I compared this with our DFS servers I saw that they were still on the default values (which includes “Everyone” as in the screenshot above) and that’s why those targets worked.

To dig further I used gpresult and compared the GPOs that affected the above policy between both servers. The server that was affected had this policy modified via a GPO while the server that wasn’t affected showed the GPO as inaccessible. Both servers were in the same OU but upon examining the GPO I saw that it was limited to a certain group only. Nice! And when I checked that group our problem server was a member of it while the rest weren’t! :)

Turns out the server was added to the group by mistake two days ago. Removed the server from this group, waited a while for the change to replicate across the domain, did a gpupdate on the server, and tada! now the AppV server is able to access the DFS share on this local target again. Yay!

Moral of the story: if one of your services is unable to access a shared folder, check what user account the service runs as.

vMotion is using the Management Network (and failing)

Was migrating one of our offices to a new IP scheme the other day and vMotion started failing. I had a good idea what the problem could be (coz I encountered something similar a few days ago in another context) so here’s a blog post detailing what I did.

For simplicity let’s say the hosts have two VMkernel NICs – vmk0 and vmk1. vmk0 is connected to the Management Network. vmk1 is for vMotion. Both are on separate VLANs.

When our Network admins gave out the new IPs they gave IPs from the same range for both functions. That is, for example, vmk0 had an IP of 10.20.1.2/24 (and 10.20.1.3/24 and 10.20.1.4/24 on the other hosts) and vmk1 had an IP of 10.20.1.12/24 (and 10.20.1.13/24 and 10.20.1.14/24 on the other hosts).

Since both interfaces are on separate VLANs (basically separate LANs) the above setup won’t work. That’s because as far as the hosts are concerned both interfaces are on the same network yet physically they are on separate networks. Here’s the routing table on the hosts:

Notice that any traffic to the 10.20.1.0/24 network goes via vmk0. And that includes the vMotion traffic because that too is in the same network! And since the network that vmk0 is on is physically a separate network (because it is a VLAN) this traffic will never reach the vMotion interfaces of the other hosts because they don’t know of it.
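The same reasoning can be sketched with Python’s `ipaddress` module. This is only a toy model of a routing table, not how ESXi builds its own: the helper names are hypothetical, vmk1 is assumed to be 10.20.1.12/24, and the rule that the first interface on a network owns the route is an assumption chosen to match what the hosts showed.

```python
import ipaddress

# VMkernel NICs in creation order.
INTERFACES = [
    ("vmk0", ipaddress.ip_interface("10.20.1.2/24")),   # management
    ("vmk1", ipaddress.ip_interface("10.20.1.12/24")),  # vMotion (assumed address)
]


def build_routes(interfaces):
    """One connected route per network; the first interface on it wins.

    ASSUMPTION: this tie-break is a simplification for illustration.
    """
    routes, seen = [], set()
    for name, iface in interfaces:
        if iface.network not in seen:
            seen.add(iface.network)
            routes.append((iface.network, name))
    return routes


def outgoing_interface(destination, routes):
    """Pick the interface of the most specific route matching destination."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, name) for net, name in routes if dest in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


routes = build_routes(INTERFACES)
same_subnet = INTERFACES[0][1].network == INTERFACES[1][1].network

# vMotion traffic to another host's vMotion IP leaves via vmk0, not vmk1.
via = outgoing_interface("10.20.1.13", routes)
```

In this model, moving vmk1 to a different network (say a hypothetical 10.20.2.12/24) adds a second route, and traffic to 10.20.2.13 then goes out via vmk1.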

So even though you have specified vmk1 as your vMotion traffic NIC, it never gets used because of the default routes.

If you force the outgoing traffic to specifically use vmk1 it works. Below are the results of vmkping using the default route vs explicitly using vmk1:

The solution here is to either remove the VLANs and continue with the existing IP scheme, or to keep using VLANs but assign a different IP network for the vMotion interfaces.

Update: Came across the following from this blog post while searching for something else:

If the management network (actually the first VMkernel NIC) and the vMotion network share the same subnet (same IP-range) vMotion sends traffic across the network attached to first VMkernel NIC. It does not matter if you create a vMotion network on a different standard switch or distributed switch or assign different NICs to it, vMotion will default to the first VMkernel NIC if same IP-range/subnet is detected.

Please be aware that this behavior is only applicable to traffic that is sent by the source host. The destination host receives incoming vMotion traffic on the vMotion network!

That answered another question I had but didn’t blog about in my post above. You see, my network admins had also set the iSCSI networks to be in the same subnet as the management network – but separate VLANs – yet the iSCSI traffic was correctly flowing over that VLAN instead of defaulting to the management VMkernel NIC. Now I understand why! It’s only vMotion that defaults to the first VMkernel NIC in the same IP range/ subnet as vMotion. 

 

Find Outlook rules that are deleting a message

As part of troubleshooting something I needed to quickly find what Outlook rules the user had for deleting messages. So I came up with this one-liner.

The result is a list of rule names and a friendly description of what the rule does.

Run this from the EMS of course.