Contact

Subscribe via Email

Subscribe via RSS/JSON

Categories

Recent Posts

Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

Elsewhere

HP DL360 Gen9 with HP FlexFabric 534 adapter and HP Ethernet 530 adapter and ESXi

That’s a very vague subject line, I know, but I couldn’t think of anything concise. Just wanted to put some keywords so that if anyone else comes across the same problem and types something similar into Google hopefully they stumble upon this post.

At work we got some HP DL360 Gen9s to use as ESXi hosts. To these servers we added additional network cards –

  • HP FlexFabric 10Gb 2-port 534FLR-SFP+ Adapter; and
  • HP Ethernet 10Gb 2-port 530SFP+ Adapter.

Each of these adapters have two NICs each. Here’s a picture of the adapters in the server and the vmnic numbers ESXi assigns to them.

serverIn this picture –

  • vmnic5 & vmnic4 are the HP FlexFabric 10Gb 2-port 534FLR-SFP+ Adapter;
  • vmnic6 & vmnic7 are the HP Ethernet 10Gb 2-port 530SFP+ Adapter; and
  • vmnic0 – vmnic3 are HP Ethernet 1Gb 4-port 331i Adapter (which come in-built into the server);
  • iLO is the iLO port (which I’ll ignore for now).

We didn’t want to use vmnic0 – vmnic3 as they are only 1Gb. So the idea was the use vmnic4 – vmnic7. Two NICs would be for Management+vMotion (connecting to two different switches); two NICs would be for iSCSI (again connecting to different switches).

We came across two issues. First was that the FlexFabric NICs didn’t seem to support iSCSI. ESXi showed two iSCSI adapters but the NICs mapped to them were the regular Ethernet 10Gb ones, not the FlexFabric 10Gb ones. Second issue was that we wanted to use vmnic4 and vmnic6 for Management+vMotion and vmnic5 and vmnic7 for iSCSI – basically a NIC from each adapter such that even if an adapter were to fail there’s a NIC from another adapter for resiliency. This didn’t work for some reason. The Ethernet 10Gb NICs weren’t “connecting” to the network switch for some reason. They would connect in the sense that the link status appears as connected and the LEDs on the switch and NICs blink, but something was missing. There was no real connectivity.

Here’s what we did to fix these.

But first, for both these fixes you have to reboot the server and go into the System Utilities menu.

f9 system utils

Change 1: Enable iSCSI on the FlexFabric adapter (vmnic4 and vmnic5)

Once in the System Utilities menu select “System Configuration”.

system configurationSelect the first FlexFabric NIC (port1).

select flexfabricThen select the Device Hardware Configuration menu.

select device hardwareYou will see that the storage personality is FCoE.

current flex personalityThat’s the problem. This is why the FlexFabric adapters don’t show up as iSCSI adapters. Select the FCoE entry and change it to iSCSI.

new flex personalityNow press Esc to go back to the previous menus (you will be prompted to save the changes – do so). Then repeat the above steps for the second FlexFabric NIC (port 2).

With this change the FlexFabric NICs will appear as iSCSI adapters. Now for the second change.

Change 2: Enable DCB for the Ethernet adapters

From the System Configuration menu now select the first Ethernet NIC (port 1).

select ethernetThen select its Device Hardware Configuration menu.

select device hardware (ethernet)Notice the entry for “DCB Protocol”. Most likely it is “Disabled” (which is why the NICs don’t work for you).

current DCBChange that to “Enabled” and now the NICs will work.

new DCBThat’s it. Once again press Esc (choosing to save the changes when prompted) and then reboot the system. Now all the NICs will work as expected and appear as iSCSI adapters too.

rebootI have no idea what DCB does. From what I can glean via Google it seems to be a set of extensions to Ethernet that provide “hardware-based bandwidth allocation to a specific type of traffic and enhances Ethernet transport reliability with the use of priority-based flow control” (via TechNet) (also check out this Cisco whitepaper for more info). I didn’t read much into it because I couldn’t find anything that mentioned why DCB mattered in this case – as in why were the NICs not working when DCB was disabled? The NICs are connected to an HP 5920AF switch but I couldn’t find anything that suggested the switch requires DCB enabled for the ports to work. This switch supports DCB but that doesn’t imply it requires DCB.

Anyhow, the FlexFabric adapters have DCB enabled by default which is probably why they worked. That’s how I got the idea to enable DCB on the Ethernet adapters to see if it makes a difference – and it did! The only thing I can think of is that DCB also seems to include a DCBX (Data Centre Bridging Exchange) protocol which is about discovering peers, discovering mismatched configuration etc – so maybe the fact that DCB was disabled on these adapters made the switch not “see” these NICs and soft-disable them somehow. That’s my guess at least.

Windows DNS server subnet prioritization and round-robin

Consider the following multiple A records for a DNS record proxy.mydomain.com:

  • proxy.mydomain.com IN A 192.168.10.5
  • proxy.mydomain.com IN A 10.136.53.5
  • proxy.mydomain.com IN A 10.136.52.5
  • proxy.mydomain.com IN A 10.136.33.5
  • proxy.mydomain.com IN A 192.168.15.5

These records are defined on a DNS server. When a client queries the DNS server for the address to proxy.mydomain.com, the DNS server returns all the addresses above. However, the order of answers returned keeps varying. The first client asking for answers could get them in the following order for instance:

  • proxy.mydomain.com IN A 192.168.10.5
  • proxy.mydomain.com IN A 10.136.53.5
  • proxy.mydomain.com IN A 10.136.52.5
  • proxy.mydomain.com IN A 10.136.33.5
  • proxy.mydomain.com IN A 192.168.15.5

The second client could get them in the following order:

  • proxy.mydomain.com IN A 10.136.53.5
  • proxy.mydomain.com IN A 10.136.52.5
  • proxy.mydomain.com IN A 10.136.33.5
  • proxy.mydomain.com IN A 192.168.15.5
  • proxy.mydomain.com IN A 192.168.10.5

The third client could get:

  • proxy.mydomain.com IN A 10.136.52.5
  • proxy.mydomain.com IN A 10.136.33.5
  • proxy.mydomain.com IN A 192.168.15.5
  • proxy.mydomain.com IN A 192.168.10.5
  • proxy.mydomain.com IN A 10.136.53.5

This is called round-robin. Basically it rotates between the various IP addresses. All IP addresses are offered as answers, but the order is rotated so that as long as clients choose the first answer in the list every client chooses a different IP address.

Notice I said clients choose the first answer in the list. This needn’t always be the case though. When I said clients above, I meant the client computer that is querying the DNS server for an answer. But that’s not really who’s querying the server. Instead, an application on the client (e.g. Chrome, Internet Explorer) or the client OS itself is the one looking for an answer. These ask the DNS resolver which is usually a part of the OS for an answer, and it’s the resolver that actually queries the server and gets the list of answers above.

The DNS resolver can then return the list as it is to the requesting application, or it can apply a re-ordering of its own. For instance, if the client is from the 192.168.10.0 network, the resolver may re-order the answers such that the 192.168.10.5 answer is always first. This is called Subnet prioritization. Basically, the resolver prioritizes answers that are from the same subnet as the client. The idea being that client applications would prefer reaching out to a server in their same subnet (it’s closer to them, no need to go over the WAN link for instance) than one on a different subnet.

Subnet prioritization can be disabled on the resolver side by adding a registry key PrioritizeRecordData (link) with value 0 (REG_DWORD) at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DnsCache\Parameters. By default this key does not exist and has a default value of 1 (subnet prioritization enabled).

Subnet prioritization can also be set on the server side so it orders the responses based on the client network. This is controlled by the registry key LocalNetPriority (link) under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DNS\Parameters\ on the DNS server. By default this is 0, so the server doesn’t do any subnet prioritization. Change this to 1 and the server will order its responses according to the client subnet.

By default the server also does round-robin for the results it returns. This can be turned off via the DNS Management tool (under server properties > advanced tab). If round-robin is off the server returns records in the order they were added.

More on subnet prioritization at this link.

That’s is not the end though. :)

Consider a server who has round-robin and subnet prioritization enabled. Now consider the DNS records from above:

  • proxy.mydomain.com IN A 192.168.10.5
  • proxy.mydomain.com IN A 10.136.53.5
  • proxy.mydomain.com IN A 10.136.52.5
  • proxy.mydomain.com IN A 10.136.33.5
  • proxy.mydomain.com IN A 192.168.15.5

The first and last records are from class C networks. The other three are from Class A networks. In reality though thanks to CIDR these are all class C addresses.

Now say there’s a client with IP address 10.136.50.2/24 asking the server for answers. On the face of it the client network does not match any of the answer record networks so the server will simply return answers as per round-robin, without any re-ordering. But in reality though the client 10.136.50.2/24 is in the same network as 10.136.52.5/24 and both are part of a larger 10.136.48.0/20 network that’s simply been broken into multiple /24 networks (to denote clients, servers, etc). What can we do so the server correctly identifies the proxy record for this client?

This is where the LocalNetPriorityNetMask registry key under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DNS\Parameters\ on the DNS server comes into play. This key – which does not exist by default – tells the server what subnet mask to assume when it’s trying to subnet prioritize. By default the server assumes a /24 subnet, but by tweaking this key we can tell the server to use a different subnet in its calculations and thus correctly return an answer.

The LocalNetPriorityNetMask key takes a REG_DWORD value in a hex format. Check out this KB article for more info, but a quick run through:

A netmask can be written as xxx.xxx.xxx.xxx. 4 pairs of numbers. The LocalNetPriorityNetMask key is of format 0xaabbccdd – again, 4 pairs of hex numbers. This is a mask that’s applied on the mask of 255.255.255.255 so to calculate this number you subtract the mask you want from 255.255.255.255 and convert the resulting numbers into hex.

For example: you want a /8 netmask. That is 255.0.0.0. Subtracting this from 255.255.255.255 leaves you with 0.255.255.255.255. What’s that in hex? 00ffffff. So LocalNetPriorityNetMask will be 0x00ffffff. Easy?

So in the example above I want a /20 netmask. That is, I am telling the server to assume the clients and the record IPs it has to be in a /20 network, so subnet prioritize accordingly. A /20 netmask is 255.255.240.0. Subtract from 255.255.255.255 to get 0.0.15.255. Which in hex is 00000fff (15 decimal is F hex). So all I have to do is put this value as LocalNetPriorityNetMask on the DNS server, restart the service, and now the server will correctly return subnet prioritized answers for my /20 network.

Update: Some more links as I did some more reading on this topic later.

  • Ace Fekay’s post – a must read!
  • A subnet calculator (also gives you the wildcard, which you can use for calculating the LocalNetPriorityNetMask key)
  • I am not very clear on what happens if you disable RoundRobin but there are multiple entries from the same subnet. What order are they returned in? Here’s a link to the RoundRobin setting, doesn’t explain much but just linking it in case it helps in the future.
  • More as a note to myself (and any others if they wonder the same) – the subnet mask you specify is applied on the client. That is to say if you client has an IP address of say 10.136.20.10, by default the DNS server will assume a subnet mask of /24 (Class C is the default) and assume the client is in a 10.136.20.0/24 network. So any records from that range are prioritized. If you want to include other records, you specify a larger subnet mask. Thus, for example, if you specify a /20 then the client is assumed to have an IP address 10.136.20.10/20, so its network range is considered to be 10.136.16.1 – 10.136.31.254 (don’t wrack your brain – use the subnet calculator for this). So any record in this range is prioritized over records not in this range.
  • The Windows calculator can be used to find the LocalNetPriorityNetMask key value. Say you want a subnet mask of /19. The subnet calculator will tell you this has a wildcard of 0.0.31.255 – i.e. 00011111.11111111. Put this (13 1’s) into the Windows calculator to get the hex value 3FFF.  

It is possible to vMotion VMs across ESX hosts without shared storage

Today (well actually, a few days ago; but today is when I read more about it) I learnt that you can vMotion VMs across hosts without shared storage.

This is only for vSphere 5.1 and above. That’s a pretty cool feature, especially because at work we are migrating all our VMs to new hosts & storage and one of things we were wondering about was how to move the VMs across. The new hosts have 3Par storage while the old hosts have StoreVirtual storage, so the thinking was that we’d probably have to give the new hosts access to the StoreVirtual storage and then do a vMotion. Now we won’t have to!

There’s no separate name for this sort of vMotion and it seems to be a not quite hyped feature. For anyone interested here’s some screenshots on how to do such a vMotion.

For starters here’s my testlab setup:

setupOne datacenter. Two clusters. Cluster one has two hosts with shared storage. Cluster two has a single host with no shared storage. UBUNTU1 is a VM I would like to migrate over.

Note that host esx03 has no connectivity to the shared storage either. I have removed the iSCSI VMkernel mappings from it so there’s no confusion.

esx03 shared storageESX01 and ESX02 have access to shared storage.

esx01 shared storageMigration is quite simple. Right click the VM and select Migrate. Choose the option to migrate both host and datastore. If the VM is powered on (which it would be as we are doing vMotion instead of a cold migration) you will see the option is grayed out in the older/ C# vSphere client.

migrate host and datastore - 1That’s because the newer features of vSphere 5.1 are only available in the web client so you’ll have to use that instead (thanks to this blog post for pointing me to that).

migrate host and datastore - 2Select the destination host. Note that vMotion is only between datacenters so you can only chose a host in the same datacenter (as opposed to cold migration which can happen between datacenters).

select destination

Select Datacenter

select destination host

Select Host

Select Datastore

Select Datastore

Notice that any datastore accessible from the destination host can be selected.

And that’s it. vMotion begins and I have easily live migrated a VM from one host to another without any shared storage. Cool! :)

setup2

Get a list of users in an OU along with last logged on date

Trivial stuff. Wanted to note it down someplace for future reference –