Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

Unable to ping Nested VMs (XenServer/ VMware ESXi)

Spent the better part of two days chasing an issue only to find it was no issue at all. So irritated! Wasn’t a total waste of time as I got to read stuff, but it sidetracked me from the main issue.

Here’s my setup. I have a Windows Server 2012 R2 physical server. This runs VMware Workstation 12.5. Within it I have XenServer and VMware ESXi (the hypervisor isn’t relevant to the story but I mention it anyways). Within the hypervisor I have a Windows 8.1 VM – well two of them actually, but again it doesn’t matter much to the story.

Within VMware Workstation I have a couple of other servers too – a mix of Windows Server 2012 R2, 2016, and FreeBSD.

Let’s call the VMs within VMware Workstation “VMs” and the VMs within the nested hypervisors “VVMs”. The issue was that from the VVMs I was able to ping the VMs and get info from them (e.g. IP addresses) but I couldn’t ping the VVMs from the VMs. It didn’t matter which hypervisor the VVM was on. Also, the VVMs couldn’t ping each other.

There are a lot of forum and blog posts on this topic but their issue seems to be different. Their issue is that the VVMs are unable to see the outside world (i.e. the VMs). But my issue was that the VVMs could see the outside world; it was the outside world that couldn’t see them. All the forum and blog posts pointed to it being a case of the virtual switch not allowing promiscuous mode or forged MAC addresses, and the fix was to enable these. In my case I couldn’t find any such setting on VMware Workstation so I began suspecting it as the culprit.

Some good links I found while reading on these; putting them here as info for myself:

Oh, and if you are on a Linux host (where VMware Workstation is running) then you need to do some extra stuff to enable Promiscuous mode.

Nowhere could I find anything on what to do for VMware Workstation running on Windows and whether it had promiscuous mode enabled or not.

Finally I resorted to using tcpdump (on XenServer)/ tcpdump-uw (on ESXi) to see if the nested hypervisor was receiving the ICMP packets – it was. The ARP requests had the correct MAC addresses too. Next I installed Wireshark on a VM and VVM to see what was happening, and I could see that the VVM was receiving packets but not replying. So the switch in VMware Workstation was definitely in promiscuous mode – the problem was in the VVM. I didn’t suspect a VVM firewall at all as I had disabled the Windows firewall service; but just for the heck of it I enabled the firewall service and simply turned off the firewall. And what do you know – suddenly the VVM is responding to ICMP packets!!

I have no idea why this is so. I had always thought disabling the firewall service is enough to … well, disable the firewall. But looks like actually disabling the firewall for each of the network profiles is the important thing. Weird.
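For my own reference, a minimal sketch of “leave the service running, turn the firewall off per profile”. It assumes the NetSecurity cmdlets that ship with Windows 8/ Server 2012 and later, run from an elevated prompt inside the VVM, and it is something I would only do in a lab.

```powershell
# Assumption: elevated PowerShell inside the nested Windows 8.1 VM (lab use only).

# The Windows Firewall service (MpsSvc) should be running, not disabled.
Get-Service -Name MpsSvc | Select-Object Name, Status

# Turn the firewall off for each network profile instead of stopping the service.
Set-NetFirewallProfile -Profile Domain,Private,Public -Enabled False

# Confirm the state of each profile.
Get-NetFirewallProfile | Select-Object Name, Enabled
```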

Anyways – after two days of scratching my head I now have connectivity from my VMs to VVMs.

Creating an AD certificate for NetScaler 10.5

This post is based on a post by someone else that I found while I had to do this today. I wanted to configure NetScaler 10.5 with Citrix Storefront 3.9 and found that post useful, but some of the screenshots were different in my case – so thought I’d write it down for my future self. This post is going to be less on writing and more of screenshots as I am feeling very lazy.

So without much further ado –

Login to the NetScaler and create an RSA Key

1-2-3 as below.

Fill in the following fields and click “Create”.

The file name and extension don’t matter but we will refer to it later.

Create a Certificate Signing Request (CSR) on the NetScaler

Again, the request file name does not matter. The key filename & password are the same as what we used earlier. There are a few more fields to fill – obvious ones like the organization name etc, the mandatory ones have an asterisk – then click “Create”.

Open the CSR

Click the link to view. Then click the link to “save text to a file”.

Login to your AD Certification Authority and submit the request

I am going to use the command line as the CSR doesn’t contain info on what template the CA should use, and that gives an error on the GUI: “0x80094801 – the request contains no certificate template information”.

Using the command line is simple. Open the command prompt and type the following:

This will prompt you for the location of the CSR and also the CA to use etc.
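A minimal sketch of such a command, assuming the CA offers the stock “Web Server” template (internal name WebServer). The file paths in the commented variant are made up for illustration.

```powershell
# Assumption: the CA publishes the "Web Server" template, whose internal name is WebServer.
# With no file arguments certreq prompts for the CSR, the CA and where to save the certificate.
certreq -submit -attrib "CertificateTemplate:WebServer"

# Variant with the request and output files given up front (hypothetical paths):
# certreq -submit -attrib "CertificateTemplate:WebServer" C:\Temp\netscaler.csr C:\Temp\netscaler.cer
```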

If you get any error about missing templates here, it’s possible you haven’t added the “Web Server” template to your CA templates. You can add it via this menu –

The command will also prompt for a location to save the generated certificate at. Save it someplace, then go back to the NetScaler.

Login to the NetScaler and install this certificate

Click the Install button as above. Then fill in the details as below. The certificate-key pair name does not matter. The certificate file name is chosen by clicking on “Browse”, then “Local”, and selecting the certificate file that you previously saved. The key file name and password are the same as what you typed in the initial screenshot.

Finally, click “Install”.

That’s it! The NetScaler now has a certificate issued by the AD CA.

Reboot a bunch of ESXi hosts one after the other

Not a big deal, I know, but I felt like posting this. :)

Our HP Gen8 ESXi hosts were randomly crashing ever since we applied the latest ESXi 5.5 updates to them in December. Logged a call with HP and turns out until a proper fix is issued by VMware/ HPE we need to change a setting on all our hosts and reboot them. I didn’t want to do it manually, so I used PowerCLI to do it en masse.

Here’s the script I wrote to target Gen8 hosts and make the change:

I could have done the reboot along with this, but I didn’t want to. Instead I copy pasted the list of affected hosts into a text file (called ESXReboot.txt in the script below) and wrote another script to put them into maintenance mode and reboot one by one.
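A rough sketch of what such a one-by-one reboot loop can look like (not the exact script). It assumes PowerCLI is loaded, Connect-VIServer has already been run, and the hosts sit in a cluster where DRS (or you) will move the running VMs off when a host enters maintenance mode.

```powershell
# Sketch only: assumes an existing vCenter connection via Connect-VIServer.
# ESXReboot.txt holds one host name per line; blank lines are skipped.
$hostNames = Get-Content -Path .\ESXReboot.txt | Where-Object { $_.Trim() }

foreach ($name in $hostNames) {
    Write-Host "[$name] entering maintenance mode"
    # Blocks until the running VMs have been migrated or powered off.
    Set-VMHost -VMHost $name -State Maintenance -Confirm:$false | Out-Null

    Write-Host "[$name] rebooting"
    Restart-VMHost -VMHost $name -Confirm:$false | Out-Null

    # Wait for the host to drop off vCenter, then to come back.
    # A rebooted host returns in maintenance mode.
    do { Start-Sleep -Seconds 30 } while ((Get-VMHost -Name $name).ConnectionState -ne 'NotResponding')
    do { Start-Sleep -Seconds 30 } while ((Get-VMHost -Name $name).ConnectionState -ne 'Maintenance')

    Write-Host "[$name] back up, exiting maintenance mode"
    Set-VMHost -VMHost $name -State Connected -Confirm:$false | Out-Null
}
```

Doing it strictly one host at a time is slower than a parallel run but keeps cluster capacity predictable while hosts are out.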

The screenshot output is slightly different from what you would get from the script as I modified it a bit since taking the screenshot. Functionality-wise there’s no change.

Windows Services – Fix unquoted path vulnerabilities using PowerShell

At work as part of some security certification we are running Nessus scans on all our systems and it came up with the following vulnerability – link. Read that link, it’s good info.

Basically if one of your Windows Service entries points to (say) “C:\Windows\Microsoft.NET\Framework64\v3.0\Windows Communication Foundation\SMSvcHost.exe” without the double quotes then one could potentially create a malicious file called Windows.exe in “C:\Windows\Microsoft.NET\Framework64\v3.0\” and Windows will execute that file instead of parsing the full path and treating “Windows Communication Foundation” as a folder name. That’s because Windows uses space as a delimiter between a command and its switches & arguments and so it could treat the entry as “C:\Windows\Microsoft.NET\Framework64\v3.0\Windows.exe” with arguments “Communication Foundation\SMSvcHost.exe”.

The solution for this is to find all such entries that contain a space, and if the path is not in double quotes then make it so. You have to do this in the registry, so you could either do it manually or make a script and do it en masse. I went the latter route so here’s something I created.

I am being lazy and not really offering input etc as I just expect a list of servers to be scanned in a file called “ServerNames.txt”. I have the above saved to a .ps1 file and I simply run it as .\Registry.ps1. Feel free to adapt to your needs.

What it does is connect to each server specified (provided it is online), try to open the “services” key under HKLM (assuming it has access), enumerate all the subkeys that contain the service names, and check if the path has a space. It only matches against paths containing a .exe so it could miss out some stuff. Once it finds a match it extracts the bit up to the .exe, splits it along any spaces, and if there is more than one result (which means the path did have spaces) it encloses it in double quotes and replaces the original entry.

The code is smart enough to know that an entry like D:\Program Files (x86)\BigHand\BigHand Workflow Server 4.6\BHServer.exe /V:4.6 must be quoted as "D:\Program Files (x86)\BigHand\BigHand Workflow Server 4.6\BHServer.exe" /V:4.6 (quotes around the executable path only) and not as "D:\Program Files (x86)\BigHand\BigHand Workflow Server 4.6\BHServer.exe /V:4.6" (quotes around the switch too).

By default it only displays results. An optional parameter -FixIt will also make the changes.
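The quoting logic described above can be sketched roughly as follows. This is a simplified, read-only illustration against the local registry, not the full script: the function name Get-QuotedImagePath is made up, and it uses a regex instead of splitting on spaces.

```powershell
# Hypothetical helper: returns the ImagePath with the executable part quoted if needed.
function Get-QuotedImagePath {
    param([Parameter(Mandatory = $true)][string]$ImagePath)

    # Already quoted: leave it alone.
    if ($ImagePath.TrimStart().StartsWith('"')) { return $ImagePath }

    # Capture everything up to and including the first ".exe", and whatever follows it.
    if ($ImagePath -match '^(?<exe>.*?\.exe)(?<rest>.*)$') {
        $exe  = $Matches['exe']
        $rest = $Matches['rest']
        # Only paths containing a space are at risk.
        if ($exe -match '\s') { return '"' + $exe + '"' + $rest }
    }
    return $ImagePath
}

# Read-only scan of the local machine: list services whose path would change.
Get-ChildItem 'HKLM:\SYSTEM\CurrentControlSet\Services' | ForEach-Object {
    $path = (Get-ItemProperty -Path $_.PSPath -Name ImagePath -ErrorAction SilentlyContinue).ImagePath
    if ($path) {
        $fixed = Get-QuotedImagePath -ImagePath $path
        if ($fixed -ne $path) {
            [pscustomobject]@{ Service = $_.PSChildName; Current = $path; Proposed = $fixed }
        }
    }
}
```

For example, Get-QuotedImagePath -ImagePath 'D:\Program Files (x86)\BigHand\BigHand Workflow Server 4.6\BHServer.exe /V:4.6' returns the path with quotes around the executable only and the /V:4.6 switch left outside.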

Example output:

Hope it helps!

IE 11 update fails due to prerequisite updates (KB2729094)

IE 11 update requires the following prerequisite updates – link.

Even after installing those (most of which are already there) the IE 11 install will complain and fail. The log file is C:\Windows\IE11_main.log.

In my case I was getting the following error (seems to be the same for others too):

Thing is I already had this hotfix installed, so there was nothing more to do. Found this useful support post where someone suggested running the hotfix install and side-by-side launching the IE install. Might need to do it 2-3 times but that seems to make a difference. So I tried that and sure enough it helped.

That post is worth a read for some other tricks, especially if you are sequencing this via SCCM. I found this article from Symantec too which seems helpful. Some day when I am in charge of SCCM too I can try such stuff out! :)

VCSA migration – “A problem occurred while logging in. Verify the connection details.”

So, I was trying out a Windows vCenter 5.5 to VCSA 6.5 appliance migration and at the stage where I enter the target ESX host name where the appliance will be deployed to I got the above error.

Wasted the better part of my day troubleshooting this as I could find absolutely no mention of what was causing this. The installer log had the following but that didn’t shed much light either.

Tried stuff like 1) try a different ESX host, 2) update it to a later version (it was 5.5 Build 3568722), 3) turn on the ESX Shell and SSH in case that mattered – but nothing helped!

Nothing came up regarding the “vimService creation failed: Error” line either. But then I began Googling on “vimService” and learnt that it is the vSphere Management SDK and that you access the SDK via a URL like https://servername/sdk. That got me thinking whether the VCSA installer looks to the proxy settings of the machine where I am running it from, so I turned off the proxy settings in IE – and that helped!
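A quick way to check (and turn off) the per-user IE proxy from PowerShell, assuming the proxy is set through the usual Internet Settings registry values for the account running the installer:

```powershell
# Per-user Internet Settings as used by IE/WinINET.
$key = 'HKCU:\Software\Microsoft\Windows\CurrentVersion\Internet Settings'

# ProxyEnable = 1 means a manual proxy is on; AutoConfigURL is a PAC file, if any.
Get-ItemProperty -Path $key | Select-Object ProxyEnable, ProxyServer, AutoConfigURL

# Uncomment to switch the manual proxy off for the duration of the migration.
# Set-ItemProperty -Path $key -Name ProxyEnable -Value 0
```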

Who would have thought. :)

Cisco CME outgoing caller ID not showing individual extensions

Been working with our Cisco CME (Cisco Unified Communications Manager Express, as a reminder to myself!) at work for the past 2-3 days. I have no idea about Cisco telephony, but wanted to tackle an issue anyways. Good way to learn a new system.

The issue was that whenever anyone makes an outgoing external call from our system the caller ID number shown at the remote end is that of our main number. That is to say, if our main number extension is 900, this main number externally appears as 1234900, and my extension is 929, when I make an outgoing external call to (say) my mobile the number appears as 1234900 instead of 1234929.

A useful command to debug such situations is the following command:

This shows what is sent to the ISP when I place an outgoing external call.

Another useful command is:

This shows the internal processing that happens on CME/ CUCM. Things like what translations happen, what dial peers are selected, etc. It’s a lot of output compared to the first command.

To see what debugging is enabled on your system the following command is useful:

To turn off all debugging (coz it takes a toll on your router and so you must disable it once you are done):

Lastly, to see the debugging output if you are SSH’d into the router rather than on the console, do the following:

In my case I found out that even though CME was correctly sending the calling extension/ number as 9xx to the ISP, it looked like the ISP was ignoring it. I thought that maybe it expects the number in a proper format (as an external number) so I made a translation rule for outgoing calling numbers to change 9xx to the correct format (12349xx) and pass to ISP – and that fixed the issue.

Here’s the rule and translation profile I created. I played around a bit with the output and saw that I need to pass a “0” before the full number for the ISP to recognize it correctly.
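The rewrite the rule performs can be illustrated with a plain regex. This is only a PowerShell sketch of the digit manipulation, not IOS syntax, and it uses the made-up numbers from this post (extensions 9xx, external prefix 1234, a leading 0 for the ISP).

```powershell
# Illustration only: what the calling number should look like after translation.
$extension = '929'

# Match a three digit extension starting with 9 and prepend 0 + 1234.
$callerId = $extension -replace '^(9\d\d)$', '01234$1'

$callerId   # 01234929
```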

I set the plan and type too though it didn’t make any difference in my case. I saw some posts on the Internet where it does seem to make a difference, so I didn’t remove it.

Lastly I apply this profile to the voice port:

That’s all.

P2V a SQL cluster by breaking the cluster

Need to P2V a SQL cluster at work. Here are screenshots of what I did in a test environment to see if an idea of mine would work.

We have a SQL cluster made up of 2 physical nodes. The requirement was to convert this into a single virtual machine.

P2V-ing a single server is easy. Use VMware Converter. But P2V-ing a cluster like this is tricky. You could P2V each node and end up with a cluster of 2 virtual-nodes but that wasn’t what we wanted. We didn’t want to deal with RDMs and such for the cluster, so we wanted to get rid of the cluster itself. VMware can provide HA if anything happens to the single node.

My idea was to break the cluster and get one of the nodes of the cluster to assume the identity of the cluster. Have SQL running off that. Virtualize this single node. And since there’s no change as far as the outside world is concerned no one’s the wiser.

Found a blog post that pretty much does what I had in mind. Found one more which was useful but didn’t really pertain to my situation. Have a look at the latter post if your DTC is on the Quorum drive (wasn’t so in my case).

So here we go.

1) Make the node that I want to retain the active node of the cluster (so it has all the disks and databases). Then shut down SQL Server.

2) Shutdown the cluster.

3) Remove the node we want to retain, from the cluster.

We can’t remove/ evict the node via GUI as the cluster is offline. Nor can we remove the Failover Cluster feature from the node as it is still part of a cluster (even though the cluster is shut down). So we need to do a bit of “surgery”. :)

Open PowerShell and do the following:

This simply clears any cluster related configuration from the node. It is meant to be used on evicted nodes.
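A sketch of that step, assuming the FailoverClusters module is present on the node. Clear-ClusterNode is the cmdlet that fits the description; run it on the node you are keeping.

```powershell
# Run locally on the node being retained (elevated prompt).
Import-Module FailoverClusters

# Wipes the cluster configuration from this node; -Force skips the confirmation prompt.
Clear-ClusterNode -Force
```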

Once that’s done remove the Failover Cluster feature and reboot the node. If you want to do this via PowerShell:
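Something along these lines should do, assuming the ServerManager module (on Server 2012 and later Remove-WindowsFeature is an alias for Uninstall-WindowsFeature):

```powershell
Import-Module ServerManager

# Remove the Failover Clustering feature, then reboot the node.
Remove-WindowsFeature -Name Failover-Clustering
Restart-Computer
```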

4) Bring online the previously shared disks.

Once the node is up and running, open Disk Management and mark as online the shared disks that were previously part of the cluster.

5) Change the IP and name of this node to that of the cluster.

Straight-forward. Add CNAME entries in DNS if required. Also, you will have to remove the cluster computer object from AD first before renaming this node to that name.

6) Make some registry changes.

The SQL Server is still not running as it expects to be on a cluster. So make some registry changes.

First go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\Setup and open the entry called SQLCluster and change its value from 1 to 0.

Then take a backup (just in case; we don’t really need it) of the key called HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\Cluster and delete it.

Note that MSSQL10_50.MSSQLSERVER may vary depending on whether you have a different version of SQL than in my case.
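The same registry changes sketched in PowerShell. The instance key name is the one from my setup and the backup path C:\Temp is just an example (the folder must exist).

```powershell
# Adjust to your SQL version/ instance; this is the name from the post.
$instance = 'MSSQL10_50.MSSQLSERVER'
$base     = "HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server\$instance"

# Back up the Cluster key first (hypothetical backup location).
reg export "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\$instance\Cluster" C:\Temp\SqlClusterKey.reg

# Tell SQL Server it is no longer clustered.
Set-ItemProperty -Path "$base\Setup" -Name SQLCluster -Value 0

# Delete the Cluster key.
Remove-Item -Path "$base\Cluster" -Recurse
```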

7) Start the SQL services and change their startup type to Automatic.

I had 3 services.
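A sketch for the service changes. The two names below are the usual ones for a default instance and are an assumption; add whatever other SQL services your node has.

```powershell
# Assumption: default instance service names (engine first, then the agent which depends on it).
$services = 'MSSQLSERVER', 'SQLSERVERAGENT'

foreach ($svc in $services) {
    Set-Service -Name $svc -StartupType Automatic
    Start-Service -Name $svc
}

Get-Service -Name $services | Select-Object Name, Status
```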

Now your SQL server should be working.

8) Restart the server – not needed, but I did so anyways.

Test?

If you are doing this in a test environment (like I was) and don’t have any SQL applications to test with, do the following.

Right click the desktop on any computer (or the SQL server computer itself) and create a new text file. Then rename that to blah.udl. The name doesn’t matter as long as the extension is .udl. Double click on that to get a window like this:

Now you can fill in the SQL server name and test it.

One thing to keep in mind (if you are not a SQL person – I am not). The Windows NT Integrated security is what you need to use if you want to authenticate against the server with an AD account. It is tempting to select the “Use a specific user name …” option and put in an AD username/ password there, but that won’t work. That option is for using SQL authentication.

If you want to use a different AD account you will have to do a run as of the tool.
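If you would rather test from PowerShell than from a .udl file, here is a minimal sketch using the .NET SqlClient classes with integrated (Windows) authentication. The server name is a placeholder; to test with a different AD account, start the PowerShell session itself as that account.

```powershell
# Hypothetical server name: use the name the old cluster was known by.
$server = 'SQLCLUSTERNAME'

$connString = "Server=$server;Database=master;Integrated Security=True;Connect Timeout=5"
$conn = New-Object -TypeName System.Data.SqlClient.SqlConnection -ArgumentList $connString

try {
    $conn.Open()
    $cmd = $conn.CreateCommand()
    $cmd.CommandText = 'SELECT @@SERVERNAME'
    "Connected to $($cmd.ExecuteScalar())"
}
finally {
    $conn.Dispose()
}
```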

Also, on a fresh install of SQL server SQL authentication is disabled by default. You can create SQL accounts but authentication will fail. To enable SQL authentication right click on the server in SQL Server Management Studio and go to Properties, then go to Security and enable SQL authentication.

That’s all!

Now one can P2V this node.

PowerShell – List of machines with CPU and disk info

Had to come up with a list of machines and their CPU, disk info etc. Thought it better to make a small script for it.
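A minimal sketch of this kind of script (not the original). It assumes a text file of machine names, here called Machines.txt, and that WMI is reachable on each machine with your current credentials.

```powershell
# Machines.txt (hypothetical name): one computer name per line.
Get-Content -Path .\Machines.txt | Where-Object { $_.Trim() } | ForEach-Object {
    $computer = $_
    $cpus  = @(Get-WmiObject -Class Win32_Processor -ComputerName $computer)
    # DriveType 3 = local fixed disks.
    $disks = Get-WmiObject -Class Win32_LogicalDisk -ComputerName $computer -Filter 'DriveType=3'

    foreach ($disk in $disks) {
        [pscustomobject]@{
            Computer = $computer
            CPU      = $cpus[0].Name
            Sockets  = $cpus.Count
            Cores    = ($cpus | Measure-Object -Property NumberOfCores -Sum).Sum
            Drive    = $disk.DeviceID
            SizeGB   = [math]::Round($disk.Size / 1GB, 1)
            FreeGB   = [math]::Round($disk.FreeSpace / 1GB, 1)
        }
    }
} | Export-Csv -Path .\MachineInfo.csv -NoTypeInformation
```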

Hope it helps someone.

Removing Datastores from an ESX host

Datastores in ESX hosts are made up of extents. Extents can be thought of as the underlying physical disk/ LUN that goes into making up the datastore.

A datastore is usually made up of a single extent, but can span multiple extents too. So removing a datastore from an ESX host means you dismount the datastore and then detach the extents.

Datastores have friendly names that you assign when creating them. Extents have names that usually start with naa or eui.

In vSphere client when you select a host, go to its Configuration tab, Storage, select Datastores view – the “Identification” column shows the datastore name and the “Device” column shows the extent name.

In PowerCLI the same information can be seen using Get-View or the ExtensionData property of a datastore object (as in my previous post).
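For instance, a sketch that lists each VMFS datastore with its extent names, assuming an existing Connect-VIServer session:

```powershell
# Extents only exist for VMFS datastores, so skip NFS and friends.
Get-Datastore | Where-Object { $_.Type -eq 'VMFS' } |
    Select-Object Name,
        @{ Name = 'Extents'; Expression = { $_.ExtensionData.Info.Vmfs.Extent.DiskName -join ', ' } }
```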

Anyways, to remove a datastore from an ESX host you first go to the Datastores screen as above, select the datastore, right click and select “Unmount”. This will do a bunch of checks (such as whether any VMs running on that host have their disks on this datastore) and then let you unmount it. This only removes the datastore name from the ESX host though; the host can still see and mount the datastore. So the next step is to also detach the extent from the host – i.e. unpresent the underlying disk/ LUN from the host.

For this you need the extent names. Get these as above (by expanding the “Device” column to see the name; or use PowerCLI). Then go to the Devices view (instead of the Datastores view that you currently are on). Expand the “Identifier” column now and find the extents that we want to detach. Once you find this right click and select “Detach”. This too does some checks and then lets you detach the extent if it’s not in use.

That’s it.

p.s. Too lazy to take screenshots. Sorry about that. :)

Get a list of VMs running on specific datastores, along with the host

Needed to dismount some datastores/ LUNs from a few hosts but before doing that needed to ensure none of the VMs running on these datastores are hosted on the hosts I want to remove access from. This one-liner PowerCLI will do just that for you:

Replace “PP_” with the pattern you are interested in matching in the datastore name.
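A sketch of such a one-liner (spread over a few lines for readability), assuming an existing Connect-VIServer session:

```powershell
# For every datastore matching the pattern, list its VMs and the host each VM runs on.
Get-Datastore | Where-Object { $_.Name -match 'PP_' } | ForEach-Object {
    $ds = $_
    Get-VM -Datastore $ds |
        Select-Object Name, VMHost, @{ Name = 'Datastore'; Expression = { $ds.Name } }
}
```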

A variation of the above where I only list VMs that are hosted on the hosts I want to remove access from:

In my case the hosts that should have access to the datastores with a “PP_” in their name will also have numbers 01-03 in them. Any VMs not on hosts with these names are what I am interested in.
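That filter can be sketched as below. The '0[1-3]' pattern is my guess at how to express “has 01-03 in the host name”; adjust it to your naming.

```powershell
# VMs on "PP_" datastores that are NOT running on hosts numbered 01-03.
Get-Datastore | Where-Object { $_.Name -match 'PP_' } | ForEach-Object {
    $ds = $_
    Get-VM -Datastore $ds |
        Where-Object { $_.VMHost.Name -notmatch '0[1-3]' } |
        Select-Object Name, VMHost, @{ Name = 'Datastore'; Expression = { $ds.Name } }
}
```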

Kindles – Voyage & Oasis

Recently I decided to upgrade my Kindle. And went on a splurge and first bought the Voyage, and then the Oasis (on a 5 month installment scheme from Amazon). This was a huge upgrade for me – the device I had hitherto used for reading being the first gen Paperwhite.

The first gen Paperwhite was my first and only Kindle up to this point. When the subsequent generations were released I never upgraded. Mainly coz my reading habits were off and on, and also because I used to supplement the Paperwhite with the Kindle apps on my iPad and Nexus tablet. Neither of them were as good as the Paperwhite but like I said my reading was off and on, and I used to read other stuff like PDFs and Instapaper and Longform, plus for a long time I was into comics. 

Fast forward to the present I slowly stopped reading all those other mediums too and pretty much stopped any reading. I think after a long time I read “A Slight Trick of the Mind” by Mitch Cullin mainly because I saw that excellent movie “Mr. Holmes” which is based on the book and was so in love with the movie and its background score. The book didn’t live up to either of them but I persisted and finished it nevertheless over a weekend. After this I think I read a few books on the Paperwhite – mostly non-fiction.

A few months later I signed up on Audible to try it out, this with yet another Holmes book – that of the elder brother (a book called “Mycroft Holmes”). I didn’t enjoy this book much either but I bought the Kindle version to read side by side and also try out Whispersync. That was nice. The book wasn’t great but I enjoyed the ability to sync and read together etc. Anyways, I didn’t manage much on Audible either and was about to close it after the trial month but Amazon offered a 3 month extension at half the price and so I stuck on. Good that I did coz now I am hooked on to Audible.

I guess it’s coz of Audible and a rekindling of my interest in reading/ prose, plus a nudge from Amazon in terms of a reduced price on the Voyage for Prime members that I bought the Voyage. This was such a giant step forward from the aging first gen Paperwhite that I was hooked and started voraciously reading. Then I wanted to try the Oasis too, and even though it is pricey and has many negative reviews regarding its screen (and I am not rich and don’t have cash to throw around) I decided to buy it.

Both are delightful devices. My favorite features would be the single pane of glass (without the depression on the screen as with the Paperwhite) – not sure why that matters, but it feels good – plus the ability to turn pages via Pagepress or the physical buttons. I especially love the latter. Makes it so convenient reading single handedly. 

I like both devices. I think I prefer the Voyage slightly more coz it feels more polished; but the Oasis has a lot more “cute” or children’s book sort of feel to it. It’s a nice little device. Sort of short and squarish. And more handy reading in a dark room as the page turn is via physical buttons as opposed to pressing the bezel on the Voyage (which is a hit and miss in the dark). Plus I love the cover and I find it a lot easier to hold in hand. That’s not fair to the Voyage though as my comparison doesn’t include the Voyage case (which I don’t have).

Initially I thought my Oasis had lighting issues as I felt one side is a bit darker than the other. I still feel so but when reading in the dark it doesn’t feel so, so maybe it’s just the external lighting. The Voyage consistently feels better in terms of lighting though. And maybe I am wrong but the text on the Voyage seems slightly sharper – but that’s probably just me nitpicking.

Anyhow. For anyone sitting on the fence these are excellent devices and a worthy upgrade over the Paperwhite (which is a good device too – what I mean is that you are getting some value for the extra cash you dole out for the Voyage or Oasis). 

Installing a new license key in KMS

KMS is something you login to once in a blue moon and then you wonder how the heck are you supposed to install a license key and verify that it got added correctly. So as a reminder to myself.

To install a license key:

Then activate it:

If you want to check that it was added correctly:

I use cscript so that the output comes in the command prompt itself and I can scroll up and down (or put into a text file) as opposed to a GUI window which I can’t navigate.
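The three steps with cscript and slmgr.vbs, as a sketch. The key shown is a placeholder; run these from an elevated prompt on the KMS host.

```powershell
$slmgr = "$env:windir\System32\slmgr.vbs"

# Install the license key (placeholder key shown).
cscript //nologo $slmgr /ipk XXXXX-XXXXX-XXXXX-XXXXX-XXXXX

# Activate it.
cscript //nologo $slmgr /ato

# Show detailed license information to verify.
cscript //nologo $slmgr /dlv
```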

PowerCLI, VMware Tools update, etc.

(The following is based on this VMware KB article which is for ESXi 4.0 and earlier but can be made to work for later versions too).

In vSphere client we can see the VMware Tools related settings of a VM in the Options tab of the VM properties window. In PowerCLI these are exposed under the ExtensionData object. Specifically the ExtensionData.Config.Tools object.

The ExtensionData object has many methods and properties – think of it like the advanced options menu in a GUI. One of these methods is ReconfigVM() which takes an object of type VMware.Vim.VirtualMachineConfigSpec and reconfigures the VM accordingly.

So to take the example of modifying the VMware Tools update settings all one has to do is create a new object of the type above and pass it to the ReconfigVM() method. Something as below.

First we create an object of this type:

If we look at this object now we will see that it has various properties and methods. The Tools related settings are controlled by a property called Tools of type VMware.Vim.ToolsConfigInfo. To modify these we need to create a new object of that type:

This has no settings by default:

But we can set the properties we are interested in modifying.

For instance to set VMware Tools to be automatically updated upon power cycle do the following:

To undo that change set the value to “manual” (it only takes two options).

Here’s an example of me changing the VMware Tools updating settings to be manual.
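Putting the steps above together for a single VM, as a sketch. The VM name is a placeholder and an existing Connect-VIServer session is assumed.

```powershell
# Hypothetical VM name.
$vm = Get-VM -Name 'TestVM01'

# Build a config spec that only touches the Tools settings.
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.Tools = New-Object VMware.Vim.ToolsConfigInfo

# 'upgradeAtPowerCycle' to update automatically, 'manual' to undo that.
$spec.Tools.ToolsUpgradePolicy = 'upgradeAtPowerCycle'

# Apply it and read the setting back.
$vm.ExtensionData.ReconfigVM($spec)
(Get-VM -Name 'TestVM01').ExtensionData.Config.Tools.ToolsUpgradePolicy
```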

So that’s it. Now to do this en-masse for a bunch of VMs you can make a loop.

If the list of VMs is got from vCenter directly (via say something like Get-Cluster -Name "CLUSTER NAME" | Get-VM) then the code needs a bit of change (the $VMObj line can be removed).
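A sketch of both variants of the loop. The file name VMList.txt is made up; $VMObj is only needed when starting from names in a file.

```powershell
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.Tools = New-Object VMware.Vim.ToolsConfigInfo
$spec.Tools.ToolsUpgradePolicy = 'manual'

# Variant 1: VM names come from a text file (hypothetical file name).
foreach ($name in (Get-Content -Path .\VMList.txt)) {
    $VMObj = Get-VM -Name $name
    $VMObj.ExtensionData.ReconfigVM($spec)
}

# Variant 2: VM objects come straight from vCenter, so no lookup by name is needed.
Get-Cluster -Name 'CLUSTER NAME' | Get-VM | ForEach-Object {
    $_.ExtensionData.ReconfigVM($spec)
}
```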

Just as a reference to future me, the output returned by the ExtensionData object is what you would get via the Get-View cmdlet.

Update: Came across this while writing this post. If you have multiple vCenter servers and want PowerCLI to work against entities in all of them the following will help.
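A sketch of the multiple-vCenter setup, with made-up server names:

```powershell
# Let PowerCLI keep several default connections at once.
Set-PowerCLIConfiguration -DefaultVIServerMode Multiple -Scope User -Confirm:$false

# Hypothetical vCenter names.
Connect-VIServer -Server 'vcenter01.example.com', 'vcenter02.example.com'

# Cmdlets such as Get-VM now run against every server listed here.
$global:DefaultVIServers
```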

Enabling SNMPv3 on ESXi hosts

A continuation of my earlier post which was to do with SNMPv2.

As before, connect to the vCenter via PowerCLI. And as before the set() method can be used to set SNMP – both v2 and/or v3. The definition of this method is as follows:

That’s confusing so best to copy paste the definition into notepad or something so you can be sure you are passing the correct arguments.

First things first. There doesn’t seem to be a way of turning off something. As in, say you already have SNMPv2 turned on, you can’t turn it off by setting the community strings to blank. Doing so generates an error. So if you want to turn previous things off it’s best to do a reset and start with a clean slate.

This sets things back to their defaults:

Before going ahead with any SNMPv3 configuration we need to decide on what authentication and privacy protocols to use. In my case I want to use SHA1 and AES-128. So I need to set that first:

Once I have done this I can generate the hashes. I will need this later to configure SNMPv3.

In the example above both my passwords are Password1.

With this in hand I configure SNMPv3:

That’s it really. In the above example I will be using an SNMPv3 user called snmpUser1.

Now to do it across my estate I can make a loop. No need to create password hashes for each host. The hash stays the same as long as you are using the same password for each host.
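A rough sketch of the whole sequence for one host. Note this is not the positional set() call discussed above: it uses the newer Get-EsxCli -V2 interface (PowerCLI 6.3 and later), where arguments are set by name. The argument names below are assumed to mirror the esxcli long options with the hyphens dropped, and the host name is a placeholder, so check them with CreateArgs() on your version before relying on this.

```powershell
# Assumptions: PowerCLI 6.3+, an existing Connect-VIServer session, hypothetical host name.
$esxcli = Get-EsxCli -VMHost (Get-VMHost -Name 'esx01.example.com') -V2

# 1) Reset SNMP to its defaults.
$a = $esxcli.system.snmp.set.CreateArgs()
$a.reset = $true
$esxcli.system.snmp.set.Invoke($a)

# 2) Choose the authentication and privacy protocols.
$a = $esxcli.system.snmp.set.CreateArgs()
$a.authentication = 'SHA1'
$a.privacy = 'AES128'
$esxcli.system.snmp.set.Invoke($a)

# 3) Generate the hashes from the raw passwords (Password1 as in the post).
$h = $esxcli.system.snmp.hash.CreateArgs()
$h.authhash  = 'Password1'
$h.privhash  = 'Password1'
$h.rawsecret = $true
$esxcli.system.snmp.hash.Invoke($h)   # note down the two hashes it prints

# 4) Create the SNMPv3 user and enable the agent.
#    Format: user/auth-hash/priv-hash/security-level. Paste the hashes from step 3.
$a = $esxcli.system.snmp.set.CreateArgs()
$a.users  = 'snmpUser1/<auth-hash>/<priv-hash>/priv'
$a.enable = $true
$esxcli.system.snmp.set.Invoke($a)
```

For the estate-wide loop, steps 1, 2 and 4 go inside a foreach over Get-VMHost, with step 3 run once up front.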

That’s all!