Contact

Subscribe via Email

Subscribe via RSS/JSON

Categories

Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

Elsewhere

TIL: Transparent Page Sharing (TPS) between VMs is disabled by default

(TIL is short for “Today I Learned” by the way).

I always thought an ESXi host did some page sharing of VM memory between the VMs running on it. The feature is called Transparent Page Sharing (TPS) and it was something I remember from my VMware course and also read in blog posts such as this and this. The idea is that if you have (say) a bunch of Server 2012R2 VMs running on a host, it’s quite likely these VMs have quite a bit of common stuff between them in RAM, so it makes sense for the ESXi host to share that common stuff between the hosts. So even if each VM has (say) 4 GB RAM assigned to it, and there’s about 2GB worth of stuff common between the VMs, the host only needs to use 2GB shared RAM + 2 x 2GB private RAM for a total of 6GB RAM. 

Apart from this as the host is under increased memory pressure it resorts to techniques like ballooning and memory swapping to free up some RAM for itself. 

I even made a script today to list out all the VMs in our environment that have greater than 8GB RAM assigned to them and are powered on and to list the amount of shared RAM (just for my own info). 

Anyhow – around 2015 VMware stopped page sharing of VM memory between VMs. VMware calls this sort of RAM sharing as inter-VM TPS. Apparently this is a security risk and VMware likes to ship their products as secure by default, so via some patches to the 5.x series (and as default in the 6.x series) it turned off inter-VM TPS and introduced some controls that allow IT Admins to turn this on if they so wish. Intra-VM TPS is still enabled – i.e. the ESXi host will do page sharing within each VM – but it not longer does page sharing between VMs by default. 

Using the newly introduced controls, however, it is possible to enable inter-VM TPS for all VMs, or selectively between some VMs. Quoting from this blog post

You can set a virtual machine’s advanced parameter sched.mem.pshare.salt to control its ability to participate in transparent page sharing.  

TPS is only allowed within a virtual machine (intra-VM TPS) by default, because the ESXi host configuration option Mem.ShareForceSalting is set to 2, the sched.mem.pshare.salt is not present in the virtual machine configuration file, and thus the virtual machine salt value is set to unique value. In this case, to allow TPS among a specific set of virtual machines, set the sched.mem.pshare.salt of each virtual machine in the set to an identical value.  

Alternatively, to enable TPS among all virtual machines (inter-VM TPS), you can set Mem.ShareForceSalting to 0, which causes sched.mem.pshare.salt to be ignored and to have no impact.

Or, to enable inter-VM TPS as the default, but yet allow the use of sched.mem.pshare.salt to control the effect of TPS per virtual machine, set the value of Mem.ShareForceSalting to 1. In this case, change the value of sched.mem.pshare.salt per virtual machine to prevent it from sharing with all virtual machines and restrict it to sharing with those that have an identical setting.

Nice! 

I wonder if intra-VM TPS has much memory savings. Looking at the output from my script for our estate I see that many of our server VMs have about half their allocated RAM as shared, so it does make an impact. I guess it will also make a difference when moving to a container architecture wherein a single VM might have many containers. 

I would also like to point out to this blog post and another blog post I came across from it on whether inter-VM TPS even makes much sense in today’s environments and also on the kind of impact it can have during vMotion etc. Good stuff. I am still reading these but wanted to link to them for reference. Mainly – nowadays we have larger page sizes and so the probability of finding an identical page to be shared between two VMs is low; then there is NUMA that places memory pages closer to the CPU and TPS could disrupt that; and also, TPS is a process that runs periodically to compare pages, so there is an operational cost as it runs and finds a match and then does a full compare of the two pages to ensure they are really identical. 

Good to know. 

Reboot a bunch of ESXi hosts one after the other

Not a big deal, I know, but I felt like posting this. :)

Our HP Gen8 ESXi hosts were randomly crashing ever since we applied the latest ESXi 5.5 updates to them in December. Logged a call with HP and turns out until a proper fix is issued by VMware/ HPE we need to change a setting on all our hosts and reboot them. I didn’t want to do it manually, so I used PowerCLI to do it en masse.

Here’s the script I wrote to target Gen8 hosts and make the change:

I could have done the reboot along with this, but I didn’t want to. Instead I copy pasted the list of affected hosts into a text file (called ESXReboot.txt in the script below) and wrote another script to put them into maintenance mode and reboot one by one.

The screenshot output is slightly different from what you would get from the script as I modified it a bit since taking the screenshot. Functionality-wise there’s no change.

Get a list of VMs running on specific datastores, along with the host

Needed to dismount some datastores/ LUNs from a few hosts but before doing that needed to ensure none of the VMs running on these datastores are hosted on the hosts I want to remove access from. This one-liner PowerCLI will do just that for you:

Replace “PP_” with the pattern you are interested in matching in the datastore name.

A variation of the above where I only list VMs that are hosted the hosts I want to remove access from:

In my case the hosts that should have access to the datastores with a “PP_” in their name will also have numbers 01-03 in them. Any VMs not on hosts with these names are what I am interested in.

PowerCLI, VMware Tools update, etc.

(The following is based on this VMware KB article which is for ESXi 4.0 and earlier but can be made to work for later versions too).

In vSphere client we can see the VMware Tools related settings of a VM in the Options tab of the VM properties window. In PowerCLI these are exposed under the ExtensionData object. Specifically the ExtensionData.Config.Tools object.

The ExtensionData object has many methods and properties – think of it like the advanced options menu in a GUI. One of these methods is ReconfigVM() which takes an object of type VMware.Vim.VirtualMachineConfigSpec and reconfigures the VM accordingly.

So to take the example of modifying the VMware Tools update settings all one has to do is create a new object of the type above and pass it to the ReconfigVM() method. Something as below.

First we create an object of this type:

If we look at this object now we will see that it has various properties and methods. The Tools related settings are controlled by a property called Tools of type VMware.Vim.ToolsConfigInfo. To modify these we need to create a new object of that type:

This has no settings by default:

But we can set the properties we are interested in modifying.

For instance to set VMware Tools to be automatically updated upon power cycle do the following:

To undo that change set the value to “manual” (it only takes two options).

Here’s an example of me changing the VMware Tools updating settings to be manual.

So that’s it. Now to do this en-masse for a bunch of VMs you can make a loop.

If the list of VMs is got from vCenter directly (via say something like Get-VM | where {(Get-Cluster).Name -eq “CLUSTER NAME”}) then the code needs a bit of change (the $VMObj line can be removed).

Just as a reference to future me, the output returned by the ExtensionData object is what you would get via the Get-View cmdlet.

Update: Came across this while writing this post. If you have multiple vCenter servers and want PowerCLI to work against entities in all of them the following will help.

Enabling SNMPv3 on ESXi hosts

A continuation to my earlier post which was to do with SNMPv2.

As before, connect to the vCenter via PowerCLI. And as before the set() method can be used to set SNMP – both v2 and/or v3. The definition of this method is as follows:

That’s confusing so best to copy paste the definition into notepad or something so you can be sure you are passing the correct arguments.

First things first. There doesn’t seem to be a way of turning off something. As in, say you already have SNMPv2 turned on, you can’t turn it off by setting the community strings to blank. Doing so generates an error. So if you want to turn previous things off it’s best to do a reset and start with a clean slate.

This sets things back to their defaults:

Before going ahead with any SNMPv3 configuration we need to decide on what authentication and privacy protocols to use. In my case I want to use SHA1 and AES-128. So I need to set that first:

Once I have done this I can generate the hashes. I will need this later to configure SNMPv3.

In the example above both my passwords are Password1.

With this in hand I configure SNMPv3:

That’s it really. In the above example I will be using an SNMPv3 user called snmpUser1.

Now to do it across my estate I can make a loop. No need to create password hashes for each host. The hash stays the same as long as you are using the same password for each host.

That’s all!

Enabling SNMP on ESXi hosts

I wanted to enable SNMP on our ESXi hosts for monitoring via Solarwinds. Here’s what I did. (I am doing this kind of generically, using variables etc, so I can script the thing for multiple hosts).

First I connected to the vCenter Server from PowerCLI.

Next I got its ESXCLI object. This will let me run ESXCLI commands against the host.

To view the current status of SNMP you can do can invoke a get() method –

Nothing’s configured currently. To configure something we can use the set() method. From the definition of this method we can see it takes a whole bunch of parameters –

Here’s what I did to configure SNMP. I want a community string of “public”, enable SNMP, and specify two trap destinations.

The result of that will either be a true or false. The get() method can be used again to confirm it is set correctly. And the test() method can be used to test it works –

Now Solarwinds will be able to poll the host via SNMP.

To do this en-masse on all your hosts the following should help –

Shout out to this VMware blog post which helped a lot and has more info.

The above script failed on some of our ESX hosts with the following error –

Turns out these hosts only accept 16 parameters instead of 17 (the one called largestorage is missing). Not sure why. All our hosts are ESXi 5.5 but am thinking the problem ones are perhaps not using the HP customized version of ESXi.

Anyways, so I modified my script above to take care of this –

Also, just for my own info – the $null above means the parameter is not set. If that parameter already has a value on the server it is not over-written. To over-write or blank out the existing value replace $null with "".

Configure NTP for multiple ESXi hosts

Following on my previous post I wanted to set NTP servers for my ESX servers and also start the service & allow firewall exceptions. Here’s what I did –

 

User PowerShell/ PowerCLI to get VM space usage

Wanted to get the space used by all VMs across a bunch of our newer hosts –

There’s probably a way to show the total too but I used a separate pipeline for that –

 

VMware/ PowerCLI – find disks used by a template

It’s easy finding the disks used by a VM. Just check its settings via the client or use PowerCLI.

Can’t do the same for a template though as the Get-Template output has no similar property.

But then I came across Get-HardDisk:

Sweet! Same command works for templates (as above) and VMs:

That’s all. Hope this helps someone.

Enable & Disable SSH on ESXi host via PowerCLI

I alluded to this in another post but couldn’t find it when I was searching my posts for the cmdlet. So here’s a separate post.

The Get-VMHostService is your friend when dealing with services on ESXi hosts. You can use it to view the services thus:

To start and stop services we use the Start-VMHostService and Stop-VMHostService but these take (an array of) HostService objects.  HostService objects are what we get from the Get-VMHostService cmdlet above. Here’s how you stop the SSH & ESXi Shell services for instance:

Since the cmdlet takes an array, you can give it HostService objects of multiple hosts. Here’s how I start SSH & ESXi Shell for all hosts:

As an aside here’s a nice post on six different ways to enable SSH on a host. Good one!

Get ESXi host network info using PowerShell/ PowerCLI

Not an exhaustive post, I am still exploring this stuff.

To get a list of network adapters on a host:

To get a list of virtual switches on a host, with the NICs assigned to these:

To get a list of port groups on a host:

To get a list of port groups , the virtual switches they are mapped to, and the NICs that make up these switches:

This essentially combines the first and third cmdlets above.

More later!

Number of IPv4 routes did not match

Was creating / migrating some ESXi hosts during the week and came across the above error “Number of IPv4 routes did not match” when checking for host profile compliance of one of the hosts. All network settings of this host appeared to be same as the rest so I was stumped as to what could be wrong. Via a VMware KB article I came across the esxcfg-route command that helped identify the problem. To run this command SSH into the host:

By default the command only outputs the default gateway but you can pass it the -l switch to list all routes:

In my case the above output was from one of the hosts, while the following was from the non-compliant host:

Notice the vmk2 interface has the wrong network. Not sure how that happened. Oddly the GUI didn’t show this incorrect network but obviously something was corrupt somewhere.

To fix that I thought I’ll remove the vmk2 interface and re-add it. Big mistake! Possibly because its network was same as that of the management network (10.50.0.0/24) removing this interface caused the host to lose connectivity from vCenter. I could ping it but couldn’t connect to it via SSH, vSphere Client, or vCenter. Finally I had to reset the network via the DCUI – it’s under “Network Restore Options”. I tried “Restore vDS” first, didn’t help, so did a “Restore Standard Switch”. This is a very useful – it creates a new standard switch and moves the Management Network onto that so you get connectivity to the host. This way I was able to reconnect to the host, but now I stumbled upon a new problem.

The host didn’t have the vmk2 interface any more but when I tried to recreate it I got an error that the interface already exists. But no, it does not – the GUI has no trace of it! Some forum posts suggested restarting the vCenter service as that clears its cache and puts it in sync with the hosts but that didn’t help either. Then I came across this post which showed me that it is possible for the host to still have the VMkernel port but vCenter to not know of it. For this the esxcli command is your friend. To list all VMkernel ports on a host do the following:

After that, removing the VMkernel interface can be done by a variant of same command:

Now I could add the re-add the interface via vSphere and get the hosts into compliance.

Before I conclude this post though, a few notes on the commands above.

If you have PowerCLI installed you can run all the esxcli commands via the Get-EsxCli cmdlet. For example:

If I wanted to remove the interface via PowerCLI the command would be slightly different:

I would have written more on the esxcli command itself but this excellent blog post covers it all. It’s an all powerful command that can be used to manage many aspects of the ESXi host, even set it in maintenance mode!

Heck you can even use esxcli to upgrade from one ESXi version to another. It is also possible to run the esxcli command from a remote computer (Windows or Linux) by installing the vSphere CLI tools on that computer. Additionally, there’s also the vSphere Management Assistant (VMA) which is a virtual appliance that offers command line tools.

The esxcli is also useful if you want to kill a VM. For instance the following lists all running VMs on a host:

If that VM were stuck for some reason and cannot be stopped or restarted via vSphere it’s very useful to know the esxcli command can be used to kill the VM (has happened a couple of times to me in the past):

Regarding the type of killing you can do:

There are three types of VM kills that can be attempted: [soft, hard, force]. Users should always attempt ‘soft’ kills first, which will give the VMX process a chance to shutdown cleanly (like kill or kill -SIGTERM). If that does not work move to ‘hard’ kills which will shutdown the process immediately (like kill -9 or kill -SIGKILL). ‘force’ should be used as a last resort attempt to kill the VM. If all three fail then a reboot is required. (required)

Another command line option is vim-cmd which I stumbled upon from one of the links above. I haven’t used it much so as a reference to myself here’s a blog post explaining it in detail.

Lastly there’s also a bunch of esxcfg-* commands, one of whom we came across above.

I haven’t used these much. They seem to be present for compatibility reasons with ESXi 3.x and prior. Back then you had commands with a vicfg- prefix, now you have the same but with a esxcfg- prefix. For instance, esxcfg-vmknic is now replaced with esxcli network interface as we saw above.

That’s all for now!

Update: Thought I’d use this post to keep track of other useful commands.

To get IPv4 addresses details:

Replace with ipv6 if that’s what you want.

To set an IPv4 address:

To ping an address from the host:

Change keyboard layout:

Get current keyboard layout:

List available layouts:

Set a new layout:

Remotely enable SSH

The esxcli commands are cool but you need to enable SSH each time you want to connect to the host and run these (unless you install the CLI tools on your machine). If you have PowerCLI though you can enable SSH remotely.

To list the services:

To enable SSH and the ESXi shell:

 

VMware: “A specified parameter was not correct” error

Was trying to delete a VM template but it kept throwing the above error. I had a feeling this was because the underlying disk was missing in the datastore (because I couldn’t find any folder with the same name as the VM in the datastore) but there was no way to confirm this as you can’t right click a VM and note its settings.

Thanks to PowerCLI though, you can:

The Get-HardDisk cmdlet can be used to return the hard disks used by a VM or template. It can even be used to return all hard disks on a datastore (or in a specified path on the datastore):

 

PowerCLI – List all VMs in a cluster along with number of snapshots and space usage

More as a note to myself than anyone else, here’s a quick and dirty way to list all the VMs in a cluster with the number of snapshots, the used space, and the provisioned space. Yes you could get this information from the GUI but I like PowerShell and am trying to spend more time with PowerCLI.