Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

Reboot a bunch of ESXi hosts one after the other

Not a big deal, I know, but I felt like posting this. :)

Our HP Gen8 ESXi hosts were randomly crashing ever since we applied the latest ESXi 5.5 updates to them in December. We logged a call with HP, and it turns out that until VMware/HPE issue a proper fix we need to change a setting on all our hosts and reboot them. I didn’t want to do it manually, so I used PowerCLI to do it en masse.

Here’s the script I wrote to target Gen8 hosts and make the change:

I could have done the reboot along with this, but I didn’t want to. Instead I copy pasted the list of affected hosts into a text file (called ESXReboot.txt in the script below) and wrote another script to put them into maintenance mode and reboot one by one.

The screenshot output is slightly different from what you would get from the script as I modified it a bit since taking the screenshot. Functionality-wise there’s no change.

Downgrading ESXi Host

Today I upgraded one of our hosts to a newer version than what was supported by our vCenter, so I had to find a way of downgrading it. The host was now at “5.5 Patch 10” (which is after “5.5 Update 3”), while our vCenter version only supported versions prior to “5.5 Update 3”. (See this post for a list of build numbers and versions; see this KB article for why vCenter and the host were now incompatible.)

I found this blog post and KB article that talked about downgrading and upgrading. Based on those two here’s what I did to downgrade my host.

First, some terminology. Read this blog post on what VIBs are. At a very high level a VIB file is like a zip file with some metadata and verification thrown in. They are the software packages for ESX (think of it like a .deb or .rpm file). The VIB file contains the actual files on the host that will be replaced. The metadata tells you more about the VIB file – its dependencies, requirements, issues, etc. And the verification bit lets the host verify that the VIB hasn’t been tampered with, and also allows you to have various “levels” of VIBs – those certified by VMware, those certified by partners of VMware, etc – such that you as a System Admin can decide what level of VIBs you want installed on your host.

You can install/ remove/ update VIBs via the command esxcli:

Here’s a short list of the VIBs installed on my host:

Next you have Image Profiles. These are a collection of VIBs. In fact, since any installation of ESXi is a collection of VIBs, an image profile can be thought of as defining an ESXi image. For instance, all the VIBs on my currently installed ESXi server – including 3rd party VIBs – together can be thought of as an image profile. I can then deploy this image profile to other hosts to get the exact configuration on those hosts too.

One thing to keep in mind is that image profiles are not anything tangible. As in they are not files as such, they just define the VIBs that make up the profile.

Lastly you have Software Depots. These are your equivalent of Linux package repositories. They contain VIBs and Image Profiles and are accessible online via HTTP/ HTTPS/ FTP or even offline as a ZIP file (which is a neat thing IMHO). You would point to a software depot – online or offline – and specify an image profile you want, which then pulls in the VIBs you want.

Now back to esxcli. As we saw above this command can be used to list, update, remove etc VIBs. The cool thing though is that it can work with both VIB files and software depots (either online or a ZIP file containing a bunch of VIB files). Here’s the usage for the software vib install command which deals with installing VIBs:

You have two options:

  • The -d switch can be used to specify a software depot (online or offline) along with the -n switch to specify the VIBs to be installed from this depot.
  • Or the -v switch can be used to directly specify VIBs to be installed.
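As a rough sketch of those two options: the function names below are made up for illustration, and the depot, VIB name and VIB path are placeholders you pass in. The ESXCLI variable is my own addition so the command can be previewed (for example with ESXCLI=echo) away from a host.

```shell
# Sketch only: hypothetical wrapper functions around "esxcli software vib install".
# Set ESXCLI=echo to print the arguments instead of running esxcli.

# Option 1: pull a named VIB out of a software depot (URL or offline ZIP).
install_vib_from_depot() {
    depot="$1"      # depot URL, or full path to an offline bundle ZIP
    vib_name="$2"   # name of the VIB inside that depot
    ${ESXCLI:-esxcli} software vib install -d "$depot" -n "$vib_name"
}

# Option 2: install a single VIB file directly.
install_vib_file() {
    vib_path="$1"   # full path or URL of a .vib file
    ${ESXCLI:-esxcli} software vib install -v "$vib_path"
}
```

Nothing runs until you call one of the functions, e.g. `install_vib_file <path-to-vib>`.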

The esxcli command can also work with image profiles.

Here you have just one option (coz like I said you can’t download something called an image profile – you have to necessarily use a software depot). You use the -d switch to specify a depot (online or offline) and the -p switch to specify the image profile you are interested in.
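A small sketch of working with image profiles from a depot. The function names are hypothetical and the depot and profile name are arguments you supply; ESXCLI is an override I added so the calls can be previewed with ESXCLI=echo.

```shell
# Sketch only: hypothetical wrappers around the esxcli image profile commands.

# List the image profiles a depot (URL or offline ZIP) offers.
list_depot_profiles() {
    depot="$1"
    ${ESXCLI:-esxcli} software sources profile list -d "$depot"
}

# Install a named image profile from a depot.
# Note: "profile install" replaces the installed image with the profile's VIBs.
install_profile() {
    depot="$1"
    profile="$2"
    ${ESXCLI:-esxcli} software profile install -d "$depot" -p "$profile"
}
```

Listing the profiles first is a cheap way to confirm the exact profile name before installing anything.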

Apart from installing VIBs & image profiles, the esxcli command can also remove and update these. When it comes to image profiles though, the command can also downgrade profiles via an --allow-downgrades switch. So that’s what we use to downgrade ESXi versions. 

First find the ESXi version you want to downgrade to. In my case it was ESXi 5.5 Update 2. Go to My VMware (login with your account) and find the 5.5 Update 2 product. Download the offline bundle – which is a ZIP file (basically an offline software depot). In my case I got a file named “”. Now open this ZIP file and go to the “\profiles” folder in that. This gives you the list of profiles in this depot.


You can also get the names from a link such as this which gives more info on the release and the image profiles in it. (I came across it by Googling for “ESXi 5.5 Update 2 profile name”).

The profiles with an “s” in them only contain security fixes while the ones without an “s” contain both security and bug fixes. In my case the profile I am looking for is “ESXi-5.5.0-20140902001-standard”. I wasn’t sure if I need to go for the “no-tools” version or not, but figured I’ll stick with the “standard”.

Now, copy the ZIP file you downloaded to the host. Either upload it to the host directly, or to some shared storage, etc.

Then run a command similar to this:
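A minimal sketch of such a command, wrapped in a hypothetical function so the depot ZIP and profile name are explicit arguments. As far as I know the ZIP has to be given by its full path on the host. ESXCLI is my own override for previewing the command (ESXCLI=echo).

```shell
# Sketch only: downgrade (or update) the host to an image profile from an
# offline bundle, allowing older VIB versions to replace newer ones.
downgrade_to_profile() {
    depot_zip="$1"   # full path to the offline bundle ZIP copied to the host
    profile="$2"     # e.g. ESXi-5.5.0-20140902001-standard
    ${ESXCLI:-esxcli} software profile update \
        -d "$depot_zip" -p "$profile" --allow-downgrades
}
```

Call it as `downgrade_to_profile <full-path-to-zip> ESXi-5.5.0-20140902001-standard`, then reboot the host.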

That’s it! Following a host reboot you are now downgraded. Very straight-forward and easy.

ESXi host – cannot install HA – no space left on device

These are less of notes and more of links and what I did when I encountered this issue. Just for my future self.

At work we had a host which was giving HA errors. The message was along the lines that vCenter could not contact HA. So I tried reconfiguring it for HA (right click the host and select “Reconfigure for vSphere HA”) upon which I got a new error: Cannot install the vCenter Server agent service. Cannot upload agent.

[Screenshot: HA error]

Initially I thought it must just be a permissions issue. But it wasn’t so.

To investigate further I tried logging on to the server. I couldn’t enable SSH and ESXi Shell from the Configuration tab – it gave me an error. So I iLO’d into the server DCUI and enabled SSH and ESXi Shell. SSH still refused to let me in, and when I pressed Alt+F1 on the console to get the login prompt it was filled with messages like these: /bin/sh cant fork. Initially I thought it might be to do with the HP AMS memory leak (see this and this) but it wasn’t.

I pressed Alt+F12 to see the on-screen logs. It was filled with messages like these:

[Screenshot: Alt+F12 logs]

Blimey!

There was basically nothing more I could do here. I couldn’t log in to the server at all; heck, I couldn’t even shut down or restart it gracefully via F12 in the DCUI (nothing would happen). So I cold booted it and that got it working.

It’s been about 2 hours since I did that and the server seems stable so maybe it was a one-off thing. I looked at more logs though and here’s what I found.


(Contains: Management service initialization, watchdogs, scheduled tasks and DCUI use)


(Contains: A summary of Warning and Alert log messages excerpted from the VMkernel logs)


(Contains: VMkernel Observation events)


(Contains: Core VMkernel logs, including device discovery, storage and networking device and driver events, and virtual machine startup)


(Contains: Host management service logs, including virtual machine and host Task and Events, communication with the vSphere Client and vCenter Server vpxa agent, and SDK connections.)

From these logs one thing was clear. The ESXi RAMdisk hosting the root filesystem had run out of inodes. Possibly caused by the SFCB service. Because of this the root filesystem had run out of space and everything was failing. Great!

In Linux I am used to the df command to check filesystem usage. But in ESXi df only seems to give info on the mounted filesystems whereas vdf gives the local filesystems (like RAMdisks and Tardisks (whatever that is)).

Above output is after a reboot and all seems fine. To check the inode usage use the stat command.

Or use esxcli. It gives you the free space as well as the inode count!

Note to self: Make a habit of using the esxcli command as that seems to be the VMware preferred way of doing things. Plus it’s one command with various namespaces you can use for networking and other info.

In my case things look to be fine now.

KB 2037798 talks about this problem. Apparently it is fixed via a patch released in 2013, and as far as I can tell we are properly patched so we shouldn’t have been hit by this issue. If it happens again though the same KB article talks about creating a separate RAMdisk for SFCB so even if it eats up all the inodes your root file system isn’t affected. This involves creating a new RAMdisk at boot time by modifying rc.local (nice!). The esxcli command can be used to create a new ramdisk and mount it at the mount point required by SFCB:
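A sketch of what that could look like. The RAMdisk name, sizes (in MB) and permissions below are assumptions based on my reading of the KB, so check them against the article before using them; the function names are hypothetical, and ESXCLI is an override I added for previewing the command with ESXCLI=echo.

```shell
# Sketch only: give SFCB its own small RAMdisk so that it cannot use up the
# inodes of the root filesystem. Values are assumptions; verify with the KB.
add_sfcb_ramdisk() {
    ${ESXCLI:-esxcli} system visorfs ramdisk add \
        --name=sfcbtickets --min-size=2 --max-size=5 \
        --permissions=01777 --target=/var/run/sfcb
}

# List all RAMdisks with their space and inode usage, to confirm the result.
show_ramdisks() {
    ${ESXCLI:-esxcli} system visorfs ramdisk list
}
```

To make it survive reboots the `ramdisk add` line would go into rc.local as the KB describes; run `show_ramdisks` afterwards to see the new entry.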

Turns out such an issue can also occur because of SNMP. Or if you have an HP Gen8 blade server then coz of the hpHelper.log file, which is fixed via a patch from HP (this server was a Gen8 blade but it didn’t have this log file). KB 2040707 too talks about this. Didn’t help much in my case as that didn’t seem to be my issue.

Two useful links for future reference are:

That’s all for now.

p.s. I keep talking about SFCB above but have no idea what it is. Turns out it is the CIM server for ESXi. Found this blog post on it. 

Number of IPv4 routes did not match

Was creating / migrating some ESXi hosts during the week and came across the above error “Number of IPv4 routes did not match” when checking for host profile compliance of one of the hosts. All network settings of this host appeared to be the same as the rest so I was stumped as to what could be wrong. Via a VMware KB article I came across the esxcfg-route command that helped identify the problem. To run this command SSH into the host:

By default the command only outputs the default gateway but you can pass it the -l switch to list all routes:

In my case the above output was from one of the hosts, while the following was from the non-compliant host:

Notice the vmk2 interface has the wrong network. Not sure how that happened. Oddly the GUI didn’t show this incorrect network but obviously something was corrupt somewhere.
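Spotting a difference like this by eye gets tedious with more routes. A small sketch, assuming you have saved the `esxcfg-route -l` output of a compliant host and of the suspect host into two text files (the function name and the file names are hypothetical):

```shell
# Sketch only: compare two saved "esxcfg-route -l" listings.
# Prints the differing lines; returns 0 if the route sets match, 1 if not.
compare_routes() {
    good="$1"      # listing saved from a compliant host
    suspect="$2"   # listing saved from the non-compliant host
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    # Sort first so that a different ordering alone is not reported.
    sort "$good" > "$tmp_a"
    sort "$suspect" > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}
```

Usage would be along the lines of `compare_routes good-host.txt bad-host.txt`, run from any machine that has both files.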

To fix that I thought I’ll remove the vmk2 interface and re-add it. Big mistake! Possibly because its network was the same as that of the management network, removing this interface caused the host to lose connectivity from vCenter. I could ping it but couldn’t connect to it via SSH, vSphere Client, or vCenter. Finally I had to reset the network via the DCUI – it’s under “Network Restore Options”. I tried “Restore vDS” first, which didn’t help, so I did a “Restore Standard Switch”. This is very useful – it creates a new standard switch and moves the Management Network onto that so you get connectivity to the host. This way I was able to reconnect to the host, but now I stumbled upon a new problem.

The host didn’t have the vmk2 interface any more but when I tried to recreate it I got an error that the interface already exists. But no, it does not – the GUI has no trace of it! Some forum posts suggested restarting the vCenter service as that clears its cache and puts it in sync with the hosts but that didn’t help either. Then I came across this post which showed me that it is possible for the host to still have the VMkernel port but vCenter to not know of it. For this the esxcli command is your friend. To list all VMkernel ports on a host do the following:

After that, removing the VMkernel interface can be done by a variant of same command:
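A sketch of both steps as small wrapper functions. The function names are hypothetical and the interface name is whatever the listing shows (vmk2 in my case); ESXCLI is an override I added so the calls can be previewed with ESXCLI=echo.

```shell
# Sketch only: hypothetical wrappers around "esxcli network ip interface".

# List every VMkernel interface the host itself knows about.
list_vmk_interfaces() {
    ${ESXCLI:-esxcli} network ip interface list
}

# Remove one VMkernel interface by name, e.g. vmk2.
remove_vmk_interface() {
    vmk="$1"
    ${ESXCLI:-esxcli} network ip interface remove --interface-name="$vmk"
}
```

Given what happened earlier, it is worth double-checking that the interface you remove is not the one carrying management traffic.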

Now I could re-add the interface via vSphere and get the hosts into compliance.

Before I conclude this post though, a few notes on the commands above.

If you have PowerCLI installed you can run all the esxcli commands via the Get-EsxCli cmdlet. For example:

If I wanted to remove the interface via PowerCLI the command would be slightly different:

I would have written more on the esxcli command itself but this excellent blog post covers it all. It’s an all powerful command that can be used to manage many aspects of the ESXi host, even set it in maintenance mode!

Heck you can even use esxcli to upgrade from one ESXi version to another. It is also possible to run the esxcli command from a remote computer (Windows or Linux) by installing the vSphere CLI tools on that computer. Additionally, there’s also the vSphere Management Assistant (vMA) which is a virtual appliance that offers command line tools.

The esxcli command is also useful if you want to kill a VM. For instance the following lists all running VMs on a host:

If that VM were stuck for some reason and cannot be stopped or restarted via vSphere it’s very useful to know the esxcli command can be used to kill the VM (has happened a couple of times to me in the past):

Regarding the type of killing you can do:

There are three types of VM kills that can be attempted: [soft, hard, force]. Users should always attempt ‘soft’ kills first, which will give the VMX process a chance to shutdown cleanly (like kill or kill -SIGTERM). If that does not work move to ‘hard’ kills which will shutdown the process immediately (like kill -9 or kill -SIGKILL). ‘force’ should be used as a last resort attempt to kill the VM. If all three fail then a reboot is required. (required)
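The escalation described in that help text can be sketched as a loop. This is only an illustration: the function names and the KILL_WAIT variable (seconds to wait between attempts) are my own, the world ID comes from the `esxcli vm process list` output, and ESXCLI is an override for testing away from a host.

```shell
# Sketch only: try soft, then hard, then force, stopping as soon as the VM's
# world ID no longer shows up in the process list.

# Succeeds if the given world ID is still listed as a running VM.
vm_is_running() {
    ${ESXCLI:-esxcli} vm process list | grep -Eq "World ID: $1[[:space:]]*\$"
}

kill_vm() {
    wid="$1"   # World ID as shown by "esxcli vm process list"
    for kind in soft hard force; do
        ${ESXCLI:-esxcli} vm process kill --type="$kind" --world-id="$wid"
        sleep "${KILL_WAIT:-30}"   # give the VMX process time to exit
        if ! vm_is_running "$wid"; then
            echo "world $wid stopped after $kind kill"
            return 0
        fi
    done
    echo "world $wid still running; a host reboot is required" >&2
    return 1
}
```

The 30 second default wait is arbitrary; a soft kill of a busy VM may well need longer before it is fair to escalate.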

Another command line option is vim-cmd which I stumbled upon from one of the links above. I haven’t used it much so as a reference to myself here’s a blog post explaining it in detail.

Lastly there’s also a bunch of esxcfg-* commands, one of which we came across above.

I haven’t used these much. They seem to be present for compatibility reasons with ESXi 3.x and prior. Back then you had commands with a vicfg- prefix, now you have the same but with an esxcfg- prefix. For instance, esxcfg-vmknic is now replaced with esxcli network ip interface as we saw above.

That’s all for now!

Update: Thought I’d use this post to keep track of other useful commands.

To get IPv4 addresses details:

Replace with ipv6 if that’s what you want.

To set an IPv4 address:

To ping an address from the host:
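A sketch covering the three IPv4 tasks above as wrapper functions. The function names are hypothetical and the interface, address and netmask are arguments you supply; ESXCLI is an override I added so the calls can be previewed with ESXCLI=echo.

```shell
# Sketch only: hypothetical wrappers for the IPv4-related esxcli commands.

# Show the IPv4 configuration of the VMkernel interfaces.
show_ipv4() {
    ${ESXCLI:-esxcli} network ip interface ipv4 get
}

# Assign a static IPv4 address and netmask to one VMkernel interface.
set_static_ipv4() {
    vmk="$1"; addr="$2"; mask="$3"
    ${ESXCLI:-esxcli} network ip interface ipv4 set \
        --interface-name="$vmk" --ipv4="$addr" --netmask="$mask" --type=static
}

# Ping an address from the host itself.
ping_from_host() {
    ${ESXCLI:-esxcli} network diag ping --host="$1"
}
```

For example, `set_static_ipv4 vmk1 192.0.2.10 255.255.255.0` (a documentation-range address used purely as a placeholder).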

Change keyboard layout:

Get current keyboard layout:

List available layouts:

Set a new layout:
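The three keyboard layout steps as a sketch. The function names are hypothetical, the layout name must be one of those printed by the list command, and ESXCLI is an override I added so the calls can be previewed with ESXCLI=echo.

```shell
# Sketch only: hypothetical wrappers around
# "esxcli system settings keyboard layout".

get_keyboard_layout() {
    ${ESXCLI:-esxcli} system settings keyboard layout get
}

list_keyboard_layouts() {
    ${ESXCLI:-esxcli} system settings keyboard layout list
}

# Layout names can contain spaces, so keep the argument quoted.
set_keyboard_layout() {
    ${ESXCLI:-esxcli} system settings keyboard layout set --layout="$1"
}
```

Run `list_keyboard_layouts` first and pass one of the names it prints, exactly as shown, to `set_keyboard_layout`.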

Remotely enable SSH

The esxcli commands are cool but you need to enable SSH each time you want to connect to the host and run these (unless you install the CLI tools on your machine). If you have PowerCLI though you can enable SSH remotely.

To list the services:

To enable SSH and the ESXi shell: