Posted: January 21st, 2019 | Tags: esxcli, storage, vmware | Category: Virtualization | §

Made the following script yesterday to run on all our ESXi hosts to apply the HPE recommended settings for 3PAR.
#!/bin/sh

# Set VMware recommended path settings for our datastores
# https://h20195.www2.hpe.com/V2/getpdf.aspx/4AA4-3286ENW.pdf
# Note: this only targets devices of identifier naa.6000xxxx. Modify this later if required.

for i in `esxcli storage nmp device list | grep '^naa.6000'` ; do
    echo "Setting RR for $i"
    esxcli storage nmp device set --device $i --psp VMW_PSP_RR;
    echo "Setting IOPS=1 for $i"
    esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1 --device $i;
    echo "Setting queue size for $i"
    esxcli storage core device set --device $i --queue-full-threshold 4 --queue-full-sample-size 32;
    echo ""
done

echo "Setting as default for future"
esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -P "VMW_PSP_RR" -O "iops=1" -c "tpgs_on" -V "3PARData" -M "VV" -e "HPE 3PAR Custom Rule"
No original contribution from my side here. It’s just something I put together from stuff found elsewhere.
I wanted to run this as a cron job on ESXi periodically, but apparently that's not straightforward. (I wanted a cron job because I am not sure the queue settings can also be set as a default for new datastores.) ESXi doesn't keep cron jobs over reboots, so you have to modify some other script to inject a new crontab entry each time the host reboots. I was too lazy to do that.
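For the record, the usual trick (which I was too lazy to set up) is to have /etc/rc.local.d/local.sh re-create the cron entry at every boot. A minimal sketch only, assuming the script lives at the path used below and that the crond paths are the usual ones on ESXi 5.x/6.x:

# append to /etc/rc.local.d/local.sh so the cron entry survives reboots (sketch; verify paths on your build)
/bin/kill $(cat /var/run/crond.pid)                                                       # stop the running crond
/bin/echo "0 3 * * * /vmfs/volumes/path/to/fixPSP.sh" >> /var/spool/cron/crontabs/root    # run the script daily at 03:00
/usr/lib/vmware/busybox/bin/busybox crond                                                 # start crond again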
Another plan was to try and run this via PowerCLI, as I had to do this on a whole bunch of hosts. I was too lazy for that too, and PowerCLI seems a kludgy way to run esxcli commands. Finally I resorted to plink (SSH was already enabled on all the hosts) to run this en masse:
Get-Content .\esxlist.txt | %{
    $entry = $_ -split ": "
    $esxhost = $entry[0]
    $pass = $entry[1]
    "Processing $esxhost"
    .\plink.exe -pw $pass root@$esxhost /vmfs/volumes/path/to/fixPSP.sh
}
This feels like cheating, I know. It requires SSH to be enabled on all hosts, and it assumes I have put the script on some common datastore accessible across all hosts. I am using PowerShell purely to loop through a text file of "hostname: password" entries, and plink to connect to each host and run the script. (I love plink for this kind of stuff. It's super cool!) It feels like a hotch-potch of so many different things and not very elegant, but lazy. (Something like this would be elegant – using PowerCLI properly, not just as a wrapper to run esxcli commands – but I couldn't figure out the equivalent commands for my case; I was using FC rather than iSCSI.)
Posted: November 12th, 2017 | Tags: esxcli, esxi, FCoE, vmware | Category: Virtualization | §

(This post is intentionally brief because I don't want to get sidetracked by talking more about the things I link to. I am trying to clear my browser tabs by making blog posts on what's open, so I want to focus on just getting stuff posted. :)
At work we are moving to HPE Synergy now. We have two Synergy 12000 frames, each frame containing a Virtual Connect SE 40Gb F8 Module for Synergy. The two frames are linked via Synergy 20Gb Interconnect Link Module(s). (Synergy has a master/satellite model for the Virtual Connect modules, so you don't need a Virtual Connect module per frame (or enclosure, as it used to be in the past).) The frames have SY 480 Gen 10 compute modules, running ESXi 6.5, and the mezzanine slot of each compute module has a Synergy 3820C 10/20Gb CNA module. The OS in the compute modules should see up to 4 FlexNIC or FlexHBA adapters per Virtual Connect module.
The FlexHBA adapters are actually FCoE adapters (they provide FCoE and/or iSCSI). By default these FlexHBA adapters are not listed as storage adapters in ESXi, so one has to follow the instructions in this link. Basically:
1) Determine the vmnic IDs of the FCoE adapters:
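A rough sketch of the sort of commands involved here (the linked instructions have the exact steps):

esxcli network nic list    # all vmnics with their descriptions – look for the 3820C CNA ports
esxcli fcoe nic list       # FCoE-capable NICs, if any are already visible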
2) Then do a discovery to activate FCoE:
esxcli fcoe nic discover -n vmnicX
As a reference to my future self, here’s a blog post on how to do this automatically for stateless installs.
Totally unrelated to the above, but something I had found while Googling on this issue: Implementing Multi-Chassis Link Aggregation Groups (MC-LAG) with HPE Synergy Virtual Connect SE 40Gb F8 Module and Arista 7050 Series Switches. A good read.
Also, two good blog posts on Synergy:
Posted: November 12th, 2017 | Tags: esxcli, storage, vmware | Category: Asides, Virtualization | §
Had to spend some time recently identifying the storage devices and adapters attached to an ESXi box, and the above links were handy. Thought I should put them in here as a reference to myself.
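For example, the sort of commands involved (a quick sketch, not exhaustive):

esxcli storage core adapter list    # storage adapters (HBAs/CNAs) on the host
esxcli storage core device list     # attached storage devices (LUNs)
esxcli storage core path list       # paths from adapters to devices
esxcfg-scsidevs -l                  # the same device info via the older esxcfg command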
Posted: August 1st, 2017 | Tags: bash, esxcli, loop, unmap | Category: Virtualization | §

It's a good idea to periodically run the UNMAP command on all your thin-provisioned LUNs. This allows the storage system to reclaim deleted blocks. (What is SCSI UNMAP?)
The format of the command is:
esxcli storage vmfs unmap -l <volume_label>
I wanted to make a script to run this on all attached datastores so here’s what I came up with:
#!/bin/sh
for datastore in `esxcli storage filesystem list | grep "VMDATA" | awk -F " " '{print $2}'`; do
    echo "Performing UNMAP on $datastore ..."
    esxcli storage vmfs unmap -l $datastore
done
The esxcli storage filesystem list command outputs a list of datastores attached to the system. The second column (the volume label) is what I am interested in, so that's what awk takes care of for me. I don't want to target any local datastores, so I use grep to filter down to just the ones I am interested in (ours all have "VMDATA" in their name).
Next step would be to add this to a cron job. Got to follow the instructions here, it looks like.
I am putting a link to the official VMware documentation on this, as I Googled it just to confirm to myself that I am not doing anything wrong! What I need to do is migrate the physical NICs and the Management/VM Network VMkernel NIC from a standard switch to a distributed switch. The process is simple and straightforward, and one that I have done numerous times; yet it fails for me now!
Here’s a copy paste from the documentation:
- Navigate to Home > Inventory > Networking.
- Right-click the dVswitch.
- If the host is already added to the dVswitch, click Manage Hosts, else Click Add Host.
- Select the host(s), click Next.
- Select the physical adapters (vmnic) to use for the vmkernel, click Next.
- Select the virtual adapter (vmk) to migrate and click the Destination port group field. For each adapter, select the correct port group from the drop-down, click Next.
- Click Next to omit virtual machine networking migration.
- Click Finish after reviewing the new vmkernel and Uplink assignment.
- The wizard and the job completes moving both the vmk interface and the vmnic to the dVswitch.
Basically add physical NICs to the distributed switch & migrate vmk NICs as part of the process. For good measure I usually migrate only one physical NIC from the standard switch to the distributed switch, and then separately migrate the vmk NICs.
Here's what happens when I do the above now. (Note: now. I never had an issue with this earlier. I'm guessing it must be some bug in a newer 5.5 update, or something's wrong in the underlying network at my firm. I don't think it's the networking, coz I got my network admins to take a look, and I tested that all NICs on the host have connectivity to the outside world (did this by making each NIC the active one and disabling the others).)
First it’s stuck in progress:

And then vCenter cannot see the host any more:

Oddly I can still ping the host on the vmk NIC IP address. However I can’t SSH into it, so the Management bits are what seem to be down. The host has connectivity to the outside world because it passes the Management network tests from DCUI (which I can connect to via iLO). I restarted the Management agents too, but nope – cannot SSH or get vCenter to see the host. Something in the migration step breaks things. Only solution is to reboot and then vCenter can see the host.
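(For reference, restarting the management agents can be done from the DCUI, or from the ESXi shell with something like the below – roughly what the DCUI option does:)

/etc/init.d/hostd restart    # host management agent
/etc/init.d/vpxa restart     # vCenter agent on the host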
Here's how I worked around it anyway.
First I moved one physical NIC to the distributed switch.
Then I created a new management portgroup and VMkernel NIC on that for management traffic. Assigned it a temporary IP.
Next I opened a console to the host. Here’s the current config on the host:
~ # esxcli network ip interface ipv4 get
Name  IPv4 Address  IPv4 Netmask   IPv4 Broadcast  Address Type  DHCP DNS
----  ------------  -------------  --------------  ------------  --------
vmk0  10.xxx.xx.30  255.255.255.0  10.xxx.xx.255   STATIC        false
vmk1  10.xxx.xx.24  255.255.255.0  10.xxx.xx.255   STATIC        false
vmk2  10.xxx.xx.25  255.255.255.0  10.xxx.xx.255   STATIC        false
vmk3  1.1.1.1       255.255.255.0  1.1.1.255       STATIC        false
vmk4  10.xxx.xx.23  255.255.255.0  10.xxx.xx.255   STATIC        false
The interface vmk0 (or its IPv4 address rather) is what I wanted to migrate. The interface vmk4 is what I created temporarily.
I now removed the IPv4 address of the existing vmk NIC and assigned it to the new one, and confirmed the changes just to be sure. As soon as I did so, vCenter picked up the changes. I then tried to move the remaining physical NIC over to the distributed switch, but that failed – it gave an error that the existing connection was forcibly closed by the host. So I rebooted the host. Post-reboot I found that the host now thought it had no IP, even though it was responding to the old IP via the new vmk. So this approach was a no-go (but I am still leaving it here as a reminder to myself that this does not work).
I now migrated vmk0 from the standard switch to the distributed switch. As before, this will fail – vCenter will lose connectivity to the ESX host. But that’s why I have a console open. As expected the output of esxcli network ip interface list shows me that vmk0 hasn’t moved to the distributed switch:

So now I go ahead and remove the IPv4 address of vmk0 and assign that to vmk4 (the new one). Also confirmed the changes.
~ # esxcli network ip interface ipv4 set -i vmk0 -t none
~ # esxcli network ip interface ipv4 set -i vmk4 -I 10.xxx.xx.30 -N 255.255.255.0 -t static
~ # esxcli network ip interface ipv4 get
Next I rebooted (reboot) the host, and via the CLI I removed vmk0 (for some reason the GUI showed both vmk0 and vmk4 with the same IP I assigned above).
~ # esxcli network ip interface remove --interface-name vmk0
Reboot again!
Post-reboot I can go back to the GUI and move the remaining physical NIC over to the distributed switch. :) Yay!
Posted: March 26th, 2017 | Tags: esxcli, esxi, updates, vmware | Category: Virtualization | §

I use esxcli to manually update our ESXi hosts that don't have access to VUM (e.g. our DMZ hosts). I do so via the command line:
esxcli software profile update -d /absolute/path/to/zip/file -p name-of-profile
Usually the VMware page where I download the patch from mentions the profile name, but today I had a patch file and wanted to find the list of profiles it had.
One way is to open the zip file, then the metadata.zip file inside it – that contains a list of profiles. Another way is to use esxcli:
esxcli software sources profile list -d /absolute/path/to/zip/file
Screenshot example:

Not a big deal, I know, but I felt like posting this. :)

Our HP Gen8 ESXi hosts were randomly crashing ever since we applied the latest ESXi 5.5 updates to them in December. Logged a call with HP and turns out until a proper fix is issued by VMware/ HPE we need to change a setting on all our hosts and reboot them. I didn’t want to do it manually, so I used PowerCLI to do it en masse.
Here’s the script I wrote to target Gen8 hosts and make the change:
$Hosts = Get-Cluster | Get-VMHost | ?{ $_.Model -match "Gen8$" }

foreach ($VMHost in $Hosts) {
    Write-Host "$VMHost"
    $ESXCli = Get-ESXCli -VMHost $VMHost
    if (($ESXCli.system.settings.kernel.list() | ?{ $_.Name -eq "iovDisableIR" }).Configured -eq "TRUE") {
        $ESXCli.system.settings.kernel.set("iovDisableIR", "FALSE")
    }
}
I could have done the reboot along with this, but I didn’t want to. Instead I copy pasted the list of affected hosts into a text file (called ESXReboot.txt in the script below) and wrote another script to put them into maintenance mode and reboot one by one.
Get-Content .\ESXReboot.txt | %{
    $Server = $_

    Write-Host -ForegroundColor Green "$Server"

    Write-Host -ForegroundColor Yellow "`tEntering maintenance mode"
    Set-VMHost $Server -State maintenance -Evacuate | Out-Null

    Write-Host -ForegroundColor Yellow -NoNewline "`tRebooting"
    Restart-VMHost $Server -Confirm:$false | Out-Null

    do {
        sleep 15
        $ServerState = (Get-VMHost $Server).ConnectionState
        Write-Host -ForegroundColor Yellow -NoNewline "."
    } while ($ServerState -ne "NotResponding")
    Write-Host -ForegroundColor Yellow -NoNewline "(down)"

    do {
        sleep 30
        $ServerState = (Get-VMHost $Server).ConnectionState
        Write-Host -ForegroundColor Yellow -NoNewline "`."
    } while ($ServerState -ne "Maintenance")
    Write-Host -ForegroundColor Yellow "(up)"

    Write-Host -ForegroundColor Yellow "`tExiting maintenance mode"
    Set-VMHost $Server -State Connected | Out-Null

    Write-Host -ForegroundColor Yellow "`tDone!"
    Write-Host ""
}
The screenshot output is slightly different from what you would get from the script as I modified it a bit since taking the screenshot. Functionality-wise there’s no change.
Posted: March 30th, 2016 | Tags: esx, esxcli, esxi, vmware | Category: Virtualization | §

Today I upgraded one of our hosts to a newer version than what was supported by our vCenter, so I had to find a way of downgrading it. The host was now at "5.5 Patch 10" (which is after "5.5 Update 3"), while our vCenter version only supported versions prior to "5.5 Update 3". (See this post for a list of build numbers and versions; see this KB article for why vCenter and the host were now incompatible.)
I found this blog post and KB article that talked about downgrading and upgrading. Based on those two here’s what I did to downgrade my host.
First, some terminology. Read this blog post on what VIBs are. At a very high level a VIB file is like a zip file with some metadata and verification thrown in. They are the software packages for ESX (think of it like a .deb or .rpm file). The VIB file contains the actual files on the host that will be replaced. The metadata tells you more about the VIB file – its dependencies, requirements, issues, etc. And the verification bit lets the host verify that the VIB hasn’t been tampered with, and also allows you to have various “levels” of VIBs – those certified by VMware, those certified by partners of VMware, etc – such that you as a System Admin can decide what level of VIBs you want installed on your host.
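For example, to poke at this on a host – the VIB name here is just one from the list further below, and the acceptance level commands let you see or change what the host will accept:

esxcli software vib get -n net-tg3                       # full metadata for one installed VIB
esxcli software acceptance get                           # the host's current minimum acceptance level
esxcli software acceptance set --level=PartnerSupported  # change it (example level)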
You can install/remove/update VIBs via the esxcli command:
# esxcli software vib
Usage: esxcli software vib {cmd} [cmd options]

Available Commands:
  get        Displays detailed information about one or more installed VIBs
  install    Installs VIB packages from a URL or depot. VIBs may be installed, upgraded, or
             downgraded. WARNING: If your installation requires a reboot, you need to disable HA first.
  list       Lists the installed VIB packages
  remove     Removes VIB packages from the host. WARNING: If your installation requires a reboot,
             you need to disable HA first.
  update     Update installed VIBs to newer VIB packages. No new VIBs will be installed, only updates.
             WARNING: If your installation requires a reboot, you need to disable HA first.
Here’s a short list of the VIBs installed on my host:
# esxcli software vib list
Name           Version                            Vendor    Acceptance Level  Install Date
-------------  ---------------------------------  --------  ----------------  ------------
net-tg3        3.134e.v55.1-1OEM.550.0.0.1331820  Broadcom  VMwareCertified   2015-01-29
scsi-bfa       3.2.3.0-1OEM.550.0.0.1198610       Brocade   VMwareCertified   2015-01-29
ima-be2iscsi   4.9.303.0-1OEM.550.0.0.1331820     Emulex    VMwareCertified   2015-01-29
lpfc           10.0.725.203-1OEM.550.0.0.1331820  Emulex    VMwareCertified   2015-01-29
scsi-be2iscsi  4.9.303.0-1OEM.550.0.0.1331820     Emulex    VMwareCertified   2015-01-29
Next you have Image Profiles. These are a collection of VIBs. In fact, since any installation of ESXi is a collection of VIBs, an image profile can be thought of as defining an ESXi image. For instance, all the VIBs on my currently installed ESXi server – including 3rd party VIBs – together can be thought of as an image profile. I can then deploy this image profile to other hosts to get the exact configuration on those hosts too.
One thing to keep in mind is that image profiles are not anything tangible. As in they are not files as such, they just define the VIBs that make up the profile.
Lastly you have Software Depots. These are your equivalent of Linux package repositories. They contain VIBs and Image Profiles and are accessible online via HTTP/ HTTPS/ FTP or even offline as a ZIP file (which is a neat thing IMHO). You would point to a software depot – online or offline – and specify an image profile you want, which then pulls in the VIBs you want.
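For example, pointing esxcli at an offline depot (a ZIP file on a datastore – the path here is made up) shows you what's inside it:

esxcli software sources profile list -d /vmfs/volumes/datastore1/offline-bundle.zip    # image profiles in the depot
esxcli software sources vib list -d /vmfs/volumes/datastore1/offline-bundle.zip        # individual VIBs in the depot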
Now back to esxcli. As we saw above, this command can be used to list, update, remove, etc. VIBs. The cool thing though is that it can work with both VIB files and software depots (either online, or a ZIP file containing a bunch of VIB files). Here's the usage for the software vib install command, which deals with installing VIBs:
Usage: esxcli software vib install [cmd options]

Description:
  install               Installs VIB packages from a URL or depot. VIBs may be installed, upgraded,
                        or downgraded. WARNING: If your installation requires a reboot, you need to
                        disable HA first.

Cmd options:
  -d|--depot=[ <str> ... ]
                        Specifies full remote URLs of the depot index.xml or server file path
                        pointing to an offline bundle .zip file.
  --dry-run             Performs a dry-run only. Report the VIB-level operations that would be
                        performed, but do not change anything in the system.
  -f|--force            Bypasses checks for package dependencies, conflicts, obsolescence, and
                        acceptance levels. Really not recommended unless you know what you are doing.
                        Use of this option will result in a warning being displayed in the vSphere Client.
  --maintenance-mode    Pretends that maintenance mode is in effect. Otherwise, installation will stop
                        for live installs that require maintenance mode. This flag has no effect for
                        reboot required remediations.
  --no-live-install     Forces an install to /altbootbank even if the VIBs are eligible for live
                        installation or removal. Will cause installation to be skipped on PXE-booted hosts.
  --no-sig-check        Bypasses acceptance level verification, including signing. Use of this option
                        poses a large security risk and will result in a SECURITY ALERT warning being
                        displayed in the vSphere Client.
  --proxy=<str>         Specifies a proxy server to use for HTTP, FTP, and HTTPS connections. The
                        format is proxy-url:port.
  -n|--vibname=[ <str> ... ]
                        Specifies VIBs from a depot, using one of the following forms: name,
                        name:version, vendor:name, or vendor:name:version.
  -v|--viburl=[ <str> ... ]
                        Specifies one or more URLs to VIB packages to install. http:, https:, ftp:,
                        and file: are all supported.
You have two options:
- The -d switch can be used to specify a software depot (online or offline), along with the -n switch to specify the VIBs to be installed from this depot.
- Or the -v switch can be used to directly specify VIBs to be installed. (A sketch of both forms follows below.)
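A rough sketch of both forms (the depot path and VIB name here are made up):

# from a depot, picking a specific VIB by name
esxcli software vib install -d /vmfs/volumes/datastore1/driver-bundle.zip -n net-tg3

# or pointing straight at a VIB file
esxcli software vib install -v /vmfs/volumes/datastore1/net-tg3.vib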
The esxcli command can also work with image profiles.
Usage: esxcli software profile install [cmd options]

Description:
  install               Installs or applies an image profile from a depot to this host. This command
                        completely replaces the installed image with the image defined by the new
                        image profile, and may result in the loss of installed VIBs. The common vibs
                        between host and image profile will be skipped. To preserve installed VIBs,
                        use profile update instead. WARNING: If your installation requires a reboot,
                        you need to disable HA first.

Cmd options:
  -d|--depot=[ <str> ... ]
                        Specifies full remote URLs of the depot index.xml or server file path
                        pointing to an offline bundle .zip file. (required)
  --dry-run             Performs a dry-run only. Report the VIB-level operations that would be
                        performed, but do not change anything in the system.
  -f|--force            Bypasses checks for package dependencies, conflicts, obsolescence, and
                        acceptance levels. Really not recommended unless you know what you are doing.
                        Use of this option will result in a warning being displayed in the vSphere Client.
  --maintenance-mode    Pretends that maintenance mode is in effect. Otherwise, installation will stop
                        for live installs that require maintenance mode. This flag has no effect for
                        reboot required remediations.
  --no-live-install     Forces an install to /altbootbank even if the VIBs are eligible for live
                        installation or removal. Will cause installation to be skipped on PXE-booted hosts.
  --no-sig-check        Bypasses acceptance level verification, including signing. Use of this option
                        poses a large security risk and will result in a SECURITY ALERT warning being
                        displayed in the vSphere Client.
  --ok-to-remove        Allows the removal of installed VIBs as part of applying the image profile.
                        If not specified, esxcli will error out if applying the image profile results
                        in the removal of installed VIBs.
  -p|--profile=<str>    Specifies the name of the image profile to install. (required)
  --proxy=<str>         Specifies a proxy server to use for HTTP, FTP, and HTTPS connections. The
                        format is proxy-url:port.
Here you have just one option (because, like I said, you can't download something called an image profile – you necessarily have to use a software depot). You use the -d switch to specify a depot (online or offline) and the -p switch to specify the image profile you are interested in.
Apart from installing VIBs & image profiles, the esxcli command can also remove and update these. When it comes to image profiles though, the command can also downgrade profiles via an --allow-downgrades switch. So that’s what we use to downgrade ESXi versions.
First find the ESXi version you want to downgrade to. In my case it was ESXi 5.5 Update 2. Go to My VMware (login with your account) and find the 5.5 Update 2 product. Download the offline bundle – which is a ZIP file (basically an offline software depot). In my case I got a file named “update-from-esxi5.5-5.5_update02-2068190.zip”. Now open this ZIP file and go to the “metadata.zip\profiles” folder in that. This gives you the list of profiles in this depot.

You can also get the names from a link such as this which gives more info on the release and the image profiles in it. (I came across it by Googling for “ESXi 5.5 Update 2 profile name”).
The profiles with an “s” in them only contain security fixes while the ones without an “s” contain both security and bug fixes. In my case the profile I am looking for is “ESXi-5.5.0-20140902001-standard”. I wasn’t sure if I need to go for the “no-tools” version or not, but figured I’ll stick with the “standard”.
Now, copy the ZIP file you downloaded to the host. Either upload it to the host directly, or to some shared storage, etc.
Then run a command similar to this:
esxcli software profile update --allow-downgrades -d /path/to/update-from-esxi5.5-5.5_update02.zip -p ESXi-5.5.0-20140902001-standard
That’s it! Following a host reboot you are now downgraded. Very straight-forward and easy.
These are less of notes and more of links and what I did when I encountered this issue. Just for my future self.
At work we had a host which was giving HA errors. The message was along the lines that vCenter could not contact HA. So I tried reconfiguring it for HA (right-click the host and select "Reconfigure for vSphere HA"), upon which I got a new error: "Cannot install the vCenter Server agent service. Cannot upload agent."
Initially I thought it must just be a permissions issue. But it wasn’t so.
To investigate further I tried logging on to the server. I couldn’t enable SSH and ESXi Shell from the Configuration tab – it gave me an error. So I iLO’d into the server DCUI and enabled SSH and ESXi Shell. SSH still refused to let me in, and when I’d press Alt+F1 on the console to get the login prompt it was filled with messages like these: /bin/sh cant fork . Initially I thought it might be to do with HP AMS memory leak (see this and this) but it wasn’t.
I pressed Alt+F12 to see the on-screen logs. It was filled with messages like these:
Blimey!
There was nothing more I could do here basically. Couldn’t login to the server at all, heck I couldn’t even Shutdown/ Restart it gracefully via F12 in DCUI (nothing would happen). So I cold booted it and that got it working.
It's been about 2 hours since I did that and the server seems stable, so maybe it was a one-off thing. I looked at more logs though, and here's what I found.
/var/log/syslog.log
(Contains: Management service initialization, watchdogs, scheduled tasks and DCUI use)
2015-08-14T08:37:39Z sfcb-CIMXML-Processor[22385291]: TicketCache --- Can't open '/var/run/sfcb/52cbb0d0-da3a-9ad5-322d-361a1caafbcc', Error: 'No space left on device'
2015-08-14T08:37:40Z sfcb-CIMXML-Processor[22385292]: TicketCache --- Can't open '/var/run/sfcb/52cbb0d0-da3a-9ad5-322d-361a1caafbcc', Error: 'No space left on device'
2015-08-14T08:37:40Z sfcb-CIMXML-Processor[22385293]: TicketCache --- Can't open '/var/run/sfcb/52cbb0d0-da3a-9ad5-322d-361a1caafbcc', Error: 'No space left on device'
2015-08-14T08:37:40Z sfcb-CIMXML-Processor[22385294]: TicketCache --- Can't open '/var/run/sfcb/52cbb0d0-da3a-9ad5-322d-361a1caafbcc', Error: 'No space left on device'
2015-08-14T08:37:41Z sfcb-CIMXML-Processor[22385295]: TicketCache --- Can't open '/var/run/sfcb/52cbb0d0-da3a-9ad5-322d-361a1caafbcc', Error: 'No space left on device'
2015-08-14T08:37:41Z sfcb-CIMXML-Processor[22385296]: TicketCache --- Can't open '/var/run/sfcb/52cbb0d0-da3a-9ad5-322d-361a1caafbcc', Error: 'No space left on device'
2015-08-14T08:37:41Z sfcb-CIMXML-Processor[22385297]: TicketCache --- Can't open '/var/run/sfcb/52cbb0d0-da3a-9ad5-322d-361a1caafbcc', Error: 'No space left on device'
2015-08-14T08:37:41Z sfcb-CIMXML-Processor[22385298]: TicketCache --- Can't open '/var/run/sfcb/52cbb0d0-da3a-9ad5-322d-361a1caafbcc', Error: 'No space left on device'
2015-08-14T08:37:41Z sfcb-CIMXML-Processor[22385299]: TicketCache --- Can't open '/var/run/sfcb/52cbb0d0-da3a-9ad5-322d-361a1caafbcc', Error: 'No space left on device'
2015-08-14T08:37:42Z sfcb-CIMXML-Processor[22352532]: TicketCache --- Can't open '/var/run/sfcb/52cbb0d0-da3a-9ad5-322d-361a1caafbcc', Error: 'No space left on device'
/var/log/vmkwarning.log
(Contains: A summary of Warning and Alert log messages excerpted from the VMkernel logs)
2015-08-13T19:56:19.608Z cpu2:22382164)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/527fb83b-7c0b-4fe2-0152-d81fb0bac853 for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
2015-08-13T20:00:14.737Z cpu4:34191 opID=ee934b0f)WARNING: VisorFSObj: 1940: Cannot create file /var/run/vmware/tickets/vmtck-52f258cf-a87b-e1 for process hostd-worker because the inode table of its ramdisk (root) is full.
2015-08-13T20:04:46.110Z cpu30:34194 opID=ee934b0f)WARNING: VisorFSObj: 1940: Cannot create file /var/run/vmware/tickets/vmtck-52c87856-17ee-61 for process hostd-worker because the inode table of its ramdisk (root) is full.
2015-08-13T20:09:17.481Z cpu3:36506 opID=ee934b0f)WARNING: VisorFSObj: 1940: Cannot create file /var/run/vmware/tickets/vmtck-529ddabc-6196-dd for process hostd-worker because the inode table of its ramdisk (root) is full.
2015-08-13T20:13:48.849Z cpu11:7960868 opID=ee934b0f)WARNING: VisorFSObj: 1940: Cannot create file /var/run/vmware/tickets/vmtck-5278454b-65e6-1d for process hostd-worker because the inode table of its ramdisk (root) is full.
2015-08-13T20:15:53.301Z cpu6:21329945)WARNING: VisorFSObj: 1940: Cannot create file /var/run/vmware/tickets/vmtck-7f09012d-0b29-44 for process cimslp because the inode table of its ramdisk (root) is full.
2015-08-13T20:16:48.853Z cpu12:35008 opID=ee934b0f)WARNING: VisorFSObj: 1940: Cannot create file /var/run/vmware/tickets/vmtck-5257e7ba-7c96-d0 for process hostd-worker because the inode table of its ramdisk (root) is full.
/var/log/vob.log
(Contains: VMkernel Observation events)
2015-08-17T00:15:19.220Z: [VisorfsCorrelator] 17133398447519us: [vob.visorfs.ramdisk.inodetable.full] Cannot create file /var/run/vmware/tickets/vmtck-52b5db61-d61e-8d for process hostd-worker because the inode table of its ramdisk (root) is full.
2015-08-17T00:15:19.220Z: [VisorfsCorrelator] 17133319127883us: [esx.problem.visorfs.ramdisk.inodetable.full] The file table of the ramdisk 'root' is full. As a result, the file /var/run/vmware/tickets/vmtck-52b5db61-d61e-8d could not be created by the application 'hostd-worker'.
2015-08-17T00:21:20.587Z: [VisorfsCorrelator] 17133759815799us: [vob.visorfs.ramdisk.inodetable.full] Cannot create file /var/run/vmware/tickets/vmtck-52ac40ae-4240-e3 for process hostd-worker because the inode table of its ramdisk (root) is full.
2015-08-17T00:21:20.587Z: [VisorfsCorrelator] 17133680494786us: [esx.problem.visorfs.ramdisk.inodetable.full] The file table of the ramdisk 'root' is full. As a result, the file /var/run/vmware/tickets/vmtck-52ac40ae-4240-e3 could not be created by the application 'hostd-worker'.
2015-08-17T00:25:51.966Z: [VisorfsCorrelator] 17134031195582us: [vob.visorfs.ramdisk.inodetable.full] Cannot create file /var/run/vmware/tickets/vmtck-520e1b5c-35f2-21 for process hostd-worker because the inode table of its ramdisk (root) is full.
2015-08-17T00:25:51.966Z: [VisorfsCorrelator] 17133951873623us: [esx.problem.visorfs.ramdisk.inodetable.full] The file table of the ramdisk 'root' is full. As a result, the file /var/run/vmware/tickets/vmtck-520e1b5c-35f2-21 could not be created by the application 'hostd-worker'.
2015-08-17T00:30:23.342Z: [VisorfsCorrelator] 17134302572394us: [vob.visorfs.ramdisk.inodetable.full] Cannot create file /var/run/vmware/tickets/vmtck-52b93f74-9429-59 for process hostd-worker because the inode table of its ramdisk (root) is full.
/var/log/vmkernel.log
(Contains: Core VMkernel logs, including device discovery, storage and networking device and driver events, and virtual machine startup)
2015-08-17T01:22:09.441Z cpu30:22401956)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/5277bce4-a843-d718-aacc-a7bf06d5768a for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
2015-08-17T01:22:09.740Z cpu27:22401957)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/5277bce4-a843-d718-aacc-a7bf06d5768a for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
2015-08-17T01:22:09.915Z cpu11:7960868)World: 14299: VC opID hostd-b90c maps to vmkernel opID 576afc9e
2015-08-17T01:22:10.060Z cpu13:22401959)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/5277bce4-a843-d718-aacc-a7bf06d5768a for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
2015-08-17T01:22:10.367Z cpu9:22401960)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/5277bce4-a843-d718-aacc-a7bf06d5768a for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
2015-08-17T01:22:10.629Z cpu26:22401962)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/5277bce4-a843-d718-aacc-a7bf06d5768a for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
2015-08-17T01:22:10.869Z cpu22:22401963)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/5277bce4-a843-d718-aacc-a7bf06d5768a for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
2015-08-17T01:22:11.113Z cpu25:22401966)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/5277bce4-a843-d718-aacc-a7bf06d5768a for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
2015-08-17T01:22:11.359Z cpu17:22401968)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/5277bce4-a843-d718-aacc-a7bf06d5768a for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
2015-08-17T01:22:11.668Z cpu12:22401970)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/5277bce4-a843-d718-aacc-a7bf06d5768a for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
/var/log/hostd.log
(Contains: Host management service logs, including virtual machine and host Task and Events, communication with the vSphere Client and vCenter Server vpxa agent, and SDK connections.)
2015-08-15T09:11:54.083Z [2B781B70 info 'Vimsvc.ha-eventmgr'] Event 9898 : The file table of the ramdisk 'root' is full. As a result, the file /var/run/vmware/tickets/vmtck-52533095-569d-5c could not be created by the application 'hostd-worker'.
2015-08-15T09:11:54.087Z [FF95EB70 info 'Hostsvc' opID=hostd-6801] VsanSystemVmkProvider : GetRuntimeInfo: Start
2015-08-15T09:11:54.087Z [FF95EB70 info 'Hostsvc' opID=hostd-6801] VsanSystemVmkProvider : GetRuntimeInfo: Complete, runtime info: (vim.vsan.host.VsanRuntimeInfo) {
--> dynamicType = <unset>,
--> accessGenNo = 0,
-->
From these logs one thing was clear. The ESXi RAMdisk hosting the root filesystem had run out of inodes. Possibly caused by the SFCB service. Because of this the root filesystem had run out of space and everything was failing. Great!
In Linux I am used to the df command to check filesystem usage. But in ESXi, df only seems to give info on the mounted filesystems, whereas vdf gives the local filesystems (like RAMdisks and Tardisks (whatever those are)).
~ # vdf -h
Tardisk                  Space      Used
sb.v00                    148M      148M
s.v00                     295M      295M
misc_cni.v00               24K       21K
net_bnx2.v00              304K      301K
net_bnx2.v01                1M        1M
net_cnic.v00              140K      137K
...
imgdb.tgz                 400K      400K
state.tgz                  28K       27K
-----
Ramdisk                   Size      Used Available Use% Mounted on
root                       32M        1M       30M   5% --
etc                        28M      260K       27M   0% --
tmp                       192M      532K      191M   0% --
hostdstats               1053M        8M     1044M   0% --
snmptraps                   1M        0B        1M   0% --
The above output is after a reboot and all seems fine. To check the inode usage, use the stat command.
~ # stat -f /
  File: "/"
    ID: 100000000 Namelen: 127     Type: visorfs
Block size: 4096
Blocks: Total: 492406     Free: 331548     Available: 331548
Inodes: Total: 524288     Free: 519997
Or use esxcli. It gives you the free space as well as the inode count!
~ # esxcli system visorfs ramdisk list
Ramdisk Name  System  Include in Coredumps  Reserved   Maximum      Used      Peak Used  Free   Reserved Free  Maximum Inodes  Allocated Inodes  Used Inodes  Mount Point
------------  ------  --------------------  ---------  -----------  --------  ---------  -----  -------------  --------------  ----------------  -----------  ---------------------------
root          true    true                  32768 KiB  32768 KiB    1816 KiB  1820 KiB   94 %   94 %           8192            4096              3763         /
etc           true    true                  28672 KiB  28672 KiB    264 KiB   308 KiB    99 %   99 %           4096            1024              505          /etc
tmp           false   false                 2048 KiB   196608 KiB   532 KiB   868 KiB    99 %   74 %           8192            256               19           /tmp
hostdstats    false   false                 0 KiB      1078272 KiB  8552 KiB  8552 KiB   99 %   0 %            8192            32                5            /var/lib/vmware/hostd/stats
snmptraps     false   false                 0 KiB      1024 KiB     0 KiB     0 KiB      100 %  0 %            8192            32                1            /var/spool/snmp
Note to self: Make a habit of using the esxcli command as that seems to be the VMware preferred way of doing things. Plus it’s one command with various namespaces you can use for networking and other info.
In my case things look to be fine now.
KB 2037798 talks about this problem. Apparently it is fixed via a patch released in 2013, and as far as I can tell we are properly patched so we shouldn’t have been hit by this issue. If it happens again though the same KB article talks about creating a separate RAMdisk for SFCB so even if it eats up all the inodes your root file system isn’t affected. This involves creating a new RAMdisk at boot time by modifying rc.local (nice!). The esxcli command can be used to create a new ramdisk and mount it at the mount point required by SFCB:
esxcli system visorfs ramdisk add --name sfcbtickets --min-size 0 --max-size 1024 --permissions 0755 --target /var/run/sfcb
Turns out such an issue can also occur because of SNMP. Or, if you have an HP Gen8 blade server, because of the hpHelper.log file, which is fixed via a patch from HP (this server was a Gen8 blade but it didn't have this log file). KB 2040707 too talks about this. Didn't help much in my case as that didn't seem to be my issue.
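(If you suspect the SNMP angle, a quick sanity check is to see whether trap files are piling up in the SNMP ramdisk – something like:)

esxcli system snmp get        # is SNMP even enabled on the host?
ls /var/spool/snmp | wc -l    # a large number of files here means traps are eating inodes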
Two useful links for future reference are:
That’s all for now.
p.s. I keep talking about SFCB above but have no idea what it is. Turns out it is the CIM server for ESXi. Found this blog post on it.
Was creating/migrating some ESXi hosts during the week and came across the above error, "Number of IPv4 routes did not match", when checking host profile compliance of one of the hosts. All network settings of this host appeared to be the same as the rest, so I was stumped as to what could be wrong. Via a VMware KB article I came across the esxcfg-route command, which helped identify the problem. To run this command, SSH into the host:
~ # esxcfg-route
VMkernel default gateway is 10.50.0.252
By default the command only outputs the default gateway but you can pass it the -l switch to list all routes:
~ # esxcfg-route -l
VMkernel Routes:
Network          Netmask          Gateway          Interface
10.50.0.0        255.255.255.0    Local Subnet     vmk0
1.1.1.0          255.255.255.0    Local Subnet     vmk2
default          0.0.0.0          10.50.0.252      vmk1
In my case the above output was from one of the hosts, while the following was from the non-compliant host:
~ # esxcfg-route -l
VMkernel Routes:
Network          Netmask          Gateway          Interface
10.50.0.0        255.255.255.0    Local Subnet     vmk0
10.50.0.0        255.255.0.0      Local Subnet     vmk2
default          0.0.0.0          10.50.0.252      vmk1
Notice the vmk2 interface has the wrong network. Not sure how that happened. Oddly the GUI didn’t show this incorrect network but obviously something was corrupt somewhere.
To fix that I thought I'd remove the vmk2 interface and re-add it. Big mistake! Possibly because its network was the same as that of the management network (10.50.0.0/24), removing this interface caused the host to lose connectivity from vCenter. I could ping it but couldn't connect to it via SSH, the vSphere Client, or vCenter. Finally I had to reset the network via the DCUI – it's under "Network Restore Options". I tried "Restore vDS" first; that didn't help, so I did a "Restore Standard Switch". This is very useful – it creates a new standard switch and moves the Management Network onto that so you get connectivity to the host. This way I was able to reconnect to the host, but now I stumbled upon a new problem.
The host didn’t have the vmk2 interface any more but when I tried to recreate it I got an error that the interface already exists. But no, it does not – the GUI has no trace of it! Some forum posts suggested restarting the vCenter service as that clears its cache and puts it in sync with the hosts but that didn’t help either. Then I came across this post which showed me that it is possible for the host to still have the VMkernel port but vCenter to not know of it. For this the esxcli command is your friend. To list all VMkernel ports on a host do the following:
~ # esxcli network ip interface list vmk2
   Name: vmk2
   MAC Address: 00:50:52:24:2d:e3
   Enabled: true
   Portset: DvsPortset-1
   Portgroup: N/A
   Netstack Instance: defaultTcpipStack
   VDS Name: dvSwitch1 (ISCSI)
   VDS UUID: 92 c7 85 40 3e 7f 5a 5a-fd 4e e5 9d 2f 20 8d d2
   VDS Port: 132
   VDS Connection: 39130298
   MTU: 1500
   TSO MSS: 65535
   Port ID: 100663300
...
After that, removing the VMkernel interface can be done via a variant of the same command:
~ # esxcli network ip interface remove --interface-name vmk2
Now I could re-add the interface via vSphere and get the hosts into compliance.
Before I conclude this post though, a few notes on the commands above.
If you have PowerCLI installed you can run all the esxcli commands via the Get-EsxCli cmdlet. For example:
PowerCLI C:\> Get-EsxCli -VMHost ESX01.rakhesh.local

===========================
EsxCli: esx01.rakhesh.local

   Elements:
   ---------
   device
   esxcli
   fcoe
   graphics
   hardware
   iscsi
   network
   sched
   software
   storage
   system
   vm
   vsan

# notice it returns the esxcli namespaces
# if I try invoking one of them I get the function definition instead

PowerCLI C:\> (Get-EsxCli -VMHost ESX01.rakhesh.local).network.ip.interface.list

TypeNameOfValue     : VMware.VimAutomation.ViCore.Util10Ps.EsxCliExtensionMethod
OverloadDefinitions : {vim.EsxCLI.network.ip.interface.list.NetworkInterface[] list(string netstack)}
MemberType          : CodeMethod
Value               : vim.EsxCLI.network.ip.interface.list.NetworkInterface[] list(string netstack)
Name                : list
IsInstance          : True

# so I invoke it as a function with a () after the function name

PowerCLI C:\> (Get-EsxCli -VMHost ESX01.rakhesh.local).network.ip.interface.list()

Enabled          : true
MACAddress       : 00:50:52:32:3f:23
MTU              : 1500
Name             : vmk2
NetstackInstance : defaultTcpipStack
PortID           : 100663300
Portgroup        : N/A
Portset          : DvsPortset-1
TSOMSS           : 65535
VDSConnection    : 39130298
VDSName          : dvSwitch1 (ISCSI)
VDSPort          : 132
VDSUUID          : 92 c7 15 40 3f 7e 3a 5b-eb 1e d1 1d 2f 30 3d d2
...
If I wanted to remove the interface via PowerCLI the command would be slightly different:
PowerCLI C:\> (Get-EsxCli -VMHost ESX03.rakhesh.local).network.ip.interface.remove

TypeNameOfValue     : VMware.VimAutomation.ViCore.Util10Ps.EsxCliExtensionMethod
OverloadDefinitions : {boolean remove(string dvportid, string dvsname, string interfacename, string netstack, string portgroupname)}
MemberType          : CodeMethod
Value               : boolean remove(string dvportid, string dvsname, string interfacename, string netstack, string portgroupname)
Name                : remove
IsInstance          : True

# notice the function requires arguments. specifically, the position matters.
# I am targeting a standard vSwitch so must leave the first two arguments $null

PowerCLI C:\> (Get-EsxCli -VMHost ESX01.rakhesh.local).network.ip.interface.remove($null,$null,"vmk2")
I would have written more on the esxcli command itself but this excellent blog post covers it all. It’s an all powerful command that can be used to manage many aspects of the ESXi host, even set it in maintenance mode!
# get current status
~ # esxcli system maintenanceMode get
Disabled

# set option help screen
~ # esxcli system maintenanceMode set
Error: Missing required parameter -e|--enable

Usage: esxcli system maintenanceMode set [cmd options]

Description:
  set                   Enable or disable the maintenance mode of the system.

Cmd options:
  -e|--enable           enable maintenance mode (required)
  -t|--timeout=<long>   Time to perform operation in seconds (default 60 seconds)

# enter maintenance mode
~ # esxcli system maintenanceMode set --enable=true --timeout=5

# exit maintenance mode
~ # esxcli system maintenanceMode set --enable=false --timeout=5
Heck you can even use esxcli to upgrade from one ESXi version to another. It is also possible to run the esxcli command from a remote computer (Windows or Linux) by installing the vSphere CLI tools on that computer. Additionally, there’s also the vSphere Management Assistant (VMA) which is a virtual appliance that offers command line tools.
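With the vSphere CLI tools installed, the same commands work remotely by tacking connection options onto esxcli – roughly like this, reusing the lab host name from the PowerCLI examples:

esxcli --server esx01.rakhesh.local --username root --password 'xxx' storage core device list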
The esxcli is also useful if you want to kill a VM. For instance the following lists all running VMs on a host:
~ # esxcli vm process list
UBUNTU
   World ID: 155802
   Process ID: 0
   VMX Cartel ID: 155801
   UUID: 42 15 e9 7e b2 72 df 12-82 e2 b3 80 07 b5 fc d1
   Display Name: UBUNTU
   Config File: /vmfs/volumes/557781ea-fae0924c-8041-000c29d09802/UBUNTU/UBUNTU.vmx
If that VM were stuck for some reason and cannot be stopped or restarted via vSphere it’s very useful to know the esxcli command can be used to kill the VM (has happened a couple of times to me in the past):
~ # esxcli vm process kill -t force -w 155802
Regarding the type of killing you can do:
There are three types of VM kills that can be attempted: [soft, hard, force]. Users should always attempt ‘soft’ kills first, which will give the VMX process a chance to shutdown cleanly (like kill or kill -SIGTERM). If that does not work move to ‘hard’ kills which will shutdown the process immediately (like kill -9 or kill -SIGKILL). ‘force’ should be used as a last resort attempt to kill the VM. If all three fail then a reboot is required. (required)
Another command line option is vim-cmd which I stumbled upon from one of the links above. I haven’t used it much so as a reference to myself here’s a blog post explaining it in detail.
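A couple of vim-cmd examples to get started (note that vim-cmd uses its own VM IDs, not the World ID shown above; the ID below is made up):

vim-cmd vmsvc/getallvms          # list registered VMs with their vim IDs
vim-cmd vmsvc/power.getstate 42  # power state of the VM with vim ID 42
vim-cmd vmsvc/power.off 42       # power it off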
Lastly there's also a bunch of esxcfg-* commands, one of which we came across above.
~ # esxcfg-
esxcfg-advcfg     esxcfg-hwiscsi    esxcfg-ipsec      esxcfg-nas        esxcfg-resgrp     esxcfg-swiscsi    esxcfg-vswitch
esxcfg-dumppart   esxcfg-info       esxcfg-module     esxcfg-nics       esxcfg-route      esxcfg-vmknic
esxcfg-fcoe       esxcfg-init       esxcfg-mpath      esxcfg-rescan     esxcfg-scsidevs   esxcfg-volume
I haven't used these much. They seem to be kept around mostly for compatibility – the esxcfg-* commands date back to ESX(i) 3.x and prior (the remote vSphere CLI offers equivalents with a vicfg- prefix), and most of them have esxcli replacements now. For instance, esxcfg-vmknic is replaced with esxcli network ip interface, as we saw above.
That’s all for now!
Update: Thought I’d use this post to keep track of other useful commands.
To get IPv4 addresses details:
~ # esxcli network ip interface ipv4 get
Name  IPv4 Address     IPv4 Netmask   IPv4 Broadcast   Address Type  DHCP DNS
----  ---------------  -------------  ---------------  ------------  --------
vmk1  10.50.0.207      255.255.255.0  10.50.0.255      DHCP          false
vmk0  10.50.0.32       255.255.255.0  10.50.0.255      STATIC        false
vmk3  10.50.0.210      255.255.255.0  10.50.0.255      DHCP          false
vmk2  169.254.162.132  255.255.0.0    169.254.255.255  DHCP          false
Replace with ipv6 if that’s what you want.
To set an IPv4 address:
~ # esxcli network ip interface ipv4 set -i vmk2 -I 1.1.1.2 -N 255.255.255.0 -t static
To ping an address from the host:
~ # vmkping 1.1.1.1
PING 1.1.1.1 (1.1.1.1): 56 data bytes
64 bytes from 1.1.1.1: icmp_seq=0 ttl=64 time=3.099 ms
64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.629 ms

--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.629/1.864/3.099 ms
Change keyboard layout:
Get current keyboard layout:
~ # esxcli system settings keyboard layout get
United Kingdom
List available layouts:
~ # esxcli system settings keyboard layout list
Layout
---------------
Belgian
Brazilian
Croatian
...
Set a new layout:
~ # esxcli system settings keyboard layout set -l "US Default"
Remotely enable SSH
The esxcli commands are cool but you need to enable SSH each time you want to connect to the host and run these (unless you install the CLI tools on your machine). If you have PowerCLI though you can enable SSH remotely.
To list the services:
PowerCLI C:\> Get-VMHostService -VMHost "ESX03.rakhesh.local"

Key                  Label                           Policy  Running  Required
---                  -----                           ------  -------  --------
DCUI                 Direct Console UI               on      True     False
TSM                  ESXi Shell                      off     False    False
TSM-SSH              SSH                             off     False    False
lbtd                 lbtd                            on      True     False
lsassd               Local Security Authenticati...  off     False    False
lwiod                I/O Redirector (Active Dire...  off     False    False
netlogond            Network Login Server (Activ...  off     False    False
ntpd                 NTP Daemon                      off     False    False
sfcbd-watchdog       CIM Server                      on      True     False
snmpd                snmpd                           on      False    False
vprobed              vprobed                         off     False    False
vpxa                 vpxa                            on      True     False
xorg                 xorg                            on      False    False
To enable SSH and the ESXi shell:
PowerCLI C:\> $SSHsvc = Get-VMHostService -VMHost "ESX03.rakhesh.local" | ?{ $_.Label -eq "SSH" }
PowerCLI C:\> $ESXIsvc = Get-VMHostService -VMHost "ESX03.rakhesh.local" | ?{ $_.Label -eq "ESXi Shell" }

PowerCLI C:\> Start-VMHostService -HostService $SSHsvc

Key                  Label                           Policy  Running  Required
---                  -----                           ------  -------  --------
TSM-SSH              SSH                             off     True     False

PowerCLI C:\> Start-VMHostService -HostService $ESXIsvc

Key                  Label                           Policy  Running  Required
---                  -----                           ------  -------  --------
TSM                  ESXi Shell                      off     True     False