These are less notes and more a collection of links and a record of what I did when I encountered this issue. Just for my future self.
At work we had a host which was giving HA errors. The message was along the lines that vCenter could not contact HA. So I tried reconfiguring it for HA (right click the host and select “Reconfigure for vSphere HA”), upon which I got a new error: Cannot install the vCenter Server agent service. Cannot upload agent.
Initially I thought it must just be a permissions issue. But it wasn’t so.
To investigate further I tried logging on to the server. I couldn’t enable SSH and ESXi Shell from the Configuration tab; it gave me an error. So I iLO’d into the server DCUI and enabled SSH and ESXi Shell. SSH still refused to let me in, and when I pressed Alt+F1 on the console to get the login prompt it was filled with messages like these: /bin/sh cant fork. Initially I thought it might be to do with the HP AMS memory leak (see this and this) but it wasn’t.
I pressed Alt+F12 to see the on-screen logs. It was filled with messages like these:
There was basically nothing more I could do here. I couldn’t log in to the server at all; heck, I couldn’t even shut down or restart it gracefully via F12 in DCUI (nothing would happen). So I cold booted it and that got it working.
It’s been about 2 hours since I did that and the server seems stable, so maybe it was a one-off thing. I looked at more logs though and here’s what I found.
/var/log/syslog.log
(Contains: Management service initialization, watchdogs, scheduled tasks and DCUI use)
2015-08-14T08:37:39Z sfcb-CIMXML-Processor[22385291]: TicketCache --- Can't open '/var/run/sfcb/52cbb0d0-da3a-9ad5-322d-361a1caafbcc', Error: 'No space left on device'
2015-08-14T08:37:40Z sfcb-CIMXML-Processor[22385292]: TicketCache --- Can't open '/var/run/sfcb/52cbb0d0-da3a-9ad5-322d-361a1caafbcc', Error: 'No space left on device'
2015-08-14T08:37:40Z sfcb-CIMXML-Processor[22385293]: TicketCache --- Can't open '/var/run/sfcb/52cbb0d0-da3a-9ad5-322d-361a1caafbcc', Error: 'No space left on device'
2015-08-14T08:37:40Z sfcb-CIMXML-Processor[22385294]: TicketCache --- Can't open '/var/run/sfcb/52cbb0d0-da3a-9ad5-322d-361a1caafbcc', Error: 'No space left on device'
2015-08-14T08:37:41Z sfcb-CIMXML-Processor[22385295]: TicketCache --- Can't open '/var/run/sfcb/52cbb0d0-da3a-9ad5-322d-361a1caafbcc', Error: 'No space left on device'
2015-08-14T08:37:41Z sfcb-CIMXML-Processor[22385296]: TicketCache --- Can't open '/var/run/sfcb/52cbb0d0-da3a-9ad5-322d-361a1caafbcc', Error: 'No space left on device'
2015-08-14T08:37:41Z sfcb-CIMXML-Processor[22385297]: TicketCache --- Can't open '/var/run/sfcb/52cbb0d0-da3a-9ad5-322d-361a1caafbcc', Error: 'No space left on device'
2015-08-14T08:37:41Z sfcb-CIMXML-Processor[22385298]: TicketCache --- Can't open '/var/run/sfcb/52cbb0d0-da3a-9ad5-322d-361a1caafbcc', Error: 'No space left on device'
2015-08-14T08:37:41Z sfcb-CIMXML-Processor[22385299]: TicketCache --- Can't open '/var/run/sfcb/52cbb0d0-da3a-9ad5-322d-361a1caafbcc', Error: 'No space left on device'
2015-08-14T08:37:42Z sfcb-CIMXML-Processor[22352532]: TicketCache --- Can't open '/var/run/sfcb/52cbb0d0-da3a-9ad5-322d-361a1caafbcc', Error: 'No space left on device'
/var/log/vmkwarning.log
(Contains: A summary of Warning and Alert log messages excerpted from the VMkernel logs)
2015-08-13T19:56:19.608Z cpu2:22382164)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/527fb83b-7c0b-4fe2-0152-d81fb0bac853 for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
2015-08-13T20:00:14.737Z cpu4:34191 opID=ee934b0f)WARNING: VisorFSObj: 1940: Cannot create file /var/run/vmware/tickets/vmtck-52f258cf-a87b-e1 for process hostd-worker because the inode table of its ramdisk (root) is full.
2015-08-13T20:04:46.110Z cpu30:34194 opID=ee934b0f)WARNING: VisorFSObj: 1940: Cannot create file /var/run/vmware/tickets/vmtck-52c87856-17ee-61 for process hostd-worker because the inode table of its ramdisk (root) is full.
2015-08-13T20:09:17.481Z cpu3:36506 opID=ee934b0f)WARNING: VisorFSObj: 1940: Cannot create file /var/run/vmware/tickets/vmtck-529ddabc-6196-dd for process hostd-worker because the inode table of its ramdisk (root) is full.
2015-08-13T20:13:48.849Z cpu11:7960868 opID=ee934b0f)WARNING: VisorFSObj: 1940: Cannot create file /var/run/vmware/tickets/vmtck-5278454b-65e6-1d for process hostd-worker because the inode table of its ramdisk (root) is full.
2015-08-13T20:15:53.301Z cpu6:21329945)WARNING: VisorFSObj: 1940: Cannot create file /var/run/vmware/tickets/vmtck-7f09012d-0b29-44 for process cimslp because the inode table of its ramdisk (root) is full.
2015-08-13T20:16:48.853Z cpu12:35008 opID=ee934b0f)WARNING: VisorFSObj: 1940: Cannot create file /var/run/vmware/tickets/vmtck-5257e7ba-7c96-d0 for process hostd-worker because the inode table of its ramdisk (root) is full.
/var/log/vob.log
(Contains: VMkernel Observation events)
2015-08-17T00:15:19.220Z: [VisorfsCorrelator] 17133398447519us: [vob.visorfs.ramdisk.inodetable.full] Cannot create file /var/run/vmware/tickets/vmtck-52b5db61-d61e-8d for process hostd-worker because the inode table of its ramdisk (root) is full.
2015-08-17T00:15:19.220Z: [VisorfsCorrelator] 17133319127883us: [esx.problem.visorfs.ramdisk.inodetable.full] The file table of the ramdisk 'root' is full. As a result, the file /var/run/vmware/tickets/vmtck-52b5db61-d61e-8d could not be created by the application 'hostd-worker'.
2015-08-17T00:21:20.587Z: [VisorfsCorrelator] 17133759815799us: [vob.visorfs.ramdisk.inodetable.full] Cannot create file /var/run/vmware/tickets/vmtck-52ac40ae-4240-e3 for process hostd-worker because the inode table of its ramdisk (root) is full.
2015-08-17T00:21:20.587Z: [VisorfsCorrelator] 17133680494786us: [esx.problem.visorfs.ramdisk.inodetable.full] The file table of the ramdisk 'root' is full. As a result, the file /var/run/vmware/tickets/vmtck-52ac40ae-4240-e3 could not be created by the application 'hostd-worker'.
2015-08-17T00:25:51.966Z: [VisorfsCorrelator] 17134031195582us: [vob.visorfs.ramdisk.inodetable.full] Cannot create file /var/run/vmware/tickets/vmtck-520e1b5c-35f2-21 for process hostd-worker because the inode table of its ramdisk (root) is full.
2015-08-17T00:25:51.966Z: [VisorfsCorrelator] 17133951873623us: [esx.problem.visorfs.ramdisk.inodetable.full] The file table of the ramdisk 'root' is full. As a result, the file /var/run/vmware/tickets/vmtck-520e1b5c-35f2-21 could not be created by the application 'hostd-worker'.
2015-08-17T00:30:23.342Z: [VisorfsCorrelator] 17134302572394us: [vob.visorfs.ramdisk.inodetable.full] Cannot create file /var/run/vmware/tickets/vmtck-52b93f74-9429-59 for process hostd-worker because the inode table of its ramdisk (root) is full.
/var/log/vmkernel.log
(Contains: Core VMkernel logs, including device discovery, storage and networking device and driver events, and virtual machine startup)
2015-08-17T01:22:09.441Z cpu30:22401956)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/5277bce4-a843-d718-aacc-a7bf06d5768a for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
2015-08-17T01:22:09.740Z cpu27:22401957)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/5277bce4-a843-d718-aacc-a7bf06d5768a for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
2015-08-17T01:22:09.915Z cpu11:7960868)World: 14299: VC opID hostd-b90c maps to vmkernel opID 576afc9e
2015-08-17T01:22:10.060Z cpu13:22401959)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/5277bce4-a843-d718-aacc-a7bf06d5768a for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
2015-08-17T01:22:10.367Z cpu9:22401960)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/5277bce4-a843-d718-aacc-a7bf06d5768a for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
2015-08-17T01:22:10.629Z cpu26:22401962)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/5277bce4-a843-d718-aacc-a7bf06d5768a for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
2015-08-17T01:22:10.869Z cpu22:22401963)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/5277bce4-a843-d718-aacc-a7bf06d5768a for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
2015-08-17T01:22:11.113Z cpu25:22401966)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/5277bce4-a843-d718-aacc-a7bf06d5768a for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
2015-08-17T01:22:11.359Z cpu17:22401968)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/5277bce4-a843-d718-aacc-a7bf06d5768a for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
2015-08-17T01:22:11.668Z cpu12:22401970)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/5277bce4-a843-d718-aacc-a7bf06d5768a for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
/var/log/hostd.log
(Contains: Host management service logs, including virtual machine and host Task and Events, communication with the vSphere Client and vCenter Server vpxa agent, and SDK connections.)
2015-08-15T09:11:54.083Z [2B781B70 info 'Vimsvc.ha-eventmgr'] Event 9898 : The file table of the ramdisk 'root' is full. As a result, the file /var/run/vmware/tickets/vmtck-52533095-569d-5c could not be created by the application 'hostd-worker'.
2015-08-15T09:11:54.087Z [FF95EB70 info 'Hostsvc' opID=hostd-6801] VsanSystemVmkProvider : GetRuntimeInfo: Start
2015-08-15T09:11:54.087Z [FF95EB70 info 'Hostsvc' opID=hostd-6801] VsanSystemVmkProvider : GetRuntimeInfo: Complete, runtime info: (vim.vsan.host.VsanRuntimeInfo) {
--> dynamicType = <unset>,
--> accessGenNo = 0,
-->
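From these logs one thing was clear: the ESXi RAMdisk hosting the root filesystem had run out of inodes, possibly because of the SFCB service. With no free inodes, no new files could be created on the root filesystem (which is presumably why syslog reports 'No space left on device', even though the shortage was of inodes rather than bytes), and so everything was failing. Great!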
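To get a quick feel for which processes were tripping over the full inode table, something like the following could be run over the VMkernel warnings. This is only a sketch: the function name `inode_full_by_process` is made up for illustration, and it assumes the log lines keep the "for process X because the inode table of its ramdisk" wording seen in the excerpts above.

```shell
# Count "inode table ... is full" warnings per process, reading log text on stdin.
# Assumes the VMkernel message wording shown in the excerpts in this post.
inode_full_by_process() {
    # Pull out the process name, then tally and sort with the highest count first.
    sed -n 's/.* for process \([^ ]*\) because the inode table of its ramdisk .*/\1/p' |
        awk '{ count[$1]++ } END { for (p in count) print count[p], p }' |
        sort -rn
}

# Example on the ESXi shell (log path as used in this post):
#   inode_full_by_process < /var/log/vmkwarning.log
```

Each output line is a count followed by a process name, so a runaway offender (SFCB, hostd, or something else) should stand out at the top.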
In Linux I am used to the df command to check filesystem usage. But in ESXi df only seems to give info on the mounted filesystems whereas vdf gives the local filesystems (like RAMdisks and Tardisks (whatever those are)).
~ # vdf -h
Tardisk                  Space      Used
sb.v00                    148M      148M
s.v00                     295M      295M
misc_cni.v00               24K       21K
net_bnx2.v00              304K      301K
net_bnx2.v01                1M        1M
net_cnic.v00              140K      137K
...
imgdb.tgz                 400K      400K
state.tgz                  28K       27K
-----
Ramdisk                   Size      Used Available Use% Mounted on
root                       32M        1M       30M   5% --
etc                        28M      260K       27M   0% --
tmp                       192M      532K      191M   0% --
hostdstats               1053M        8M     1044M   0% --
snmptraps                   1M        0B        1M   0% --
The above output is from after a reboot and all seems fine. To check the inode usage use the stat command.
~ # stat -f /
  File: "/"
    ID: 100000000 Namelen: 127     Type: visorfs
Block size: 4096
Blocks: Total: 492406     Free: 331548     Available: 331548
Inodes: Total: 524288     Free: 519997
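The raw total and free counts are a bit hard to eyeball, so here is a small sketch that turns the `Inodes:` line into a used count and percentage. The helper name `inode_usage_from_stat` is my own invention, and it assumes the `Inodes: Total: N Free: M` layout shown in the output above.

```shell
# Summarise inode usage from `stat -f` output read on stdin.
# Assumes an "Inodes: Total: N Free: M" line as in the output above.
inode_usage_from_stat() {
    awk '$1 == "Inodes:" && $3 > 0 {
        total = $3; free = $5; used = total - free
        printf "%d of %d inodes used (%.1f%%)\n", used, total, used * 100 / total
    }'
}

# Example on the ESXi shell:
#   stat -f / | inode_usage_from_stat
```

For the post-reboot numbers above this works out to 4291 of 524288 inodes used, i.e. under 1%.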
Or use esxcli. It gives you the free space as well as the inode count!
~ # esxcli system visorfs ramdisk list
Ramdisk Name  System  Include in Coredumps   Reserved      Maximum      Used  Peak Used   Free  Reserved Free  Maximum Inodes  Allocated Inodes  Used Inodes  Mount Point
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
root            true                  true  32768 KiB    32768 KiB  1816 KiB   1820 KiB   94 %           94 %            8192              4096         3763  /
etc             true                  true  28672 KiB    28672 KiB   264 KiB    308 KiB   99 %           99 %            4096              1024          505  /etc
tmp            false                 false   2048 KiB   196608 KiB   532 KiB    868 KiB   99 %           74 %            8192               256           19  /tmp
hostdstats     false                 false      0 KiB  1078272 KiB  8552 KiB   8552 KiB   99 %            0 %            8192                32            5  /var/lib/vmware/hostd/stats
snmptraps      false                 false      0 KiB     1024 KiB     0 KiB      0 KiB  100 %            0 %            8192                32            1  /var/spool/snmp
Note to self: Make a habit of using the esxcli command as that seems to be the VMware preferred way of doing things. Plus it’s one command with various namespaces you can use for networking and other info.
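Since the esxcli listing carries the inode columns, a rough per-ramdisk inode report can be pulled out of it with awk. This is a sketch rather than anything official: `ramdisk_inode_usage` is a hypothetical helper name, it assumes the column order shown above (the last four fields of each row being Maximum Inodes, Allocated Inodes, Used Inodes and Mount Point), and it measures usage against Maximum Inodes, which is my guess at the limit that matters.

```shell
# Report inode usage per ramdisk from `esxcli system visorfs ramdisk list` output on stdin.
# Assumes the last four fields of each data row are:
#   Maximum Inodes, Allocated Inodes, Used Inodes, Mount Point
ramdisk_inode_usage() {
    # Data rows end in a mount point starting with "/"; header and separator rows do not.
    awk '$NF ~ /^\// && $(NF-1) ~ /^[0-9]+$/ && $(NF-3) > 0 {
        printf "%s: %d of %d inodes used (%.1f%%)\n", $1, $(NF-1), $(NF-3), $(NF-1) * 100 / $(NF-3)
    }'
}

# Example on the ESXi shell:
#   esxcli system visorfs ramdisk list | ramdisk_inode_usage
```

Counting fields from the end of the line sidesteps the "32768 KiB" and "94 %" columns, which contain spaces and would otherwise throw off positional parsing.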
In my case things look to be fine now.
KB 2037798 talks about this problem. Apparently it is fixed via a patch released in 2013, and as far as I can tell we are properly patched so we shouldn’t have been hit by this issue. If it happens again though the same KB article talks about creating a separate RAMdisk for SFCB so even if it eats up all the inodes your root file system isn’t affected. This involves creating a new RAMdisk at boot time by modifying rc.local (nice!). The esxcli command can be used to create a new ramdisk and mount it at the mount point required by SFCB:
esxcli system visorfs ramdisk add --name sfcbtickets --min-size 0 --max-size 1024 --permissions 0755 --target /var/run/sfcb
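If I ever do put this into a boot script, I would probably wrap it so that it only adds the ramdisk when it isn’t already there. The following is a sketch under assumptions, not what the KB prescribes: the function name `ensure_sfcb_ramdisk` is hypothetical, the check simply looks for `sfcbtickets` in the first column of the ramdisk list, and I haven’t verified whether SFCB needs to be stopped or restarted around it.

```shell
# Add the SFCB ramdisk only if it is not already present.
# Assumption: the ramdisk name appears in the first column of the list output.
ensure_sfcb_ramdisk() {
    if esxcli system visorfs ramdisk list |
        awk '$1 == "sfcbtickets" { found = 1 } END { exit !found }'; then
        return 0    # already there, nothing to do
    fi
    # Same command as in the post, split across lines for readability.
    esxcli system visorfs ramdisk add --name sfcbtickets --min-size 0 --max-size 1024 \
        --permissions 0755 --target /var/run/sfcb
}

# Only attempt this where esxcli actually exists (i.e. on the ESXi host itself).
if command -v esxcli >/dev/null 2>&1; then
    ensure_sfcb_ramdisk
fi
```

The guard makes the fragment safe to run more than once, which matters if the boot script gets re-run by hand while testing.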
Turns out such an issue can also occur because of SNMP. Or, if you have an HP Gen8 blade server, because of the hpHelper.log file, which is fixed via a patch from HP (this server was a Gen8 blade but it didn’t have this log file). KB 2040707 too talks about this. Didn’t help much in my case as that didn’t seem to be my issue.
Two useful links for future reference are:
- KB 1003564: Investigating disk space on an ESX or ESXi host – very informative!
- KB 2001550: RAM disk is full – nothing much but worth a read.
That’s all for now.
p.s. I keep talking about SFCB above but have no idea what it is. Turns out it is the CIM server for ESXi. Found this blog post on it.