NSX Edge application rules to use a different pool

Coming from a NetScaler background I was used to the concept of a failover server. As in, a virtual server would have a pool of servers it load balances amongst, and if all of them are down a failover server can be used instead. You would define the failover server as a virtual server with no IP, and tell the primary virtual server to fail over to this virtual server in case of issues.

Looking around for a similar option with NSX I discovered it’s possible using application rules. Instead of defining two virtual servers though, here you define two pools. One pool for the primary servers you want to load balance, the other pool for the failover server(s).

Then you create an application rule like this:

Once again, the syntax is that of HAProxy. You define an ACL – adfs_pri_down is the name I am using, as this is for load balancing some ADFS servers – and the criterion is nbsrv(pool-adfs-https-443) eq 0. The nbsrv criterion checks the pool you pass to it (pool-adfs-https-443 in my case) and returns the number of servers that are up. So the ACL is basically a boolean that is true when the number of usable servers is 0.

Next, the use_backend rule switches to using the backup pool I have defined (pool-bkpadfs-https-443 in this case) if the ACL is true.
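Putting the two together, the application rule works out to something along these lines (same ACL and pool names as above):

    acl adfs_pri_down nbsrv(pool-adfs-https-443) eq 0
    use_backend pool-bkpadfs-https-443 if adfs_pri_down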

That’s all. Pretty straightforward!

NSX Edge application rules to limit HTTP

Not a big deal, but something I tried today mainly as an excuse to try this with NSX.

So here’s the situation. I have a pair of ADFS WAP servers that are load balanced using NSX. ADFS listens on port 443 so that’s the only port my VIP needs to handle.

However, we are also using Verisign DNS to failover between sites. So I want it such that if say both the ADFS servers in Site A are down, then Verisign DNS should failover my external ADFS records to Site B. Setting this up in Verisign DNS is easy, but you need to be able to monitor the ADFS / WAP services externally from Verisign for this to work properly. Thus I had to set up my NSX WAP load balancer to listen on port 80 too and forward those requests to my internal ADFS servers. To monitor the health of ADFS, Verisign will periodically query http://myadfsserver/adfs/probe. If that returns a 200 response all is good.

Now here’s the requirement I came up with for myself. I don’t want any and every HTTP query on port 80 to be handled. Yes, if I try http://myadfsserver/somerandomurl it gives a 404 error, but I don’t want that. I want any HTTP query to any URL except /adfs/probe to be redirected to /adfs/probe. Figured this would be a good place to use application rules. Here’s what I came up with:

NSX application rules follow the same format as HAProxy so their config guide is a very handy reference.

The acl keyword defines an Access Control List. In my case the name of this ACL is allowed_url (this is a name I can define) and it matches URLs (hence the url keyword). Since I used url it does an exact match, but there are derivatives like url_beg and url_end and url_len and url_dir etc. that I could have used. I can even do regexp matches. In my case I am matching for the exact URL /adfs/probe. I define this as the ACL called allowed_url.

In the next line I use the redirect keyword to redirect any requests to the /adfs/probe location if it does not match the allowed_url ACL. That’s all!
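Written out in HAProxy syntax, the two lines come to something like this:

    acl allowed_url url /adfs/probe
    redirect location /adfs/probe if !allowed_url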

This of course barely touches what one can do with application rules, but I am pleased to get started with this much for now. :)

ADFS monitoring on NSX

Was looking at setting up monitoring of my ADFS servers on NSX.

I know what to monitor on the ADFS and WAP servers thanks to this article.

http://<Web Application Proxy name>/adfs/probe
http://<ADFS server name>/adfs/probe
http://<Web Application Proxy IP address>/adfs/probe
http://<ADFS IP address>/adfs/probe

Need to get an HTTP 200 response for these.
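A quick way to eyeball that from any machine that can reach the servers (the hostname below is a placeholder):

    # prints just the HTTP status code; expect 200
    curl -s -o /dev/null -w "%{http_code}\n" http://adfs1.mydomain.local/adfs/probe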

So I created a service monitor in NSX along these lines:

And I associated it with my pool:

Bear in mind the monitor has to check port 80, even though my pool might be on port 443, so be sure to change the monitor port as above.

The “Show Pool Statistics” link on the “Pools” section quickly tells us whether the member servers are up or not:

The show service loadbalancer pool command can be used to see what the issue is in case the monitor appears down. Here’s an example when things aren’t working:

Here’s an example when all is well:

Thanks to this document for pointing me in the right troubleshooting direction. Quoting from that document, the list of error codes:

UNK: Unknown

INI: Initializing

SOCKERR: Socket error

L4OK: Check passed on layer 4, no upper layers testing enabled

L4TOUT: Layer 1-4 timeout

L4CON: Layer 1-4 connection problem. For example, “Connection refused” (tcp rst) or “No route to host” (icmp)

L6OK: Check passed on layer 6

L6TOUT: Layer 6 (SSL) timeout

L6RSP: Layer 6 invalid response – protocol error. May be caused because the:

Backend server only supports “SSLv3” or “TLSv1.0”, or

Certificate of the backend server is invalid, or

The cipher negotiation failed, and so on

L7OK: Check passed on layer 7

L7OKC: Check conditionally passed on layer 7. For example, 404 with disable-on-404

L7TOUT: Layer 7 (HTTP/SMTP) timeout

L7RSP: Layer 7 invalid response – protocol error

L7STS: Layer 7 response error. For example, HTTP 5xx

Nice!

Quick note to self on NSX Load Balancing

Inline mode == Transparent mode (the latter is the terminology in the UI).

In this mode the load balancer is usually the default gateway for the servers it load balances. Traffic comes to the load balancer, which sends it to the appropriate server (after changing the destination IP of the packet – hence DNAT), and the replies come back to it as it is the default gateway for the server. Note that as far as the destination server is concerned the source IP address is not the load balancer but the client who made the request. Thus the destination server knows who is making the request.

When the load balancer replies to the client who made the request it changes the source IP of the reply from the selected server to its own IP (hence SNAT when replying only).

One-Armed mode == Proxy mode

In this mode the load balancer is not the default gateway. The servers it load balances don’t require any changes to be made to them. The load balancer does a DNAT as before, but also changes the source IP to be itself rather than the client (hence SNAT). When the selected server replies this time, it thinks the source is the load balancer and so replies to it rather than to the client. Thus there are no changes required on the server side. Because of this though, the server doesn’t know who made the request. All requests appear to come from the load balancer (unless you use some headers to capture the info).

As before, when the load balancer replies to the client who made the request it changes the source IP of the reply from the selected server to its own IP (hence SNAT when replying too).

You set the inline/ transparent vs. one-armed/ proxy mode per pool.

To have load balancing in NSX you need to deploy an ESG (Edge Services Gateway). I don’t know why, but I always associated an ESG with just external routing, so it took me by surprise (and still does) that I need to deploy an ESG for load balancing, DHCP, and other edge sort of services (VPN, routing, etc.). I guess the point to remember is that it’s not just a gateway – it’s an edge services gateway. :)

Anyways, feel free to deploy as many ESGs as you feel like. You can have one huge ESG that takes care of all your load balancing needs, or you can have multiple small ones and hand over control to the responsible teams.

This is a good starting point doc from VMware.

You can have L4 and L7 load balancing. If you need only L4 (i.e. TCP, UDP, port number) the UI calls it acceleration. It’s a global configuration, on the ESG instance itself, so bear that in mind.

If you enable acceleration on an ESG, you have to also enable it per virtual server.

L4 load balancing is packet based (obviously, coz it doesn’t need to worry about the application as such). L7 load balancing is socket based. Quoting from this doc (highlight mine):

Packet-based load balancing is implemented on the TCP and UDP layer. Packet-based load balancing does not stop the connection or buffer the whole request, it sends the packet directly to the selected server after manipulating the packet. TCP and UDP sessions are maintained in the load balancer so that packets for a single session are directed to the same server. You can select Acceleration Enable in both the global configuration and relevant virtual server configuration to enable packet-based load balancing.

Socket-based load balancing is implemented on top of the socket interface. Two connections are established for a single request, a client-facing connection and a server-facing connection. The server-facing connection is established after server selection. For HTTP socket-based implementation, the whole request is received before sending to the selected server with optional L7 manipulation. For HTTPS socket-based implementation, authentication information is exchanged either on the client-facing connection or on the server-facing connection. Socket-based load balancing is the default mode for TCP, HTTP, and HTTPS virtual servers.

Also worth noting:

The L4 VIP (“acceleration enabled” in the VIP configuration and no L7 setting such as AppProfile with cookie persistence or SSL-Offload) is processed before the edge firewall, and no edge firewall rule is required to reach the VIP. However, if the VIP is using a pool in non-transparent mode, the edge firewall must be enabled (to allow the auto-created SNAT rule).

The L7 HTTP/HTTPS VIPs (“acceleration disabled” or L7 setting such as AppProfile with cookie persistence or SSL-Offload) are processed after the edge firewall, and require an edge firewall allow rule to reach the VIP.

Application Profiles define common application behaviors such as client SSL, server SSL, X-Forwarded-For, and persistence. They can be reused across virtual servers, and an application profile is mandatory when defining a virtual server. This is also where you can do HTTP redirects.

[Aside] ESXi and NTP (more)

VMware VMs tend to correct the OS time by syncing with the host. You can disable this, but it still syncs during vMotion and some other tasks. This was causing trouble with our Exchange environment as there was a few seconds’ difference between our hosts, and one of the Exchange VMs had its time go back by 5 seconds after a vMotion. So we decided to totally disable time sync with the host.

Instructions are simple – https://kb.vmware.com/s/article/1189.

I am supposed to add the following to the VM’s config file after shutting it down:
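From memory these are the tools.syncTime flag plus a handful of time.synchronize.* options, along these lines (the KB is the authoritative list, and newer revisions may include one or two more settings):

    tools.syncTime = "FALSE"
    time.synchronize.continue = "FALSE"
    time.synchronize.restore = "FALSE"
    time.synchronize.resume.disk = "FALSE"
    time.synchronize.shrink = "FALSE"
    time.synchronize.tools.startup = "FALSE"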

Sounds simple, but it didn’t go smoothly. After doing this the VM refused to start and gave errors about an invalid config file. After some trial and error I figured out that the first line was causing trouble. Remove that and the VM starts as expected. Odd that no one else seems to have encountered this issue!

Curious about what these options do? This PDF is a trove of information on timekeeping and VMware. From the PDF I found the following:

What else? This KB helped me with reloading the VM config file after I made changes. Do vim-cmd vmsvc/getallvms to get a list of VMs and note the ID of the one we are interested in. Then do vim-cmd vmsvc/reload <vmid> to reload.
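That is, for quick copy-paste:

    vim-cmd vmsvc/getallvms      # list all VMs and their IDs
    vim-cmd vmsvc/reload <vmid>  # reload the config of the VM with that ID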

Update: I realized why things broke. My VM was already set to not update time via VMware Tools, so it already had a line like tools.syncTime = "FALSE". That’s why the first line was causing a conflict.

[Aside] ESXi and NTP

Two posts as a reference to myself –

partedUtil and installing ESXi on a USB disk and using it as a datastore

Recently I wanted to install ESXi 6.5 on a USB disk and also use that disk as a datastore to store VMs on. I couldn’t get any VMs to run off the USB disk, but I did spend some time getting the USB disk presented as a datastore, so I wanted to post that here.

Installing ESXi 6.5 to a USB is straight-forward.

And this blog post is a good reference on what to do so that a USB disk is visible as a datastore. That post is about presenting a USB disk without ESXi installed on it – i.e. you use the USB disk entirely as a datastore. In my case the disk already had partitions on it, so I had to make some changes to the instructions in that blog post. This meant a bit of mucking about with partedUtil, which is the ESXi command-line way of fiddling with partition tables (fdisk, while present, is no longer supported as it doesn’t do GPT).

1. First, connect to the ESXi host via SSH.

2. Shutdown the USB arbitrator service (this is used to present a USB disk to a VM): /etc/init.d/usbarbitrator stop

3. Permanently disable this service too: chkconfig usbarbitrator off

4. Now find the USB disk device from /dev/disks. This can be done via an ls -al. In my case the device was called /dev/disks/t10.SanDisk00Cruzer_Switch0000004C531001441121115514.

So far so good?

To find the partitions on this device use the partedUtil getptbl command. Example output from my case:
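In my case the relevant lines were along these lines (the other partition entries are omitted here):

    partedUtil getptbl /dev/disks/t10.SanDisk00Cruzer_Switch0000004C531001441121115514
    gpt
    7625 255 63 122508544
    ...
    9 1843200 7086079 9D27538040AD11DBBF97000C2911D1B8 vmkDiagnostic 0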

The “gpt” indicates this is a GPT partition table. The four numbers after that give the number of cylinders (7625), heads (255), sectors per track (63), and the total number of sectors (122508544). Multiplying cylinders x heads x sectors per track gives a similar figure (122495625).

An entry such as 9 1843200 7086079 9D27538040AD11DBBF97000C2911D1B8 vmkDiagnostic 0 means the following:

  • partition number 9
  • starting at sector 1843200
  • ending at sector 7086079
  • with GUID 9D27538040AD11DBBF97000C2911D1B8, of type vmkDiagnostic (you can get a list of all known GUIDs and types via the partedUtil showGuids command)
  • attribute 0

In my case since the total number of sectors is 122495625 (am taking the product of the CHS figures) and the last partition ends at sector 7086079 I have free space where I can create a new partition. This is what I’d like to expose to the ESX host.

There seems to be a gap of 33 sectors between partitions (at least between 8 and 7, and 7 and 6 – I didn’t check them all :)). So my new partition should start at sector 7086112 (7086079 + 33) and end at 122495624 (122495625 - 1; we leave one sector at the end). The VMFS partition GUID is AA31E02A400F11DB9590000C2911D1B8, thus my entry would look something like this: 10 7086112 122495624 AA31E02A400F11DB9590000C2911D1B8 0.

But we can’t do that at the moment as the disk is read-only. If I try making any changes to the disk it will throw an error like this:

From a VMware forum post I learnt that this is because the disk has a coredump partition (the vmkDiagnostic partitions we saw above). We need to disable that first.

5. Disable the coredump partition: esxcli system coredump partition set --enable false

6. Delete the coredump partitions:
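On a standard ESXi install partitions 7 and 9 are the vmkDiagnostic ones (double-check against your own getptbl output); deleting them is something along these lines:

    partedUtil delete /dev/disks/t10.SanDisk00Cruzer_Switch0000004C531001441121115514 7
    partedUtil delete /dev/disks/t10.SanDisk00Cruzer_Switch0000004C531001441121115514 9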

7. Output the partition table again:

So what I want to add above is partition 9. An entry such as 9 1843232 122495624 AA31E02A400F11DB9590000C2911D1B8 0.

8. Set the partition table. Take note to include the existing partitions as well, since the command replaces everything.
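A sketch of that command – the placeholder entries stand for whatever existing partitions your getptbl output shows, and the last entry is the new VMFS partition worked out above:

    partedUtil setptbl /dev/disks/t10.SanDisk00Cruzer_Switch0000004C531001441121115514 gpt \
      "<existing partition 1 entry>" \
      "<existing partition 5 entry>" \
      "<...remaining existing entries...>" \
      "9 1843232 122495624 AA31E02A400F11DB9590000C2911D1B8 0"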

That’s it. Now partition 9 will be created.

All the partitions also have direct entries under /dev/disks. Here’s the entries in my case after the above changes:

Not sure what the “vml” entries are.

9. Next step is to create the datastore.
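Which would be along these lines (VMFS6 and the “USB-Datastore” label as mentioned below, partition 9 from the step above):

    vmkfstools -C vmfs6 -S USB-Datastore /dev/disks/t10.SanDisk00Cruzer_Switch0000004C531001441121115514:9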

That’s it! Now ESXi will see a datastore called “USB-Datastore” formatted with VMFS6. :)

FC with Synergy 3820C 10/20Gb CNA and VMware ESXi

(This post is intentionally brief because I don’t want to get sidetracked by talking more about the things I link to. I am trying to clear my browser tabs by making blog posts on what’s open, so I want to focus on just getting stuff posted. :)

At work we are moving to HPE Synergy now. We have two Synergy 12000 frames, with each frame containing a Virtual Connect SE 40Gb F8 Module for Synergy. The two frames are linked via Synergy 20Gb Interconnect Link Module(s). (Synergy has a master/satellite arrangement for the Virtual Connect modules, so you don’t need a Virtual Connect module per frame (or enclosure, as it used to be in the past).) The frames have SY 480 Gen 10 compute modules, running ESXi 6.5, and the mezzanine slot of each compute module has a Synergy 3820C 10/20Gb CNA module. The OS in the compute modules should see up to 4 FlexNIC or FlexHBA adapters per Virtual Connect module.

The FlexHBA adapters are actually FCoE adapters (they provide FCoE and/or iSCSI). By default these FlexHBA adapters are not listed as storage adapters in ESXi, so one has to follow the instructions in this link. Basically:

1) Determine the vmnic IDs of the FCoE adapters:
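Presumably via the FCoE namespace of esxcli, which lists the FCoE-capable NICs along with their vmnic names:

    esxcli fcoe nic list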

2) Then do a discovery to activate FCoE:
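Something like this, once per FCoE-capable vmnic (vmnic2/vmnic3 are placeholders for the names found in the previous step):

    esxcli fcoe nic discover -n vmnic2
    esxcli fcoe nic discover -n vmnic3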

As a reference to my future self, here’s a blog post on how to do this automatically for stateless installs.

Totally unrelated to the above, but something I had found while Googling on this issue: Implementing Multi-Chassis Link Aggregation Groups (MC-LAG) with HPE Synergy Virtual Connect SE 40Gb F8 Module and Arista 7050 Series Switches. A good read.

Also, two good blog posts on Synergy:

[Aside] ESXCLI storage commands

Had to spend some time recently identifying the attached storage devices and adapters to an ESXi box and the above links were handy. Thought I should put them in here as a reference to myself.
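The sort of commands involved (standard esxcli, noted here as a reminder to myself):

    esxcli storage core adapter list   # storage adapters (HBAs / CNAs) on the host
    esxcli storage core device list    # attached storage devices / LUNs
    esxcli storage core path list      # paths to each device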

VCSA 6.5 – Could not connect to one or more vCenter Server systems?

Had to shutdown VCSA 6.5 in our environment recently (along with every other VM in there actually) and upon restarting it later I couldn’t connect to it. The Web UI came up but was stuck on a message that it was waiting for all services to start (I didn’t take a screenshot so can’t give the exact message here). I was unable to start all the services via the service-control command either.

The /var/log/vmware/vpxd/vpxd.log file pointed me in the right direction. Turns out the issue was name resolution. Even though my DNS providing VM was powered on, it didn’t have network connectivity (since vCenter was down and the DNS VM connects to a vDS? not sure). Workaround was to move it to a standard switch and then I was able to start all the VCSA services.

Later on I came across this KB article. Should have just added an entry for VCSA into its /etc/hosts file.
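That is, something like this in the VCSA’s /etc/hosts (hypothetical IP and name):

    192.168.1.50   vcsa.mydomain.local vcsa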

Script to run esxcli unmap on all datastores attached to an ESXi host

It’s a good idea to periodically run the UNMAP command on all your thin-provisioned LUNs. This allows the storage system to reclaim deleted blocks. (What is SCSI UNMAP?)

The format of the command is:
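Roughly the following – it takes the datastore’s volume label (or UUID), and optionally a reclaim unit, i.e. the number of VMFS blocks to unmap per iteration:

    esxcli storage vmfs unmap -l <datastore label>
    esxcli storage vmfs unmap -l <datastore label> -n 200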

I wanted to make a script to run this on all attached datastores so here’s what I came up with:
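A sketch of the sort of one-liner described below – the grep pattern is a placeholder for whatever naming convention separates your shared datastores from the local ones:

    for DS in $(esxcli storage filesystem list | grep -i "shared" | awk '{print $2}'); do
      esxcli storage vmfs unmap -l "$DS"
    done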

The esxcli storage filesystem list command outputs a list of datastores attached to the system. The second column (the volume name) is what I am interested in, so that’s what awk takes care of for me. I don’t want to target any local datastores, so I use grep to narrow the list down to just the ones I am interested in.

Next step would be to add this to a cron job. Got to follow the instructions here, it looks like. 

Migrating VMkernel port from Standard to Distributed Switch fails

I am putting a link to the official VMware documentation on this as I Googled it just to confirm to myself I am not doing anything wrong! What I need to do is migrate the physical NICs and Management/ VM Network VMkernel NIC from a standard switch to a distributed switch. Process is simple and straight-forward, and one that I have done numerous times; yet it fails for me now!

Here’s a copy paste from the documentation:

  1. Navigate to Home > Inventory > Networking.
  2. Right-click the dVswitch.
  3. If the host is already added to the dVswitch, click Manage Hosts, else Click Add Host.
  4. Select the host(s), click Next.
  5. Select the physical adapters ( vmnic) to use for the vmkernel, click Next.
  6. Select the Virtual adapter ( vmk) to migrate and click Destination port group field. For each adapter, select the correct port group from dropdown, Click Next.
  7. Click Next to omit virtual machine networking migration.
  8. Click Finish after reviewing the new vmkernel and Uplink assignment.
  9. The wizard and the job completes moving both the vmk interface and the vmnic to the dVswitch.

Basically add physical NICs to the distributed switch & migrate vmk NICs as part of the process. For good measure I usually migrate only one physical NIC from the standard switch to the distributed switch, and then separately migrate the vmk NICs. 

Here’s what happens when I am doing the above now. (Note: now. I never had an issue with this earlier. Am guessing it must be some bug in a newer 5.5 update, or something’s wrong in the underlying network at my firm. I don’t think it’s the networking coz I got my network admins to take a look, and I tested that all NICs on the host have connectivity to the outside world (did this by making each NIC the active one and disabling the others)). 

First it’s stuck in progress:

And then vCenter cannot see the host any more:

Oddly I can still ping the host on the vmk NIC IP address. However I can’t SSH into it, so the Management bits are what seem to be down. The host has connectivity to the outside world because it passes the Management network tests from DCUI (which I can connect to via iLO). I restarted the Management agents too, but nope – cannot SSH or get vCenter to see the host. Something in the migration step breaks things. Only solution is to reboot and then vCenter can see the host.

Here’s what I did to work around it anyway.

First I moved one physical NIC to the distributed switch.

Then I created a new management portgroup and VMkernel NIC on that for management traffic. Assigned it a temporary IP.

Next I opened a console to the host. Here’s the current config on the host:

The interface vmk0 (or its IPv4 address rather) is what I wanted to migrate. The interface vmk4 is what I created temporarily. 

I now removed the IPv4 address of the existing vmk NIC and assigned that to the new one. Also confirmed the changes just to be sure. As soon as I did so vCenter picked up the changes. I then tried to move the remaining physical NIC over to the distributed switch, but that failed. It gave an error that the existing connection was forcibly closed by the host. So I rebooted the host. Post-reboot I found that the host now thought it had no IP, even though it was responding to the old IP via the new vmk. So this approach was a no-go (but I am still leaving it here as a reminder to myself that this does not work).

I now migrated vmk0 from the standard switch to the distributed switch. As before, this will fail – vCenter will lose connectivity to the ESX host. But that’s why I have a console open. As expected the output of esxcli network ip interface list shows me that vmk0 hasn’t moved to the distributed switch:

So now I go ahead and remove the IPv4 address of vmk0 and assign that to vmk4 (the new one). Also confirmed the changes. 
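Roughly the esxcli commands for that swap (the IP and netmask below are placeholders):

    esxcli network ip interface ipv4 set -i vmk0 -t none                                    # strip the IP from vmk0
    esxcli network ip interface ipv4 set -i vmk4 -t static -I 10.10.10.10 -N 255.255.255.0  # assign it to vmk4
    esxcli network ip interface ipv4 get                                                    # confirm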

Next I rebooted the host (via reboot), and then via the CLI I removed vmk0 (for some reason the GUI showed both vmk0 and vmk4 with the same IP I assigned above).
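The removal itself is a one-liner:

    esxcli network ip interface remove -i vmk0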

Reboot again!

Post-reboot I can go back to the GUI and move the remaining physical NIC over to the distributed switch. :) Yay!

[Aside] How to quickly get ESXi logs from a web browser (without SSH, vSphere client, etc)

This post made my work easy yesterday – https://www.vladan.fr/check-esxi-logs-from-web-browser/

tl;dr version:  go to https://IP_of_Your_ESXi/host

Notes on MCS disks

Primer 1. Primer 2. MCS Prep overview (good post, I don’t refer to all its points below). 

  • MCS creates a snapshot of the master VM you specify, but if you specify a snapshot it will not create another one. 
  • This snapshot is used to create a full clone. A full snapshot, so to speak. 
    • This way the image used by the catalog is independent of the master VM. 
    • During the preparation of this full snapshot an “instruction disk” is attached to the VM that is temporarily created using the full snapshot. This disk enables DHCP on all interfaces of the full snapshot; does some KMS related tasks; and runs vDisk inventory collection if required.
  • This full snapshot is stored on each storage repository that is used by Desktop Studio. 
    • This full snapshot is shared by all VMs on that storage repository. 
  • Each storage repository will also have an identity disk (16 MB) per VM.
  • Each storage repository will also have a delta/ difference disk per VM.
    • This is thin provisioned if the storage supports it.
    • Can increase up to the maximum size of the VM.

Remember my previous post on the types:

  • Random.
    • Delta disk is deleted during reboot. 
  • Static + Save changes.
    • Changes are saved to a vDisk. 
    • Delta disk not used?
  • Static + Dedicated VM.
    • Delta disk is not deleted during reboot. 
    • Important to keep in mind: if the master image in the catalog is updated, existing VMs do not automatically start using it upon next reboot. Only newly created dedicated VMs use the new image. 
    • The delta disk is deleted when the master image is updated and existing VMs are made to use the new image (basically, new VMs are created and the delta disk starts from scratch; user customizations are lost). 
    • Better to use desktop management tools (of the OS) to keep dedicated VMs up to date coz of the above issue. 
  • Static + Discard changes.
    • Delta disk is deleted during reboot. 

A post on sealing the vDisk after changes. Didn’t realize there’s so many steps to be done. 

[Aside] Memory Resource Management in ESXi

Came across this PDF from VMware while reading on memory management. It’s dated, but a good read. Below are some notes I took while reading it. Wanted to link to the PDF and also put these somewhere; hence this post.

Some terminology:

  • Host physical memory <–[mapped to]– Guest physical memory (contiguous virtual address space presented by the Hypervisor to the Guest OS) <–[mapped to]– Guest virtual memory (contiguous virtual address space presented by the Guest OS to its applications).
    • Guest virtual -> Guest physical mapping is in Guest OS page tables
    • Guest physical -> Host physical mapping is in pmap data structure
      • There’s also a shadow page table that the Hypervisor maintains for Guest virtual -> Guest physical
      • A VM does Guest virtual -> Guest physical mapping via hardware Translation Lookaside Buffers (TLBs). The hypervisor intercepts calls to these, and uses them to keep its shadow page tables up to date.
  • Guest physical memory -> Guest swap device (disk) == Guest level paging.
  • Guest physical memory -> Host swap device (disk) == Hypervisor swapping.

Some interesting bits on the process:

  • Applications use OS provided interfaces to allocate & de-allocate memory.
  • OSes have different implementations on how memory is classified as free or allocated. For example: two lists.
  • A VM has no pre-allocated physical memory.
  • Hypervisor maintains its own data structures for free and allocated memory for a VM.
  • Allocating memory for a VM is easy. When the VM Guest OS makes a request to a certain location, it will generate a page fault. The hypervisor can capture that and allocate memory.
  • De-allocation is tricky because there’s no way for the hypervisor to know the memory is not in use. These lists are internal to the OS. So there’s no straight-forward way to take back memory from a VM.
  • The host physical memory assigned to a VM doesn’t keep growing indefinitely though as the guest OS will free and allocate within the range assigned to it, so it will stick within what it has. And side by side the hypervisor tries to take back memory anyways.
    • Only when the VM tries to access memory that is not actually mapped to host physical memory does a page fault happen. The hypervisor will intercept that and allocate memory.
  • For de-allocation, the hypervisor adds the VM assigned memory to a free list. Actual data in the physical memory may not be modified. Only when that physical memory is subsequently allocated to some other VM does it get zeroed out.
  • Ballooning is one way of reclaiming memory from the VM. This is a driver loaded in the Guest OS.
    • Hypervisor tells ballooning driver how much memory it needs back.
    • Driver will pin those memory pages using Guest OS APIs (so the Guest OS thinks those pages are in use and should not assign to anyone else).
    • Driver will inform Hypervisor it has done this. And Hypervisor will remove the physical backing of those pages from physical memory and assign it to other VMs.
    • Basically the balloon driver inflates the VM’s memory usage, giving it the impression a lot of memory is in use. Hence the term “balloon”.
  • Another way is Hypervisor swapping. In this the Hypervisor swaps to physical disk some of the physical memory it has assigned to the VM. So what the VM thinks is physical memory is actually on disk. This is basically swapping – just that it’s done by Hypervisor, instead of Guest OS.
    • This is not at all preferred coz it’s obviously going to affect VM performance.
    • Moreover, the Guest OS too could swap the same memory pages to its disk if it is under memory pressure. Hence double paging.
  • Ballooning is slow. Hypervisor swapping is fast. Ballooning is preferred though; Hypervisor swapping is only used when under lots of pressure.
  • Host (Hypervisor) has 4 memory states (view this via esxtop, press m).
    • High == All Good
    • Soft == Start ballooning. (Starts before the soft state is actually reached).
    • Hard == Hypervisor swapping too.
    • Low == Hypervisor swapping + block VMs that use more memory than their target allocations.