Contact

Subscribe via Email

Subscribe via RSS/JSON

Categories

Recent Posts

Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

Elsewhere

partedUtil and installing ESXi on a USB disk and using it as a datastore

Recently I wanted to install ESXi 6.5 on a USB disk and also use that disk as a datastore to store VM on. I couldn’t get any VMs to run off the USB disk but I spent some time getting the USB disk presented as a datastore so wanted to post that here.

Installing ESXi 6.5 to a USB is straight-forward.

And this blog post is a good reference on what to do so that a USB disk is visible as a datastore. This blog post is about presenting a USB disk without ESXi installed on it – i.e. you use the USB disk entirely as a datastore. In my case the disk already had partitions on it so I had to make some changes to the instructions in that blog post. This meant a bit of mucking about with partedUtil, which is the ESXi command line way of fiddling with partition tables. (fdisk while present is no longer supported as it doesn’t do GPT).

1. First, connect to the ESXi host via SSH.

2. Shutdown the USB arbitrator service (this is used to present a USB disk to a VM): /etc/init.d/usbarbitrator stop

3. Permanently disable this service too: chkconfig usbarbitrator off

4. Now find the USB disk device from /dev/disks. This can be done via an ls -al. In my case the device was called /dev/disks/t10.SanDisk00Cruzer_Switch0000004C531001441121115514.

So far so good?

To find the partitions on this device use the partedUtil getptbl command. Example output from my case:

The “gpt” indicates this is a GPT partition table. The four numbers after that give the number of cylinders (7625), heads (255), sectors per track (63), as well as the total number of sectors (122508544). Multiplying the cylinders x heads x sectors per head should give a similar figure too (122495625).

An entry such as 9 1843200 7086079 9D27538040AD11DBBF97000C2911D1B8 vmkDiagnostic 0 means the following:

  • partition number 9
  • starting at sector 1843200
  • ending at sector 7086079
  • of GUID 7086079 9D27538040AD11DBBF97000C2911D1B8, type vmkDiagnostic (you can get a list of all known GUIDs and type via the partedUtil showGuids command)
  • attribute 0

In my case since the total number of sectors is 122495625 (am taking the product of the CHS figures) and the last partition ends at sector 7086079 I have free space where I can create a new partition. This is what I’d like to expose to the ESX host.

There seems to be gap of 33 sectors between partitions (at least between 8 and 7, and 7 and 6 – I didn’t check them all :)). So my new partition should start at sector 7086112 (7086079 + 33) and end at 122495624 (122495625 -1) (we leave one sector in the end). The VMFS partition GUID is AA31E02A400F11DB9590000C2911D1B8, thus my entry would look something like this: 10 7086112 122495624 AA31E02A400F11DB9590000C2911D1B8 0.

But we can’t do that at the moment as the disk is read-only. If I try making any changes to the disk it will throw an error like this:

From a VMware forum post I learnt that this is because the disk has a coredump partition (the vmkDiagnostic partitions we saw above). We need to disable that first.

5. Disable the coredump partition: esxcli system coredump partition set --enable false

6. Delete the coredump partitions:

7. Output the partition table again:

So what I want to add above is partition 9. An entry such as 9 1843232 122495624 AA31E02A400F11DB9590000C2911D1B8 0.

8. Set the partition table. Take note to include the existing partitions as well as the command replaces everything.

That’s it. Now partition 9 will be created.

All the partitions also have direct entries under /dev/disks. Here’s the entries in my case after the above changes:

Not sure what the “vml” entries are.

9. Next step is to create the datastore.

That’s it! Now ESXi will see a datastore called “USB-Datastore” formatted with VMFS6. :)

FC with Synergy 3820C 10/20Gb CNA and VMware ESXi

(This post is intentionally brief because I don’t want to sidetrack by talking more on the things I link to. I am trying to clear my browser tabs by making blog posts on what’s open, so I want to focus on just getting stuff posted. :)

At work we are moving HPE Synergy now. We have two Synergy 12000 frames with each frame containing a Virtual Connect SE 40Gb F8 Module for Synergy. The two frames are linked via Synergy 20Gb Interconnect Link Module(s). (Synergy has a master/ satellite module for the Virtual Connect modules so you don’t need a Virtual Connect module per frame (or enclosure as it used to be in the past)). The frames have SY 480 Gen 10 compute modules, running ESXi 6.5, and the mezzanine slot of each compute module has a Synergy 3820C 10/20Gb CNA module. The OS in the compute modules should see up to 4 FlexNIC or FlexHBA adapters per Virtual Connect module.

The FlexHBA adapters are actually FCoE adapters (they provide FCoE and/ or iSCSI actually). By default these FlexHBA adapters are not listed as storage adapters in ESXi so one has to follow the instructions in this link. Basically:

1) Determine the vmnic IDs of the FCoE adapters:

2) Then do a discovery to activate FCoE:

As a reference to my future self, here’s a blog post on how to do this automatically for stateless installs.

Totally unrelated to the above, but something I had found while Googling on this issue: Implementing Multi-Chassis Link Aggregation Groups (MC-LAG) with HPE Synergy Virtual Connect SE 40Gb F8 Module and Arista 7050 Series Switches. A good read.

Also, two good blog posts on Synergy:

[Aside] ESXCLI storage commands

Had to spend some time recently identifying the attached storage devices and adapters to an ESXi box and the above links were handy. Thought I should put them in here as a reference to myself.

VCSA 6.5 – Could not connect to one or more vCenter Server systems?

Had to shutdown VCSA 6.5 in our environment recently (along with every other VM in there actually) and upon restarting it later I couldn’t connect to it. The Web UI came up but was stuck on a message that it was waiting for all services to start (I didn’t take a screenshot so can’t give the exact message here). I was unable to start all the services via the service-control command either.

The /var/log/vmware/vpxd/vpxd.log file pointed me in the right direction. Turns out the issue was name resolution. Even though my DNS providing VM was powered on, it didn’t have network connectivity (since vCenter was down and the DNS VM connects to a vDS? not sure). Workaround was to move it to a standard switch and then I was able to start all the VCSA services.

Later on I came across this KB article. Should have just added an entry for VCSA into its /etc/hosts file.

Script to run esxcli unmap on all datastores attached to an ESXi host

It’s a good idea to periodically run the UNMAP command on all your thin-provisioned LUNs. This allows the storage system to reclaim deleted blocks. (What is SCSI UNMAP?)

The format of the command is:

I wanted to make a script to run this on all attached datastores so here’s what I came up with:

The esxcli storage filesystem list command outputs a list of datastores attached to the system. The second column is what I am interested in, so that’s what awk takes care for me. I don’t want to target any local datastores, so I use grep to filter out  the ones I am interested in. 

Next step would be to add this to a cron job. Got to follow the instructions here, it looks like. 

Migrating VMkernel port from Standard to Distributed Switch fails

I am putting a link to the official VMware documentation on this as I Googled it just to confirm to myself I am not doing anything wrong! What I need to do is migrate the physical NICs and Management/ VM Network VMkernel NIC from a standard switch to a distributed switch. Process is simple and straight-forward, and one that I have done numerous times; yet it fails for me now!

Here’s a copy paste from the documentation:

  1. Navigate to Home > Inventory > Networking.
  2. Right-click the dVswitch.
  3. If the host is already added to the dVswitch, click Manage Hosts, else Click Add Host.
  4. Select the host(s), click Next.
  5. Select the physical adapters ( vmnic) to use for the vmkernel, click Next.
  6. Select the Virtual adapter ( vmk) to migrate and click Destination port group field. For each adapter, select the correct port group from dropdown, Click Next.
  7. Click Next to omit virtual machine networking migration.
  8. Click Finish after reviewing the new vmkernel and Uplink assignment.
  9. The wizard and the job completes moving both the vmk interface and the vmnic to the dVswitch.

Basically add physical NICs to the distributed switch & migrate vmk NICs as part of the process. For good measure I usually migrate only one physical NIC from the standard switch to the distributed switch, and then separately migrate the vmk NICs. 

Here’s what happens when I am doing the above now. (Note: now. I never had an issue with this earlier. Am guessing it must be some bug in a newer 5.5 update, or something’s wrong in the underlying network at my firm. I don’t think it’s the networking coz I got my network admins to take a look, and I tested that all NICs on the host have connectivity to the outside world (did this by making each NIC the active one and disabling the others)). 

First it’s stuck in progress:

And then vCenter cannot see the host any more:

Oddly I can still ping the host on the vmk NIC IP address. However I can’t SSH into it, so the Management bits are what seem to be down. The host has connectivity to the outside world because it passes the Management network tests from DCUI (which I can connect to via iLO). I restarted the Management agents too, but nope – cannot SSH or get vCenter to see the host. Something in the migration step breaks things. Only solution is to reboot and then vCenter can see the host.

Here’s what I did to workaround anyways. 

First I moved one physical NIC to the distributed switch.

Then I created a new management portgroup and VMkernel NIC on that for management traffic. Assigned it a temporary IP.

Next I opened a console to the host. Here’s the current config on the host:

The interface vmk0 (or its IPv4 address rather) is what I wanted to migrate. The interface vmk4 is what I created temporarily. 

I now removed the IPv4 address of the existing vmk NIC and assigned that to the new one. Also, confirmed the changes just to be sure. As soon as I did so vCenter picked up the changes. I then tried to move the remaining physical NIC over to the distributed switch, but that failed. Gave an error that the existing connection was forcibly closed by the host. So I rebooted the host. Post-reboot I found that the host now thought it had no IP, even though it was responding to the old IP via the new vmk. So this approach was a no-go (but still leaving it here as a reminder to myself that this does not work)

I now migrated vmk0 from the standard switch to the distributed switch. As before, this will fail – vCenter will lose connectivity to the ESX host. But that’s why I have a console open. As expected the output of esxcli network ip interface list shows me that vmk0 hasn’t moved to the distributed switch:

So now I go ahead and remove the IPv4 address of vmk0 and assign that to vmk4 (the new one). Also confirmed the changes. 

Next I rebooted (reboot) the host, and via the CLI I removed vmk0 (for some reason the GUI showed both vmk0 and vmk4 with the same IP I assigned above). 

Reboot again!

Post-reboot I can go back to the GUI and move the remaining physical NIC over to the distributed switch. :) Yay!

[Aside] How to quickly get ESXi logs from a web browser (without SSH, vSphere client, etc)

This post made my work easy yesterday – https://www.vladan.fr/check-esxi-logs-from-web-browser/

tl;dr version:  go to https://IP_of_Your_ESXi/host

Notes on MCS disks

Primer 1. Primer 2. MCS Prep overview (good post, I don’t refer to all its points below). 

  • MCS creates a snapshot of the master VM you specify, but if you specify a snapshot it will not create another one. 
  • This snapshot is used to create to create a full clone. A full snapshot, so to say. 
    • This way the image used by the catalog is independent of the master VM. 
    • During the preparation of this full snapshot an “instruction disk” is attached to the VM that is temporarily created using the full snapshot. This disk enables DHCP on all interfaces of the full snapshot; does some KMS related tasks; and runs vDisk inventory collection if required.
  • This full snapshot is stored on each storage repository that is used by Desktop Studio. 
    • This full snapshot is shared by all VMs on that storage repository. 
  • Each storage repository will also have an identity disk (16 MB) per VM.
  • Each storage repository will also have a delta/ difference disk per VM.
    • This is thin provisioned if the storage supports it.
    • Can increase up to the maximum size of the VM.

Remember my previous post on the types:

  • Random.
    • Delta disk is deleted during reboot. 
  • Static + Save changes.
    • Changes are saved to a vDisk. 
    • Delta disk not used?
  • Static + Dedicated VM.
    • Delta disk is not deleted during reboot. 
    • Important to keep in mind: if the master image in the catalog is updated, existing VMs do not automatically start using it upon next reboot. Only newly created dedicated VMs use the new image. 
    • The delta disk is deleted when the master image is updated and existing VMs are made to use the new image (basically, new VMs are created and the delta disk starts from scratch; user customizations are lost). 
    • Better to use desktop management tools (of the OS) to keep dedicated VMs up to date coz of the above issue. 
  • Static + Discard changes.
    • Delta disk is deleted during reboot. 

A post on sealing the vDisk after changes. Didn’t realize there’s so many steps to be done. 

[Aside] Memory Resource Management in ESXi

Came across this PDF from VMware while reading on memory management. It’s dated, but a good read. Below are some notes I took while reading it. Wanted to link to the PDF and also put these somewhere; hence this post.

Some terminology:

  • Host physical memory <–[mapped to]– Guest physical memory (continuous virtual address space presented by Hypervisor to Guest OS) <–[mapped to]– Guest virtual memory (continuous virtual address space presented by Guest OS to its applications).
    • Guest virtual -> Guest physical mapping is in Guest OS page tables
    • Guest physical -> Host physical mapping is in pmap data structure
      • There’s also a shadow page table that the Hypervisor maintains for Guest virtual -> Guest physical
      • A VM does Guest virtual -> Guest physical mapping via hardware Translation Lookup Buffers (TLBs). The hypervisor intercepts calls to these; and uses these to keep its shadow page tables up to date.
  • Guest physical memory -> Guest swap device (disk) == Guest level paging.
  • Guest physical memory -> Host swap device (disk) == Hypervisor swapping.

Some interesting bits on the process:

  • Applications use OS provided interfaces to allocate & de-allocate memory.
  • OSes have different implementations on how memory is classified as free or allocated. For example: two lists.
  • A VM has no pre-allocated physical memory.
  • Hypervisor maintains its own data structures for free and allocated memory for a VM.
  • Allocating memory for a VM is easy. When the VM Guest OS makes a request to a certain location, it will generate a page fault. The hypervisor can capture that and allocate memory.
  • De-allocation is tricky because there’s no way for the hypervisor to know the memory is not in use. These lists are internal to the OS. So there’s no straight-forward way to take back memory from a VM.
  • The host physical memory assigned to a VM doesn’t keep growing indefinitely though as the guest OS will free and allocate within the range assigned to it, so it will stick within what it has. And side by side the hypervisor tries to take back memory anyways.
    • Only when the VM tries to access memory that is not actually mapped to host physical memory does a page fault happen. The hypervisor will intercept that and allocate memory.
  • For de-allocation, the hypervisor adds the VM assigned memory to a free list. Actual data in the physical memory may not be modified. Only when that physical memory is subsequently allocated to some other VM does it get zeroed out.
  • Ballooning is one way of reclaiming memory from the VM. This is a driver loaded in the Guest OS.
    • Hypervisor tells ballooning driver how much memory it needs back.
    • Driver will pin those memory pages using Guest OS APIs (so the Guest OS thinks those pages are in use and should not assign to anyone else).
    • Driver will inform Hypervisor it has done this. And Hypervisor will remove the physical backing of those pages from physical memory and assign it to other VMs.
    • Basically the balloon driver inflates the VM’s memory usage, giving it the impression a lot of memory is in use. Hence the term “balloon”.
  • Another way is Hypervisor swapping. In this the Hypervisor swaps to physical disk some of the physical memory it has assigned to the VM. So what the VM thinks is physical memory is actually on disk. This is basically swapping – just that it’s done by Hypervisor, instead of Guest OS.
    • This is not at all preferred coz it’s obviously going to affect VM performance.
    • Moreover, the Guest OS too could swap the same memory pages to its disk if it is under memory pressure. Hence double paging.
  • Ballooning is slow. Hypervisor swapping is fast. Ballooning is preferred though; Hypervisor swapping is only used when under lots of pressure.
  • Host (Hypervisor) has 4 memory states (view this via esxtop, press m).
    • High == All Good
    • Soft == Start ballooning. (Starts before the soft state is actually reached).
    • Hard == Hypervisor swapping too.
    • Low == Hypervisor swapping + block VMs that use more memory than their target allocations.

 

TIL: vCenter inherited permissions are not cumulative

Say you are part of two groups. Group A has full rights on the vCenter. Group B has limited rights on a cluster.

You would imagine that since you are a member of Group A and that has full rights on vCenter itself, your rights on the cluster in question won’t be limited. But nope, you are wrong. Since you are a member of Group B and that has limited rights on the cluster, your rights too are restricted. Bummer if you are a member of multiple groups and some of these groups have limited rights on child objects! :o)

Workaround is to add yourself or Group A explicitly on that cluster, with full rights. Then the permissions become cumulative.

Notes to self on XenServer storage

Playing with XenServer in my testlab (basically as a VM in VMware Workstation hah!) I ran into trouble while creating a Machine Catalog via Citrix Desktop Studio. I forget the exact message but it was about lack of resources. I could see that in the create catalog process it was creating a snapshot and making a copy VM, powering it on and off successfully, and then it was failing. I kept an eye on my storage during this and saw that indeed it was exceeding the allocated space. I had thought it would do thin provisioning but in retrospect I realize XenServer never asked me about thick or thin when I added my iSCSI storage. Hmm.

Well turns out that for iSCSI XenServer has only thick provisioning. You get thin provisioning only if you are using the ext3 filesystem or NFS. Since iSCSI uses LVM, bummer! 

Here’s a forum post on how to identify if your SR is thick or not. 

Regarding thin provisioning – it is only for locally attached storage (which can use ext3) or NFS. Block attached storage is thick.

Before I realized all this I had spent some Googling on how to create a thin provisioned SR (Storage Repository). I felt that maybe it’s a GUI restriction and I can workaround by using the CLI. Turns out I was wrong. Here’s an article that explains SRs in XenServer anyway. It’s a good read. Here’s an article just on enabling thin provisioning for ext3 SRs via the CLI. 

While on the topic of storage, this is something I wanted to blog about earlier but never got around to. When using SMB/ CIFS shares, XenServer only supports NTLMv1. Here’s instructions on using NTLMv2

Also, smbclient is a good tool to test SMB connects from a XenServer. Example:

That seems to work, but I get a logon failure. This is because I didn’t put the username in quotes. 

That works!

I have no idea what the three commands below except that they are to do with mounting an SMB/ CIFS share on a XenServer permanently. I had noted these commands as part of my would be blog post, but it’s been a while now and I forget. Sometime when I get around to doing SMB3 or NTLMv2 with XenServer again I hope to refer to these again and better explain. I don’t want to spend too much time on XenServers now and get sidetracked …

After issue the above commands I think the shared folder is mounted only on one host in the pool. But right clicking on it and doing a repair will get it mounted on all hosts in the pool.

XenServer 7.0 and above support SMB for VM disk storage too. Prior versions support SMB only for ISO storage. 

NSX Firewall no working on Layer3; OpenBSD VMware Tools; IP Discovery, etc.

I have two security groups. Network 1 VMs (a group that contains my VMs in the 192.168.1.0/24) and Network 2 VMs (similar, for 192.168.2.0/24 network). 

Both are dynamic groups. I select members based on whether the VM name contains -n1 or -n2. (The whole exercise is just for fun/ getting to know this stuff). 

I have two firewall rules making use of these rules. Layer 2 and Layer 3. 

The Layer 2 rule works but the Layer 3 one does not! Weird. 

I decided to troubleshoot this via the command line. Figured it would be a good opportunity.

To troubleshoot I have to check the rules on the hosts (because remember, that’s where the firewall is; it’s a kernel module in each host). For that I need to get the host-id. For which I need to get the cluster-id. Sadly there’s no command to list all hosts (or at least I don’t know of any). 

So now I have my host-ids.

Let’s also take a look the my VMs (thankfully it’s a short list! I wonder how admins do this in real life):

We can see the filters applying to each VM.  To summarize:

And are these filters applying on the hosts themselves?

Hmm, that too looks fine. 

Next I picked up one of the rule sets and explored it further:

The Layer 3 & Layer 2 rules are in separate rule sets. I have marked the ones which I am interested in. One works, the other doesn’t. So I checked the address sets used by both:

Tada! And there we have the problem. The address set for the Layer 3 rule is empty. 

I checked this for the other rules too – same situation. I modified my Layer 3 rule to specifically target the subnets:

And the address set for that rule is not empty:

And because of this the firewall rules do work as expected. Hmm.

I modified this rule to be a group with my OpenBSD VMs from each network explicitly added to it (i.e. not dynamic membership in case that was causing an issue). But nope, same result – empty address set!

But the address set is now empty. :o)

So now I have an idea of the problem. I am not too surprised by this because I vaguely remember reading something about VMware Tools and IP detection inside a VM (i.e. NSX makes use of VMware Tools to know the IP address of a VM) and also because I am aware OpenBSD does not use the official VMware Tools package (it has its own and that only provides a subset of functions).

Googling a bit on this topic I came across the IP address Discovery section in the NSX Admin guide – prior to NSX 6.2 if VMware Tools wasn’t installed (or was stopped) NSX won’t be able to detect the IP address of the VM. Post NSX 6.2 it can do DHCP & ARP snooping to work around a missing/ stopped VMware Tools. We configure the latter in the host installation page:

I am going to go ahead and enable both on all my clusters. 

That helped. But it needs time. Initially the address set was empty. I started pings from one VM to another and the source VM IP was discovered and put in the address set; but since the destination VM wasn’t in the list traffic was still being allowed. I stopped pings, started pings, waited a while … tried again … and by then the second VM IP to was discovered and put in the address set – effectively blocking communication between them. 

Side by side I installed a Windows 8.1 VM with VMware Tools etc and tested to see if it was being automatically picked up (I did this before enabling the snooping above). It was. In fact its IPv6 address too was discovered via VMware Tools and added to the list:

Nice! Picked up something interesting today. 

Nested XenServer crashes when scrubbing memory

In case anyone else runs into this. I noticed that both XenServer 6.5 and 7.0 crash at the memory scrubbing stage during boot up when run as a VM within VMware Workstation (and possibly other virtualization products too – I didn’t try it with anything else). 

Am guessing the crash happens because the memory is not really available (this being a nested VM) and so the process crashes. Anyhoo, the workaround is to disable memory scrubbing. Check this blog post for instructions. 

In brief, the instructions are to add the option bootscrub=false to the boot options. This is via the file /boot/extlinux.conf in XenServer 6.5; or via /boot/grub/grub.cfg in XenServer 7.0.

Notes to self while installing NSX 6.3 (part 4)

Reading through the VMware NSX 6.3 Install Guide after having installed the DLR and ESG in my home lab. Continuing from the DLR section.

As I had mentioned earlier NSX provides routing via DLR or ESG.  

  • DLR == Distributed Logical Router.
  • ESG == Edge Services Gateway

DLR consists of an appliance that provides the control plane functionality. This appliance does not do any routing itself. The actual routing is done by the VIBs on the ESXi hosts. The appliance uses the NSX Controller to push out updates to the ESXi host. (Note: Only DLR. ESG does not depend on the Controller to push out route). Couple of points to keep in mind:

  • A DLR instance cannot connect to logical switches in different transport zones. 
  • A DLR cannot connect to a dvPortgroup with VLAN ID 0.
  • A DLR cannot connect to a dvPortgroup with VLAN ID if that DLR also connects to logical switches spanning more than one VDS. 
    • This confused me. Why would a logical switch span more than one VDS? I dunno. There are reasons probably, same way you could have multiple clusters in same data center having different VDSes instead of using the same one. 
  • If you have portgroups on different VDSes with the same VLAN ID, and these VDSes share some hosts, then DLR cannot connect these. 

I am not entirely clear with the above points. It’s more to enforce the transport zones and logical switches align correctly, but I haven’t entirely understood it so I am simply going to make note as above and move on …

In a DLR the firewall rules only apply to the uplink interface and are limited to traffic destined for the edge virtual appliance. In other words they don’t apply to traffic between the logical switches a DLR instance connects. (Note that this refers to the firwall settings found under the DLR section, not in the Firewall section of NSX). 

A DLR has many interfaces. The one exposed to VMs for routing is the Logical InterFace (LIF). Here’s a screenshot from the interfaces on my DLR. 

The ones of type ‘Internal’ are the LIFs. These are the interfaces that the DLR will route between. Each LIF connects to a separate network – in my case a logical switch each. The IP address assigned to this LIF will be the address you set as gateway for the devices in that network. So for example: one of the LIFs has an IP address 192.168.1.253 and connects to my 192.168.1.0/24 segment. All the VMs there will have 192.168.1.253 as their default gateway. Suppose we ignore the ‘Uplink’ interface for now (it’s optional, I created it for the external routing to work), and all our DLR had were the two ‘Internal’ LIFs, and VMs on each side had the respective IP address set as their default gateway, then our DLR will enable routing between these two networks. 

Unlike a physical router though, which exists outside the virtual network and which you can point to as “here’s my router”, there’s no such concept with DLRs. The DLR isn’t a VM which you can point to as your router. Nor is it a VM to which packets between these networks (logical switches) are sent to for routing. The DLR, as mentioned above, is simply your ESXi hosts. Each ESXi host that has logical switches which a DLR connects into has this LIF created in them with that LIF IP address assigned to it and a virtual MAC so VMs can send packets to it. The DLR is your ESXi host. (That is pretty cool, isn’t it! I shouldn’t be amazed because I had mentioned it earlier when reading about all this, but it is still cool to actually “see” it once I have implemented).

Above screenshot is from my two VMs on the same VXLAN but on different hosts. Note that the default gateway (192.168.1.253) MAC is the same for both. Each of their hosts will respond to this MAC entry. 

(Note to self: Need to explore the net-vdr command sometime. Came across it as I was Googling on how to find the MAC address table seen by the LIF on a host. Didn’t want to get side-tracked so didn’t explore too much. There’s something called a VDR (not encountered it yet in my readings).

  • net-vdr -I -l will list all the VDRs on a host.
  • net-vdr -L -l <vdrname> will list the LIFs.
  • net-vdr -N -l <vdrname> will list the MAC addresses (ARP info)

)

When creating a DLR it is possible to create it with or without the appliance. Remember that the appliance provides the control plane functionality. It is the appliance that learns of new routes etc and pushes to the DLR modules in the ESXi hosts. Without an appliance the DLR modules will do static routing (which might be more than enough, especially in a test environment like my nested lab for instance) so it is ok to skip it if your requirements are such. Adding an appliance means you get to (a) select if it is deployed in HA config (i.e. two appliance), (b) their locations etc, (c) IP address and such for the appliance, as well as enabling SSH. The appliance is connected to a different interface for HA and SSH – this is independent of the LIFs or Uplink interfaces. That interface isn’t used for any routing. 

Apart from the control plane, the appliance also controls the firewall on the DLR. If there’s no appliance you can’t make any firewall changes to the DLR – makes sense coz there’s nothing to change. You won’t be connecting to the DLR for SSH or anything coz you do that to the appliance on the HA interface. 

According to the docs you can’t add an appliance once a DLR instance is deployed. Not sure about that as I do see an option to deploy an appliance on my non-appliance DLR instance. Maybe it will fail when I actually try and create the appliance – I didn’t bother trying. 

Discovered this blog post while Googling for something. I’ve encountered & linked to his posts previously too. He has a lot of screenshots and step by step instructions. So worth a check out if you want to see some screenshots and much better explanation than me. :) Came across some commands from his blog which can be run on the NSX Controller to see the DLRs it is aware of and their interfaces. Pasting the output from my lab here for now, I will have to explore this later …

I have two DLRs. One has an appliance, other doesn’t. I made these two, and a bunch of logical switches to hook these to, to see if there’s any difference in functionality or options.

One thing I realized as part of this exercise is that a particular logical switch can only connect to one DLR. Initially I had one DLR which connected to 192.168.1.0/24 and 192.168.2.0/24. Its uplink was on logical switch 192.168.0.0/24 which is where the ESG too hooked into. Later when I made one more DLR with its own internal links and tried to connect its uplink to the 192.168.0.0/24 network used by the previous DLR, I saw that it didn’t even appear in the list of options. That’s when I realized its better to use a smaller range logical switch for the uplinks – like say a /30 network. This way each DLR instance connects to an ESG on its own /30 network logical switch (as in the output above). 

A DLR can have up to 8 uplink interfaces and 1000 internal interfaces.


Moving on to ESG. This is a virtual appliance. While a DLR provides East-West routing (i.e. within the virtual environment), an ESG provides North-South routing (i.e. out of the virtual environment). The ESG also provides services such as DHCP, NAT, VPN, and Load Balancing. (Note to self: DLR does not provide DHCP or Load Balancing as one might expect (at least I did! :p). DLR provides DHCP Relay though). 

The uplink of an ESG will be a VDS (Distributed Switch) as that’s what eventually connects an ESXi environment to the physical network. 

An ESG needs an appliance to be deployed. You can enable/ disable SSH into this appliance. If enabled you can SSH into the ESG appliance from the uplink address or from any of the internal link IP addresses. In contrast, you can only SSH into a DLR instance if it has an associated appliance. Even then, you cannot SSH into the appliance from the internal LIFs (coz these don’t really exist, remember … they are on each ESXi host). With a DLR we have to SSH into the interface used for HA (this can be used even if there’s only one appliance and hence no HA). 

When deploying an ESG appliance HA can be enabled. This deploys two appliances in an active/passive mode (and the two appliances will be on separate hosts). These two appliances will talk to each other to keep in sync via one of the internal interfaces (we can specify one, or NSX will just choose any). On this internal interface the appliances will have a link local IP address (a /30 subnet from 169.254.0.0/16) and communicate over that (doesn’t matter that there’s some other IP range actually used in that segment, as these are link local addresses and unlikely anyone’s going to actually use them). In contrast, if a DLR appliance is deployed with HA we need to specify a separate network from the networks that it be routing between. This can be a logical switch or a DVS, and as with ESG the two appliances will have link local IP addresses (a /30 subnet from 169.254.0.0/16) for communication. Optionally, we can specify an IP address in this network via which we can SSH into the DLR appliance (this IP address will not be used for HA, however).

After setting up all this, I also created two NAT rules just for kicks. 

And with that my basic setup of NSX is complete! (I skipped OSPF as I don’t think I will be using it any time soon in my immediate line of work; and if I ever need to I can come back to it later). Next I need to explore firewalls (micro-segmentation) and possibly load balancing etc … and generally fiddle around with this stuff. I’ve also got to start figuring out the troubleshooting and command-line stuff. But the base is done – I hope!

Yay! (VXLAN) contd. + Notes to self while installing NSX 6.3 (part 3)

Finally continuing with my NSX adventures … some two weeks have past since my last post. During this time I moved everything from VMware Workstation to ESXi. 

Initially I tried doing a lift and shift from Workstation to ESXi. Actually, initially I went with ESXi 6.5 and that kept crashing. Then I learnt it’s because I was using the HPE customized version of ESXi 6.5 and since the server model I was using isn’t supported by ESXi 6.5 it has a tendency to PSOD. But strangely the non-HPE customized version has no issues. But after trying the HPE version and failing a couple of times, I gave up and went to ESXi 5.5. Set it up, tried exporting from VMware Workstation to ESXi 5.5, and that failed as the VM hardware level on Workstation was newer than ESXi. 

Not an issue – I fired up VMware Converter and converted each VM from Workstation to ESXi. 

Then I thought hmm, maybe the MAC addresses will change and that will cause an issue, so I SSH’ed into the ESXi host and manually changed the MAC addresses of all my VMs to whatever it was in Workstation. Also changed the adapters to VMXNet3 wherever it wasn’t. Reloaded the VMs in ESXi, created all the networks (portgroups) etc, hooked up the VMs to these, and fired them up. That failed coz the MAC address ranges were of VMware Workstation and ESXi refuses to work with those! *grr* Not a problem – change the config files again to add a parameter asking ESXi to ignore this MAC address problem – and finally it all loaded. 

But all my Windows VMs had their adapters reset to a default state. Not sure why – maybe the drivers are different? I don’t know. I had to reconfigure all of them again. Then I turned to OpnSense – that too had reset all its network settings, so I had to configure those too – and finally to nested ESXi hosts. For whatever reason none of them were reachable; and worse, my vCenter VM was just a pain in the a$$. The web client kept throwing some errors and simply refused to open. 

That was the final straw. So in frustration I deleted it all and decided to give up.

But then …

I decided to start afresh. 

Installed ESXi 6.5 (the VMware version, non-HPE) on the host. Created a bunch of nested ESXi VMs in that from scratch. Added a Windows Server 2012R2 as the shared iSCSI storage and router. Created all the switches and port groups etc, hooked them up. Ran into some funny business with the Windows Firewall (I wanted to assign some interface as Private, others as Public, and enable firewall only only the Public ones – but after each reboot Windows kept resetting this). So I added OpnSense into the mix as my DMZ firewall.

So essentially you have my ESXi host -> which hooks into an internal vSwitch portgroup that has the OpnSense VM -> which hooks into another vSwitch portgroup where my Server 2012R2 is connected to, and that in turn connects to another vSwitch portgroup (a couple of them actually) where my ESXi hosts are connected to (need a couple of portgroup as my ESXi hosts have to be in separate L3 networks so I can actually see a benefit of VXLANs). OpnSense provides NAT and firewalling so none of my VMs are exposed from the outside network, yet they can connect to the outside network if needed. (I really love OpnSense by the way! An amazing product). 

Then I got to the task of setting these all up. Create the clusters, shared storage, DVS networks, install my OpenBSD VMs inside these nested EXSi hosts. Then install NSX Manager, deploy controllers, configure the ESXi hosts for NSX, setup VXLANs, segment IDs, transport zones, and finally create the Logical Switches! :) I was pissed off initially at having to do all this again, but on the whole it was good as I am now comfortable setting these up. Practice makes perfect, and doing this all again was like revision. Ran into problems at each step – small niggles, but it was frustrating. Along the way I found that my (virtual) network still does not seem to support large MTU sizes – but then I realized it’s coz my Server 2012R2 VM (which is the router) wasn’t setup with the large MTU size. Changed that, and that took care of the MTU issue. Now both Web UI and CLI tests for VXLAN succeed. Finally!

Third time lucky hopefully. Above are my two OpenBSD VMs on the same VXLAN, able to ping each other. They are actually on separate L3 ESXi hosts so without NSX they won’t be able to see each other. 

Not sure why there are duplicate packets being received. 

Next I went ahead and set up a DLR so there’s communicate between VXLANs. 

Yeah baby! :o)

Finally I spent some time setting up an ESG and got these OpenBSD VMs talking to my external network (and vice versa). 

The two command prompt windows are my Server 2012R2 on the LAN. It is able to ping the OpenBSD VMs and vice versa. This took a bit more time – not on the NSX side – as I forgot to add the routing info on the ESG for my two internal networks (192.168.1.0/24 and 192.168.2.0/24) as well on the Server 2012R2 (192.168.0.0/16). Once I did that routing worked as above. 

I am aware this is more of a screenshots plus talking post rather than any techie details, but I wanted to post this here as a record for myself. I finally got this working! Yay! Now to read the docs and see what I missed out and what I can customize. Time to break some stuff finally (intentionally). 

:o)