Notes to self while installing NSX 6.3 (part 1)

(No particular sense or order here. These are just notes I took while installing NSX 6.3 in my home lab and reading the excellent NSX for Newbies series and the NSX 6.3 install guide from VMware (which I find quite informative). I'm splitting these into parts as I have been typing them up for a few days.)

You can install NSX Manager in VMware Workstation (rather than in a nested ESXi installation, if you are doing this in a home lab). You won't get a chance to configure the IP address during the install, but you can find it from your DHCP server. Browse to that IP in a browser and log in as username “admin”, password “default” (no double quotes).

If you want to add a certificate from your AD CA to NSX Manager, create the certificate as usual in Certificate Manager. Then export the generated certificate, your root CA certificate, and any intermediate CA certificates as “Base-64 encoded X.509 (.CER)” files. Then concatenate all of these certificates into a single file (basically, open up Notepad and make a new file that has all these certificates in it) and import that file into NSX Manager. (More details here).
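
If you'd rather script the concatenation than use Notepad, something like this little Python sketch does the same thing (the filenames are placeholders; I've assumed the usual chain order of server certificate first, then intermediates, then the root CA):

# Concatenate the exported Base-64 (.CER) files into one chain file
# for import into NSX Manager. The filenames below are placeholders.
cert_files = ["nsxmanager.cer", "intermediate-ca.cer", "root-ca.cer"]

with open("nsxmanager-chain.cer", "w") as chain:
    for name in cert_files:
        with open(name) as f:
            chain.write(f.read().strip() + "\n")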

During the Host Preparation step on an ESXi 5.5 host, the install failed with the following error:

"Could not install image profile: ([], "Error in running ['/etc/init.d/vShield-Stateful-Firewall', 'start', 'install']:\nReturn code: 1\nOutput: vShield-Stateful-Firewall is not running\nwatchdog-dfwpktlogs: PID file /var/run/vmware/watchdog-dfwpktlogs.PID does not exist\nwatchdog-dfwpktlogs: Unable to terminate watchdog: No running watchdog process for dfwpktlogs\nFailed to release memory reservation for vsfwd\nResource pool 'host/vim/vmvisor/vsfwd' release failed. retrying..\nResource pool 'host/vim/vmvisor/vsfwd' release failed. retrying..\nResource pool 'host/vim/vmvisor/vsfwd' release failed. retrying..\nResource pool 'host/vim/vmvisor/vsfwd' release failed. retrying..\nResource pool 'host/vim/vmvisor/vsfwd' release failed. retrying..\nSet memory minlimit for vsfwd to 256MB\nFailed to set memory reservation for vsfwd to 256MB, trying for 256MB\nFailed to set memory reservation for vsfwd to failsafe value of 256MB\nMemory reservation released for vsfwd\nResource pool 'host/vim/vmvisor/vsfwd' released.\nResource pool creation failed. Not starting vShield-Stateful-Firewall\n\nIt is not safe to continue. Please reboot the host immediately to discard the unfinished update.")" Error 3/16/2017 5:17:49 AM esx55-01.fqdn

Initially I thought maybe NSX 6.3 wasn't compatible with ESXi 5.5, or that I was on an older build of ESXi 5.5 – so I Googled around on the pre-requisites (ESXi 5.5 seems to be fine) and also updated ESXi 5.5 to the latest version. Then I took a closer look at the error message above and saw the bit about the 256MB memory reservation. My ESXi 5.5 host only had 3GB RAM (I had installed it with 4GB and later reduced it to 3GB), so I bumped it back up to 4GB RAM and tried again. And voilà! The install worked. So NSX 6.3 requires an ESXi 5.5 host with a minimum of 4GB RAM (well, maybe 3.5GB RAM works too – I was too lazy to try!) :o)

If you want, you can browse to “https://<NSX_MANAGER_IP>/bin/vdn/nwfabric.properties” to manually download the VIBs that get installed as part of Host Preparation. This is handy if you want to do a manual install (the thought had crossed my mind as part of the troubleshooting above).
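
For what it's worth, here's a quick sketch of pulling that file down from a lab NSX Manager (the IP is a placeholder, and I'm assuming the URL is reachable without authentication, as it is for the hosts during Host Preparation; verify=False is only because of the self-signed certificate in my lab):

import requests
import urllib3

# Lab sketch: fetch nwfabric.properties from NSX Manager.
# The IP is a placeholder; verify=False only because of the
# self-signed certificate on a lab NSX Manager.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

nsx_manager = "192.168.1.50"  # placeholder IP
url = f"https://{nsx_manager}/bin/vdn/nwfabric.properties"

resp = requests.get(url, verify=False)
resp.raise_for_status()
print(resp.text)  # lists the VIB paths, which can be fetched the same way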

NSX Manager is your management plane. You install it first, and it communicates with the vCenter server. A single NSX Manager install is sufficient – there's one NSX Manager per vCenter.

The next step after installing NSX Manager is to install the NSX Controllers. These are installed in odd numbers to maintain quorum. This is your control plane. Note: no data traffic flows through the controllers. The NSX Controllers perform many roles, and each role has a master controller node (if this node fails, another one takes its place via election).
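
The odd-number bit is just majority math – a quick sketch (nothing NSX-specific) of why an even cluster size doesn't buy you anything extra:

# Quorum is a majority of the controller nodes. An odd cluster size
# tolerates as many failures as the next even size up, which is why
# you deploy an odd number (typically 3).
for nodes in (1, 2, 3, 4, 5):
    quorum = nodes // 2 + 1
    tolerated = nodes - quorum
    print(f"{nodes} controller(s): quorum = {quorum}, tolerates {tolerated} failure(s)")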

Remember that in NSX, VXLAN is your data plane. NSX supports three control plane modes when it comes to BUM (Broadcast, unknown Unicast, and Multicast) traffic: multicast, unicast, and hybrid. BUM traffic is basically traffic that doesn't have a specific (known) Layer 2 destination. (More info: [1], [2], [3] … there are many more on the Internet, but these three are what I came across initially via Google searches.)

  • In unicast mode a host replicates all BUM traffic to all other hosts on the same VXLAN, and also picks a host in every other VXLAN to do the same for the hosts in that VXLAN (see the sketch after this list). Thus there's no dependence on the underlying hardware. There could, however, be increased traffic as the number of VXLANs increases. Note that in the case of unknown unicast the host first checks with the NSX Controller for more info. (That's the impression I get, at least, from the [2] post above – I am not entirely clear.)
  • In multicast mode a host depends on the underlying network hardware to replicate BUM traffic via multicast. All hosts on all VXLAN segments join the relevant multicast groups, so any BUM traffic can be replicated by the network hardware to that multicast group. Obviously this mode requires hardware support. Note that multicast is used for both Layer 2 and Layer 3 here.
  • In hybrid mode some of the BUM traffic replication is handed over to the first-hop physical switch (so rather than a host sending unicast traffic to all other hosts connected to the same physical switch, it relies on the switch to do this), while the rest of the replication is done by the host to hosts in other VXLANs. Note that multicast is used only for Layer 2 here. Also note that, as in unicast mode, the Controller is consulted first in the case of unknown unicast traffic.
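
Here's the sketch I mentioned for unicast mode – purely conceptual, with made-up host and segment names, just to show who sends copies to whom:

# Conceptual sketch of unicast-mode BUM replication (made-up names).
# The source host sends a copy to every other host in its own segment,
# plus one copy to a chosen proxy host in each remote segment; that
# proxy then replicates to the remaining hosts in its segment.
segments = {
    "segment-A": ["esx-a1", "esx-a2", "esx-a3"],
    "segment-B": ["esx-b1", "esx-b2"],
    "segment-C": ["esx-c1", "esx-c2"],
}

def unicast_replication(source_host, source_segment):
    copies = []
    # Local replication: one unicast copy per host in the same segment.
    for host in segments[source_segment]:
        if host != source_host:
            copies.append((source_host, host))
    # Remote replication: one copy to a proxy per remote segment...
    for segment, hosts in segments.items():
        if segment == source_segment:
            continue
        proxy = hosts[0]  # pick any one host in that segment as the proxy
        copies.append((source_host, proxy))
        # ...and the proxy replicates to the rest of its own segment.
        copies.extend((proxy, host) for host in hosts[1:])
    return copies

for sender, receiver in unicast_replication("esx-a1", "segment-A"):
    print(f"{sender} -> {receiver}")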

NSX Edge provides the routing. This is either via the Distributed Logical Router (DLR), which is made up of kernel modules installed on the hypervisors plus a DLR virtual appliance, or via the Edge Services Gateway (ESG), which is a virtual appliance.

  • A DLR can have up to 8 uplink interfaces and 1000 internal interfaces.
    • A DLR uplink typically connects to an ESG via a Layer 2 logical switch. 
    • The DLR virtual appliance can be set up in HA mode – an active/standby configuration.
      • Created from NSX Manager?
    • The DLR virtual appliance is the control plane – it supports dynamic routing protocols and exchanges routing updates with Layer 3 devices (usually ESG).
      • Even if this virtual appliance is down, routing isn't affected. New routes won't be learnt, that's all.
    • The ESXi hypervisors have DLR VIBs which contain the routing information etc. received from the Controllers (note: not from the DLR virtual appliance). This is the data plane. It performs ARP lookups, route lookups, etc.
      • The VIBs also add a Logical InterFace (LIF) to the hypervisor. There's one for each Logical Switch (VXLAN) the host connects to. Each LIF, on each host, is set to the default gateway IP of that Layer 2 segment.
  • An ESG can have up to 10 uplink and internal interfaces. (With a trunk an ESG can have up to 200 sub-interfaces). 
    • There can be multiple ESG appliances in a datacenter. 
    • Here's how new routes are learnt: the ESG learns a new route -> this is picked up by the DLR virtual appliance as they are connected -> the DLR virtual appliance passes this info to the NSX Controllers -> the NSX Controllers pass it to the ESXi hosts. (See the sketch after this list.)
    • The ESG is what connects to the uplink. The DLR connects to the ESG via a Logical Switch.
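
And here's the route-learning sketch referred to above – again purely conceptual, with hypothetical names, just to show the hand-offs in order:

# Conceptual sketch of the route-learning chain described above:
# ESG -> DLR virtual appliance -> NSX Controllers -> ESXi hosts.
hosts = {"esx55-01": {}, "esx55-02": {}, "esx55-03": {}}  # per-host route tables

def esg_learns_route(prefix, next_hop):
    # 1. The ESG learns a new route (e.g. via a routing protocol on its uplink).
    dlr_appliance_receives(prefix, next_hop)

def dlr_appliance_receives(prefix, next_hop):
    # 2. The DLR virtual appliance exchanges routes with the ESG, so it picks this up...
    controllers_receive(prefix, next_hop)

def controllers_receive(prefix, next_hop):
    # 3. ...and passes it to the NSX Controllers (control plane only, no data traffic)...
    for routes in hosts.values():
        # 4. ...which push it down to the DLR VIBs on every ESXi host.
        routes[prefix] = next_hop

esg_learns_route("10.20.30.0/24", "via the ESG")
print(hosts)  # every host now knows the new route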

Logical Switch – this is the switch for a VXLAN. 

NSX Edge provides Logical VPNs, Logical Firewall, and Logical Load Balancer.