Notes to self while installing NSX 6.3 (part 4)

Reading through the VMware NSX 6.3 Install Guide after having installed the DLR and ESG in my home lab. Continuing from the DLR section.

As I mentioned earlier, NSX provides routing via the DLR or the ESG.

  • DLR == Distributed Logical Router
  • ESG == Edge Services Gateway

The DLR consists of an appliance that provides the control plane functionality. This appliance does not do any routing itself; the actual routing is done by kernel modules (VIBs) on the ESXi hosts. The appliance uses the NSX Controller to push out updates to the ESXi hosts. (Note: only the DLR works this way. The ESG does not depend on the Controller to push out routes.) A couple of points to keep in mind:

  • A DLR instance cannot connect to logical switches in different transport zones.
  • A DLR cannot connect to a dvPortgroup with VLAN ID 0.
  • A DLR cannot connect to a dvPortgroup with a VLAN ID if that DLR also connects to logical switches spanning more than one VDS.
    • This confused me. Why would a logical switch span more than one VDS? I dunno. There are probably reasons, the same way you could have multiple clusters in the same data center, each with its own VDS instead of sharing one.
  • If you have portgroups on different VDSes with the same VLAN ID, and these VDSes share some hosts, the DLR cannot connect to both of them.

I am not entirely clear on the above points. I think they are mostly there to enforce that transport zones and logical switches align correctly, but I haven’t entirely understood it, so I am simply going to note them as above and move on …
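
Related tangent, going back to the point about the routing being done by VIBs on the hosts: if you want to convince yourself the routing really lives on the ESXi hosts, the kernel modules ship as VIBs. Going by my reading the 6.3 VIBs are called esx-vxlan and esx-vsip, but the names vary across NSX versions, so treat this as a sketch:

    # run in the ESXi shell; lists the NSX VIBs if present (names are version dependent)
    esxcli software vib list | grep -E 'esx-vxlan|esx-vsip'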

In a DLR the firewall rules only apply to the uplink interface and are limited to traffic destined for the edge virtual appliance. In other words, they don’t apply to traffic between the logical switches a DLR instance connects to. (Note that this refers to the firewall settings found under the DLR section, not the Firewall section of NSX.)

A DLR has many interfaces. The ones exposed to VMs for routing are the Logical InterFaces (LIFs). Here’s a screenshot of the interfaces on my DLR.

The ones of type ‘Internal’ are the LIFs. These are the interfaces that the DLR will route between. Each LIF connects to a separate network; in my case, a logical switch each. The IP address assigned to a LIF is the address you set as the gateway for the devices in that network. For example: one of my LIFs has the IP address 192.168.1.253 and connects to my 192.168.1.0/24 segment, so all the VMs there will have 192.168.1.253 as their default gateway. Suppose we ignore the ‘Uplink’ interface for now (it’s optional; I created it for the external routing to work). If all our DLR had were the two ‘Internal’ LIFs, and the VMs on each side had the respective LIF IP address set as their default gateway, then our DLR would route between these two networks.
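
To make that concrete, here’s what the network config on a Linux VM in the 192.168.1.0/24 segment would look like. Nothing NSX-specific here; the VM just points its default gateway at the LIF IP (192.168.1.10 is a made-up address for illustration):

    # on a VM attached to the 192.168.1.0/24 logical switch
    ip addr add 192.168.1.10/24 dev eth0      # example VM address
    ip route add default via 192.168.1.253    # the DLR LIF is the default gateway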

Unlike a physical router though, which exists outside the virtual network and which you can point to and say “here’s my router”, there’s no such concept with DLRs. The DLR isn’t a VM you can point to as your router. Nor is it a VM to which packets between these networks (logical switches) are sent for routing. The DLR, as mentioned above, is simply your ESXi hosts. Each ESXi host with logical switches that a DLR connects to has the LIFs created on it, with the LIF IP addresses assigned and a virtual MAC so VMs can send packets to them. The DLR is your ESXi host. (That is pretty cool, isn’t it! I shouldn’t be amazed, because I mentioned it earlier when reading about all this, but it is still cool to actually “see” it now that I have implemented it.)

The screenshot above is from my two VMs on the same VXLAN but on different hosts. Note that the default gateway (192.168.1.253) MAC address is the same for both. Each VM’s host will respond to this MAC entry.
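
If you’d rather check this from the command line than a screenshot, ping the gateway from each VM and look at the ARP entry; both VMs should show the identical MAC. (From my reading, NSX uses a well-known virtual MAC for DLR LIFs, 02:50:56:56:44:52 by default, but verify that value in your own lab.)

    # on each of the two VMs
    ping -c 1 192.168.1.253        # populate the ARP cache
    ip neigh show 192.168.1.253    # same MAC on both VMs, despite being on different hosts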

(Note to self: need to explore the net-vdr command sometime. Came across it as I was Googling how to find the MAC address table seen by the LIF on a host. Didn’t want to get side-tracked, so I didn’t explore too much. There’s something called a VDR (haven’t encountered it yet in my readings).

  • net-vdr -I -l will list all the VDRs on a host.
  • net-vdr -L -l <vdrname> will list the LIFs of a VDR.
  • net-vdr -N -l <vdrname> will list the MAC addresses (ARP info).

)
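
While on the topic of net-vdr: troubleshooting posts I skimmed also mention a route-listing variant. I haven’t verified this one myself, so treat it as a pointer to chase later rather than gospel:

    # supposedly dumps the routing table of a VDR instance on the host (unverified by me)
    net-vdr --route -l <vdrname>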

When creating a DLR it is possible to create it with or without the appliance. Remember that the appliance provides the control plane functionality: it is the appliance that learns of new routes etc. and pushes them to the DLR modules in the ESXi hosts. Without an appliance the DLR modules will only do static routing (which might be more than enough, especially in a test environment like my nested lab), so it is OK to skip it if your requirements allow. Adding an appliance means you get to (a) select whether it is deployed in an HA config (i.e. two appliances), (b) choose their locations etc., and (c) set the IP address and such for the appliance, as well as enable SSH. The appliance is connected to a separate interface for HA and SSH; this is independent of the LIFs or Uplink interfaces, and that interface isn’t used for any routing.
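
I haven’t tried this myself, but from my reading of the NSX API guide, static routes on an edge or DLR are managed through NSX Manager’s REST API, along these lines (edge-1 is a placeholder ID, and the path and schema should be double-checked against the API guide for your version):

    # sketch: fetch the static routing config of a DLR/ESG from NSX Manager
    curl -k -u admin:password \
      "https://nsx-manager/api/4.0/edges/edge-1/routing/config/static"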

Apart from the control plane, the appliance also controls the firewall on the DLR. If there’s no appliance you can’t make any firewall changes to the DLR, which makes sense coz there’s nothing to change. You won’t be connecting to the DLR over SSH or anything either, coz you do that to the appliance via the HA interface.

According to the docs you can’t add an appliance once a DLR instance is deployed. I am not sure about that, as I do see an option to deploy an appliance on my appliance-less DLR instance. Maybe it will fail when I actually try to create the appliance; I didn’t bother trying.

Discovered this blog post while Googling for something. I’ve encountered & linked to his posts previously too. He has a lot of screenshots and step-by-step instructions, so it’s worth checking out if you want to see some screenshots and a much better explanation than mine. :) I came across some commands on his blog which can be run on the NSX Controller to see the DLRs it is aware of and their interfaces. Pasting the output from my lab here for now; I will have to explore this later …
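
For my own reference, the commands were along these lines (typed at the NSX Controller CLI; I’m going by his post and my notes, so double-check the syntax against your version):

    # on an NSX Controller
    show control-cluster logical-routers instance all                 # DLR instances the controller knows about
    show control-cluster logical-routers interface-summary <lr-id>    # the LIFs of a particular DLR instance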

I have two DLRs. One has an appliance, the other doesn’t. I made these two, and a bunch of logical switches to hook them to, to see if there’s any difference in functionality or options.

One thing I realized as part of this exercise is that a particular logical switch can only connect to one DLR. Initially I had one DLR connecting 192.168.1.0/24 and 192.168.2.0/24. Its uplink was on a 192.168.0.0/24 logical switch, which is where the ESG hooked in too. Later, when I made one more DLR with its own internal links and tried to connect its uplink to the 192.168.0.0/24 network used by the previous DLR, that logical switch didn’t even appear in the list of options. That’s when I realized it’s better to use a smaller-range logical switch for the uplinks, say a /30 network. This way each DLR instance connects to the ESG on its own /30 logical switch (as in the output above).
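
To spell out the /30 idea: each /30 has exactly two usable addresses, one for the ESG’s interface and one for the DLR’s uplink, which is all a transit link needs. A made-up addressing plan as illustration:

    192.168.0.0/30  ->  ESG 192.168.0.1, DLR-1 uplink 192.168.0.2
    192.168.0.4/30  ->  ESG 192.168.0.5, DLR-2 uplink 192.168.0.6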

A DLR can have up to 8 uplink interfaces and 1000 internal interfaces.


Moving on to the ESG. This is a virtual appliance. While a DLR provides East-West routing (i.e. within the virtual environment), an ESG provides North-South routing (i.e. in and out of the virtual environment). The ESG also provides services such as DHCP, NAT, VPN, and Load Balancing. (Note to self: the DLR does not provide DHCP or Load Balancing as one might expect (at least I did! :p). The DLR does provide DHCP Relay, though.)

The uplink of an ESG will be a VDS (Distributed Switch) portgroup, as that’s what eventually connects an ESXi environment to the physical network.

An ESG needs an appliance to be deployed. You can enable/disable SSH into this appliance. If enabled, you can SSH into the ESG appliance from the uplink address or from any of the internal link IP addresses. In contrast, you can only SSH into a DLR instance if it has an associated appliance, and even then you cannot SSH into the appliance via the internal LIFs (coz these don’t really exist, remember … they are on each ESXi host). With a DLR we have to SSH into the interface used for HA (this can be used even if there’s only one appliance and hence no HA).
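
So, assuming SSH was enabled at deployment time (and if I remember right the CLI user on these appliances is admin, but verify that), connecting looks something like this. Both addresses below are placeholders, not from my lab:

    ssh admin@192.168.0.1     # ESG: the uplink or any internal interface IP works
    ssh admin@10.10.10.5      # DLR: the HA/management IP given when deploying the appliance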

When deploying an ESG appliance, HA can be enabled. This deploys two appliances in an active/passive mode (and the two appliances will be on separate hosts). These two appliances talk to each other to keep in sync via one of the internal interfaces (we can specify one, or NSX will just choose any). On this internal interface the appliances have link-local IP addresses (a /30 subnet from 169.254.0.0/16) and communicate over those (it doesn’t matter that some other IP range is actually in use on that segment, as these are link-local addresses and it’s unlikely anyone will actually use them). In contrast, if a DLR appliance is deployed with HA we need to specify a network separate from the networks it will be routing between. This can be a logical switch or a DVS portgroup, and as with the ESG the two appliances will have link-local IP addresses (a /30 subnet from 169.254.0.0/16) for communication. Optionally, we can specify an IP address on this network via which we can SSH into the DLR appliance (this IP address will not be used for HA, however).
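
To make the link-local part concrete (my own illustration; NSX picks the actual subnet): if the chosen /30 were 169.254.1.0/30, the two appliances would end up as

    169.254.1.0/30  ->  active appliance 169.254.1.1, passive appliance 169.254.1.2

and those addresses exist only for the HA heartbeat between the pair.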

After setting all this up, I also created two NAT rules just for kicks.
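
I won’t reproduce mine here, but a typical pair on an ESG would look something like this (all addresses below are placeholders, not from my lab):

    SNAT: original 192.168.1.0/24 -> translated to the ESG uplink IP       # lets internal VMs out
    DNAT: original ESG-uplink-IP:80 -> translated to 192.168.1.10:80       # publishes an internal web server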

And with that my basic setup of NSX is complete! (I skipped OSPF as I don’t think I will be using it any time soon in my immediate line of work; if I ever need it I can come back to it later.) Next I need to explore firewalls (micro-segmentation) and possibly load balancing etc., and generally fiddle around with this stuff. I’ve also got to start figuring out the troubleshooting and command-line side of things. But the base is done, I hope!