Contact

Subscribe via Email

Subscribe via RSS/JSON

Categories

Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

Elsewhere

Yay! (VXLAN) contd. + Notes to self while installing NSX 6.3 (part 3)

Finally continuing with my NSX adventures … some two weeks have past since my last post. During this time I moved everything from VMware Workstation to ESXi. 

Initially I tried doing a lift and shift from Workstation to ESXi. Actually, initially I went with ESXi 6.5 and that kept crashing. Then I learnt it’s because I was using the HPE customized version of ESXi 6.5 and since the server model I was using isn’t supported by ESXi 6.5 it has a tendency to PSOD. But strangely the non-HPE customized version has no issues. But after trying the HPE version and failing a couple of times, I gave up and went to ESXi 5.5. Set it up, tried exporting from VMware Workstation to ESXi 5.5, and that failed as the VM hardware level on Workstation was newer than ESXi. 

Not an issue – I fired up VMware Converter and converted each VM from Workstation to ESXi. 

Then I thought hmm, maybe the MAC addresses will change and that will cause an issue, so I SSH’ed into the ESXi host and manually changed the MAC addresses of all my VMs to whatever it was in Workstation. Also changed the adapters to VMXNet3 wherever it wasn’t. Reloaded the VMs in ESXi, created all the networks (portgroups) etc, hooked up the VMs to these, and fired them up. That failed coz the MAC address ranges were of VMware Workstation and ESXi refuses to work with those! *grr* Not a problem – change the config files again to add a parameter asking ESXi to ignore this MAC address problem – and finally it all loaded. 

But all my Windows VMs had their adapters reset to a default state. Not sure why – maybe the drivers are different? I don’t know. I had to reconfigure all of them again. Then I turned to OpnSense – that too had reset all its network settings, so I had to configure those too – and finally to nested ESXi hosts. For whatever reason none of them were reachable; and worse, my vCenter VM was just a pain in the a$$. The web client kept throwing some errors and simply refused to open. 

That was the final straw. So in frustration I deleted it all and decided to give up.

But then …

I decided to start afresh. 

Installed ESXi 6.5 (the VMware version, non-HPE) on the host. Created a bunch of nested ESXi VMs in that from scratch. Added a Windows Server 2012R2 as the shared iSCSI storage and router. Created all the switches and port groups etc, hooked them up. Ran into some funny business with the Windows Firewall (I wanted to assign some interface as Private, others as Public, and enable firewall only only the Public ones – but after each reboot Windows kept resetting this). So I added OpnSense into the mix as my DMZ firewall.

So essentially you have my ESXi host -> which hooks into an internal vSwitch portgroup that has the OpnSense VM -> which hooks into another vSwitch portgroup where my Server 2012R2 is connected to, and that in turn connects to another vSwitch portgroup (a couple of them actually) where my ESXi hosts are connected to (need a couple of portgroup as my ESXi hosts have to be in separate L3 networks so I can actually see a benefit of VXLANs). OpnSense provides NAT and firewalling so none of my VMs are exposed from the outside network, yet they can connect to the outside network if needed. (I really love OpnSense by the way! An amazing product). 

Then I got to the task of setting these all up. Create the clusters, shared storage, DVS networks, install my OpenBSD VMs inside these nested EXSi hosts. Then install NSX Manager, deploy controllers, configure the ESXi hosts for NSX, setup VXLANs, segment IDs, transport zones, and finally create the Logical Switches! :) I was pissed off initially at having to do all this again, but on the whole it was good as I am now comfortable setting these up. Practice makes perfect, and doing this all again was like revision. Ran into problems at each step – small niggles, but it was frustrating. Along the way I found that my (virtual) network still does not seem to support large MTU sizes – but then I realized it’s coz my Server 2012R2 VM (which is the router) wasn’t setup with the large MTU size. Changed that, and that took care of the MTU issue. Now both Web UI and CLI tests for VXLAN succeed. Finally!

Third time lucky hopefully. Above are my two OpenBSD VMs on the same VXLAN, able to ping each other. They are actually on separate L3 ESXi hosts so without NSX they won’t be able to see each other. 

Not sure why there are duplicate packets being received. 

Next I went ahead and set up a DLR so there’s communicate between VXLANs. 

Yeah baby! :o)

Finally I spent some time setting up an ESG and got these OpenBSD VMs talking to my external network (and vice versa). 

The two command prompt windows are my Server 2012R2 on the LAN. It is able to ping the OpenBSD VMs and vice versa. This took a bit more time – not on the NSX side – as I forgot to add the routing info on the ESG for my two internal networks (192.168.1.0/24 and 192.168.2.0/24) as well on the Server 2012R2 (192.168.0.0/16). Once I did that routing worked as above. 

I am aware this is more of a screenshots plus talking post rather than any techie details, but I wanted to post this here as a record for myself. I finally got this working! Yay! Now to read the docs and see what I missed out and what I can customize. Time to break some stuff finally (intentionally). 

:o)

Automatic Metric and Windows routing

IP routing involves metrics. This is the cost of each route. If there are multiple routes to a destination then the route with lowest metric/ cost is chosen.

In the context of Windows OS there are two metrics that come into play.

automatic metric

One is the metric of the interface/ NIC itself (that’s the “Automatic metric” checkbox above). By default its set to automatic, and this determines the cost of using that interface itself. For example if both your wireless and wired connection can access the Internet, which one should the machine choose? The interface metric is used to make this decision. You can assign a value to this metric if you want to force a decision.

Each interface can have multiple gateways to various networks it knows of. Could be that it has more than one gateway to the same network – say, your wired connection can connect to the Internet from two different routers on your network, which one should it choose? Here’s where the gateway metric comes into play (circled in the screenshot above). By default when you add a gateway its metric is set to automatic, but here too you can assign a value.

gateway metric

So far so good. Now how does all this come into play together?

The first thing to know is that gateway metrics have a value of 256 by default (when set to “Automatic metric”). So if you have more than one gateway to a particular destination, and the metric is set to automatic, then by default both gateways have a metric value of 256 and hence equal preference. Remember that.

The next thing to know is that interface metrics have a value ranging from 5 to 50 (when set to “Automatic metric”) based on the speed of the interface. Lower numbers are better than higher numbers. See this KB article for the numbers, here’s a screenshot from that article.

interface metrics

So if you have two wired connections for instance, one of speed 1 GB and other of speed 10 GB, then the 1 GB interface has a metric of 10 and the 10 GB interface has a metric of 5 – thus making the latter preferred.

To view the interface & gateway metrics assigned to your interfaces use the netsh interface ip show address command: