[Aside] NSX Security tags don’t work cross-VC

Reminder to myself. 

As mentioned prior, it’s important to note enhancements listed here are applicable primarily for Active/Standby use cases such as DR. The reason for this is the local NSX Manager does not have visibility into the inventory of the other NSX Managers’ vCenters. Thus, when a security rule is utilized with the Universal Security Groups leveraging the new supported matching criteria of VM Name or Universal Security Tag in the source/destination fields, since the translation of the security group happens locally, only the VMs/workloads in the local vCenter will be found as members of the security group.

Thus, when leveraging Universal Security Groups with the new supported matching criteria, the entire application must be at the same site as shown below in Figure 11. For example, if the application spans sites and there are Cross-VC traffic flows, the security policy for the application will not provide the desired results.

Setting up IPsec tunnel from OPNsense at home to Azure

This is mainly based on this and this blog post, with additional inputs from my router FAQ for my router-specific stuff. 

I have a virtual network in Azure with a virtual network gateway. I want a Site to Site VPN from my home to Azure so that my home VMs can talk to my Azure VMs. I don’t want to do Point to Site as I have previously done, because I want all my VMs to be able to talk to Azure instead of setting up a P2S connection from each of them. 

My home setup consists of VMware Fusion running a bunch of VMs, one of which is OPNSense. This is my home gateway – it provides routing between my various VM subnets (I have a few) and acts as the DNS resolver for the internal VMs etc. OPNSense has one interface that is bridged to my MacBook so it is not NAT’d behind the MacBook; it has an IP on the same network as the MacBook. I decided to do this as it is easier to port forward from my router to OPNSense. 

OPNSense has an internal address of 192.168.1.23. On my router I port forward UDP ports 500 & 4500 to this. I also have IPSec Passthrough enabled on the router (that’s not mentioned in the previous link but I came across it elsewhere). 

My home VMs are in the 10.0.0.0/8 address space (in which there are various subnets that OPNSense is aware of). My Azure address space is 172.16.0.0/16.

First I created a virtual network gateway in Azure. It’s of the “VpnGw1” SKU. I enabled BGP and set the ASN to 65515 (which is the default). This gateway is in the gateway subnet and has an IP of 172.16.254.254. (I am not using BGP actually, but I set this so I can start using it in future. One of the articles I link to has more instructions). 

Next I created a local network gateway with the public IP of my home router and an address space of 10.0.0.0/8 (that of my VMs). Here too I enabled BGP settings and assigned an ASN of 65501 and set the peer address to be the internal address of my OPNSense router – 192.168.1.23. 

Next I went to the virtual network gateway section and in the connections section I created a new site to site (IPsec) connection. Here I had to select the local network gateway I created above, and also create a pre-shared key (make up a random passphrase – you’ll need it later). 
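
(For reference, the same thing can be done from the Azure CLI. Something along these lines – the resource group and names below are made up, and the exact parameters may differ slightly between az versions, so treat this as a sketch rather than gospel:)

# local network gateway = my home router's public IP + my home address space, with the BGP peer set to OPNsense
az network local-gateway create --resource-group myRG --name homeLGW \
  --gateway-ip-address <home-public-IP> --local-address-prefixes 10.0.0.0/8 \
  --asn 65501 --bgp-peering-address 192.168.1.23

# site to site connection between the Azure virtual network gateway and the local network gateway above
az network vpn-connection create --resource-group myRG --name home-to-azure \
  --vnet-gateway1 myVnetGW --local-gateway2 homeLGW --shared-key '<pre-shared-key>'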

That’s all on the Azure end. Then I went to OPNSense and under VPN > IPsec > Tunnel Settings I created a new phase 1 entry. 

[screenshot: phase 1 – general settings]

I think most of it is default. I changed the Key Exchange to “auto” from v2. For “Remote gateway” I filled in my Azure virtual network gateway public IP. Not shown in this screenshot is the pre-shared key that I put in Azure earlier. I filled the rest of it thus – 

[screenshot: phase 1 – proposal/algorithms]

Of particular note are the algorithms. From the OPNSense logs I noticed that these are the combinations Azure supports – 

  • IKE:AES_CBC_256/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_1024
  • IKE:AES_CBC_256/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_1024
  • IKE:AES_CBC_128/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_1024
  • IKE:AES_CBC_128/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_1024
  • IKE:3DES_CBC/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_1024
  • IKE:3DES_CBC/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_1024

I didn’t know this and so initially I had selected a DH key group size of 2048, and my connections were failing. From the logs I came across the above and changed it to 1024 (the 2048 is still present but it won’t get used as 1024 will be negotiated; I should remove 2048 really – I forgot to do it before taking this screenshot, and it doesn’t matter anyway). The highlighted entry is the combination I chose to go with. 

After this I created a phase 2 entry. This is where I define my local and remote subnets as seen below:

[screenshot: phase 2 – local and remote networks]

I left everything else at their defaults. 

[screenshot: phase 2 – remaining settings at defaults]

And that’s it. After that things connected and I could see a status of connected on the Azure side as well as on my OPNSense side under VPN > IPsec > Status Overview (expand the “I” in the Status column for more info). Logs can be seen under VPN > IPsec > Log File in case things don’t work out as expected. 

I don’t have any VMs in Azure but as a quick test I was able to ping my Azure gateway on its internal IP address (172.16.254.254) from my local VMs. 

Of course a gotcha with this configuration is that when my home public IP changes (as it is a dynamic public IP) this will break. It’s not a big deal for me as I can log in to Azure and enter the new public IP in the local network gateway, but I did find this blog post giving a way of automating this.

Internet not working in Chrome but works fine in IE

Today, Internet browsing via Chrome stopped working at my office. IE was not affected, only Chrome. The error was just that the site couldn’t be reached.

I fired up Chrome and went to “chrome://net-internals/“. In the page that opened I went to “Proxy” in the left side pane and saw that although the original proxy settings were “auto detect”, the effective proxy settings were “direct”. That didn’t make sense – Chrome is set to use the proxy settings of IE, and IE was working fine and detecting a proxy, but Chrome wasn’t. A quick Google search showed me that if Chrome is having trouble finding a proxy, it resorts to a direct Internet connection. Seems to be by design. So why was Chrome having trouble finding a proxy? IE was set with a WPAD file location, so I went to “Events” in the side pane of “chrome://net-internals/” to see if Chrome was having trouble finding the WPAD file. It wasn’t, but there were errors like these:

The line referred to was the last line of the WPAD file, so clearly Chrome was reading the file and there was something wrong with its syntax. I opened up the file in Notepad++, set the language to JavaScript (so I get syntax highlighting and brace matching etc.), and went through the various script blocks in the file. Sure enough, one section had a missing closing brace “}” and that was tripping up Chrome. Not sure why IE was able to move past this error, but there you go. I added the missing brace and Chrome began working. :)

[Aside] Various DNS stuff

No point to this post except as a reference for my future self. I wanted to mention some of the links here to a colleague of mine today but couldn’t remember them. Finally had to search through my browser history. Easier to just put them here for later reference. :)

Via this Pi-Hole page – OpenNIC and DNS.Watch. Both are for uncensored results etc., with the former having additional TLDs too. Sadly neither supports edns-client-subnet so I can’t really use it. :( If I query www.google.com via one of these I get results that are 150-220ms away. Same query via Google DNS or OpenDNS gives me results that are 8ms away!

I hope to implement DNSCrypt-proxy on my Asus router this weekend (time permitting). Seems to be straight-forward to set up on Asus Merlin as there’s an installer, and it’s also available via AMTM. My colleague is currently using DNSCrypt.nl as the upstream resolver, but he also mentioned an alternative he hopes to try.

It’s funny there’s a lot more talk about DNS encryption these days. I happened to get on it coz I got Asus Merlin running at home again recently and also coz of the CloudFlare DNS announcement. I’ve generally been in a geeky mode since then, checking out things like Pi-Hole etc. And just the other day I read an Ars Technica article about DNS encryption, and it turns out my colleague implemented DNSCrypt at his home just this morning.

Something else I hope to try – dunno where though – is the Knot DNS Resolver.

Lastly, totally unrelated but as a reference to myself – I didn’t know there was an open source version of the Synology OS called XPEnology, and I didn’t know of these picoPSU power supplies. So cool! Also, Netgear R7800 seems to be a good router to keep in mind for the future.

Why multiple temporary IPv6 addresses when using SLAAC

Since enabling SLAAC as per my previous post I noticed that Android now has two IPv6 addresses (in addition to the link local one it already had) and Windows has the link-local one, a DHCPv6 one (marked as preferred), and two SLAAC IPv6 addresses (marked as “Temporary IPv6 Address”). Trying to find out why brought me to this superuser page that answered my question.

The long and short of it is that since SLAAC IPv6 addresses are not “centralized” (i.e. not handed out by a DHCPv6 server), the client is at liberty to create multiple IPv6 addresses for privacy purposes – so servers on the Internet are not able to track you consistently (nor collect your IPv6 address and try to make contact with your client, I guess). Via the netsh interface ipv6 show addresses command on my Windows 10 machine I see that they have a duration of an hour, after which they are presumably regenerated.

The netsh interface ipv6 show privacy command shows whether temporary IPv6 addresses are enabled or not. Linux has something similar.
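
(For my own reference: on Windows temporary addresses can be toggled with the command below, and the Linux equivalent I believe is the use_tempaddr sysctl – 0 means off, 1 means generate temporary addresses, 2 means generate and prefer them. Double-check the sysctl name on your distro.)

netsh interface ipv6 set privacy state=enabled
sysctl net.ipv6.conf.all.use_tempaddr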

Sure enough when I now visit https://www.whatismyip.com/ on my browser it no longer shows the DHCP assigned IPv6 address but one of the temporary ones (and no, it does not even show the SLAAC generated IPv6 address based on the EUI-64 MAC address; it’s one of the random addresses that appear as “temporary” in ipconfig or netsh interface ipv6 show addresses).

 

Brief note on IPv6 flags and Dnsmasq modes

Discovered that my Android phone only had a link-local IPv6 address and learnt that Android doesn’t support DHCPv6 (who’d have thought?!). So I want to enable SLAAC in addition to DHCPv6 in my network. Was checking out Dnsmasq options (as Asus uses that) and came across its various modes.

IPv6 Router Advertisement (RA) messages can contain the following flags:

  • M (“Managed address configuration”) – indicates that IPv6 addresses are available via DHCPv6. This is also referred to as Stateful DHCP.
  • O (“Other configuration”) – no IPv6 address, but other configuration information like DNS etc. are available via DHCPv6. This is also referred to as Stateless DHCP.
  • A (“Autonomous Address Configuration”) – indicates that the prefix present with the flag can be used for SLAAC (StateLess Auto Address Configuration).

Note that if the M flag is present the O flag doesn’t matter – coz clients are getting information via DHCPv6 anyway.

Dnsmasq allows the following modes when defining an IPv6 range (from its man page):

For IPv6, the mode may be some combination of ra-only, slaac, ra-names, ra-stateless, ra-advrouter, off-link.

ra-only tells dnsmasq to offer Router Advertisement only on this subnet, and not DHCP.

slaac tells dnsmasq to offer Router Advertisement on this subnet and to set the A bit in the router advertisement, so that the client will use SLAAC addresses. When used with a DHCP range or static DHCP address this results in the client having both a DHCP-assigned and a SLAAC address.

ra-stateless sends router advertisements with the O and A bits set, and provides a stateless DHCP service. The client will use a SLAAC address, and use DHCP for other configuration information.

ra-names enables a mode which gives DNS names to dual-stack hosts which do SLAAC for IPv6. Dnsmasq uses the host’s IPv4 lease to derive the name, network segment and MAC address and assumes that the host will also have an IPv6 address calculated using the SLAAC algorithm, on the same network segment. The address is pinged, and if a reply is received, an AAAA record is added to the DNS for this IPv6 address. Note that this only happens for directly-connected networks, (not one doing DHCP via a relay) and it will not work if a host is using privacy extensions. ra-names can be combined with ra-stateless and slaac.

ra-advrouter enables a mode where router address(es) rather than prefix(es) are included in the advertisements. This is described in RFC-3775 section 7.2 and is used in mobile IPv6. In this mode the interval option is also included, as described in RFC-3775 section 7.3.

off-link tells dnsmasq to advertise the prefix without the on-link (aka L) bit set.

This is a bit confusing so thought I should put it into a nice table. Note that this is my understanding, I could be wrong:

  • ra-only – A flag only; no M or O flags. Clients can use the RA to configure their SLAAC IPv6 address. No DHCPv6 is offered.
  • slaac – if a DHCPv6 range is specified then the M and A flags; else only the A flag. No O flag, but as I said above the O flag doesn’t matter anyway if the M flag is present. (I’d say M and A flags always – see the note at the end of this entry.) Clients can use the RA to configure their SLAAC address; DHCPv6 too is offered if a range is configured. Thus clients can have two IPv6 addresses – a SLAAC one and a DHCPv6 one. slaac sounds like ra-only if no DHCP range is configured, but the DHCP range is what makes slaac different from ra-only, so you kind of actually need it.
  • ra-stateless – only the O and A flags; no M flag. Clients can use the RA to configure their SLAAC address and look to DHCPv6 for the DNS etc. information.
  • ra-names – A flag only; no M or O flags. This one didn’t make much sense to me, but then again it is meant for dual-stacked clients and I am not looking at that scenario. It sounds like ra-only, the difference being that Dnsmasq will assume the client’s SLAAC IPv6 address is based on its MAC address, derive that possible IPv6 address, ping it, and if there’s a reply create an AAAA record mapping the client’s name to this SLAAC IPv6 address.
  • ra-names,slaac – M and A flags (assuming it is the same as the slaac mode). Same as above, just that clients will have a DHCPv6 address in addition to the SLAAC one, and Dnsmasq will create the AAAA DNS record.
  • ra-names,ra-stateless – O and A flags; no M flag. Same as above, just that clients don’t have any DHCPv6 address but use the RA to configure DNS etc.
  • ra-advrouter – ignoring it for now; it’s to do with mobile IPv6 and didn’t make much sense to me. :)
  • off-link – ignoring for now; didn’t make much sense to me.

So in my case it looks like I have to enable the slaac mode. This way all my clients will have both DHCPv6 and SLAAC addresses (with the exception of Android who will get the SLAAC address only).
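
On a vanilla dnsmasq install (i.e. not via the Asus UI, which generates its own config) I believe the dnsmasq.conf lines for this would look roughly like the below – the interface name br0, the range, and the lease time are made up for illustration:

enable-ra
# slaac mode sets the A flag; having a range means the M flag is set too, so clients get both a SLAAC and a DHCPv6 address
dhcp-range=::100,::1ff,constructor:br0,slaac,64,12h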

IPv6 at home!

Whee! I enabled IPv6 at home today. :)

It’s pretty straight-forward so not really an accomplishment on my part actually. I didn’t really have to do anything except flip a switch, but I am glad I thought of doing it and actually did it, and pretty happy to see that it works. Nice!

Turns out Etisalat started rolling out IPv6 to home users in Dubai back in November 2016. I obviously didn’t know of it. Nice work Etisalat!

Also, my Asus router supports IPv6. Windows and iOS etc. supports IPv6 too, so all the pieces are really in place.

All I had to do on the Asus router was go to the IPv6 section, set Connection Type as “Native”, Interface as “PPP”, enable “DHCP-PD” and enable “Release prefix on exit”. DHCP-PD stands for “DHCP Prefix Delegation”. In IPv4 the ISP gives your home router a single public IP and everything behind the home router is NAT’d into that single public IP by the router. In IPv6 you are not limited to a single public IP. IPv6 has tons of addresses after all, so every device can have a public IP. Thus the ISP gives you not a single IPv6 address, but a /64 publicly accessible prefix itself, and all your home devices can take addresses from that pool. Thus “DHCP-PD” means your router asks the ISP to give it a prefix, and “Release prefix on exit” means the router gives that prefix back to the ISP when disconnecting or whatever.

I also decided to use the Google DNS IPv6 servers.

Here’s a list of IPv6 only websites if you want to visit and feel good. :p

Check out this website to test IPv6. It also has a dual stack version that checks if your browser prefers IPv4 over IPv6 even though it may have IPv6 connectivity. Initially I was using this test site. The test succeeded there but I got the following error: “Your browser has real working IPv6 address – but is avoiding using it. We’re concerned about this.”. Turns out Chrome and Firefox start an internal counter when a site has an IPv6 and IPv4 address and if the IPv4 address responds faster then they prefer the IPv4 version. Crazy huh! In Firefox I found these two options in about:config and that seemed to fix this – network.http.fast-fallback-to-IPv4 (set this to false) and network.notify.IPv6 (set to true – I am not sure this setting matters for my scenario but I changed it anyways).

Here’s Comcast’s version of SpeedTest over IPv6.

Back to my router settings. I decided to go with “Stateful” auto configuration for the IPv6 LAN and set an appropriate range. With IPv6 you can have the router dole out IPv6 addresses to clients (in the prefix it has), or you can have clients auto-configure their IPv6 address by asking the router for the prefix information and creating their own address based on that. The former is “Stateful”, the latter is “Stateless”. I decided to go with “Stateful” (though I did play around with “Stateless” too). Also, leave the “Router Advertisements” section Enabled.

That’s pretty much it.

In my case I ended up wasting about an hour after this as I noticed that my Windows 10 laptop would work on IPv6 for a while and then stop working. It wasn’t able to ping the router either. After a lot of trial and error and fooling around I realized that it’s because a long time ago I had disabled a lot of firewall rules on my Windows 10 laptop, and in the process disallowed the IPv6 rules that were enabled by default. Silly of me! I changed all those back to their default state and now the laptop works fine without an issue.
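
(Note to self: a quick way to eyeball those rules next time is from an elevated PowerShell prompt – the wildcard below is just a guess at matching the relevant ICMPv6 rules, adjust as needed:)

Get-NetFirewallRule -DisplayName "*ICMPv6*" | Format-Table DisplayName, Enabled, Direction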

Before moving on – double check that the IPv6 firewall on your router is enabled. Now that every machine in your LAN (that has an IPv6 address) is publicly accessible one has to be careful.

Etisalat and 3rd party routers

I shifted houses recently and rather than shift my Internet connection (as that has a 4 day downtime) I decided to apply for a new connection at the new premises (there was an offer going on wherein the installation charge was zero) and then disconnect the existing connection once I had shifted. A downside of this – which I later realized – is that Etisalat seems to have stopped giving customers the Internet password.

Turns out Etisalat (like many other ISPs) now autoconfigures its routers. You simply plug it into the network and it contacts Etisalat’s servers and configures itself. This is using a protocol called TR-069, which I don’t know much about, but it seems to have some security risks. I have an Asus RT-AC68U router anyway which I have set up the way I want, so I wanted to move over from the Etisalat D-Link router to this one. When I spoke to the chap who installed my new Internet connection he said Etisalat apparently does not allow users to install their own routers. Found many Reddit posts too where people have complained of having to contact Etisalat and not being given this password, and also about having to set a VLAN etc. (e.g. this post). Seemed like a lot of trouble.

Anyhow, I decided to try my luck. First I contacted them via email (care -at- etisalat.ae) asking to reset my password. A helpful agent called me up after a while and reset the password for it. It didn’t even affect my Internet connection coz the auto-configuring ensured that the Etisalat router picked up the new info. So far so good. I tried using these details with the Asus router to see if it would work straightaway, but it didn’t. So I sent them another email asking for the VLAN details. Next day another chap called me up and gave me the VLAN details. He also mentioned that I’d have to leave PnP on in my Asus router, or else he could raise a ticket to disable it. I said I’d like to have it disabled. About 4 hours later someone else called me up and said they were going to disable it now and would I like any assistance etc. I said nope, I’ll take care of it on my own.

Once they disabled PnP the Etisalat router stopped working. So I swapped it with the Asus one, and set the VLAN to what the agent gave me (it’s under LAN > IPTV Settings, confusingly). I also changed the MAC of the Asus router to that of the Etisalat one – though I am not sure if that was really needed (I just did it beforehand, before unplugging the Etisalat router). This didn’t get things working though. Which stumped me for a while, until on a whim I decided to remove the VLAN stuff and just try with the username and password like I had done the previous day. And yay, that worked! So it wasn’t too much of a hassle after all. The phone and TV (eLife) still seem to be working, so it looks like I didn’t break anything either.

So, to summarize. If you want to use your own router with Etisalat (new connections), send them an email asking for the password to be reset and for changes such as disabling Plug & Play so you can use your own router. Ask for the VLAN too, just in case. Once you get these details connect the new router and put in the username and password. If that doesn’t work, put in the VLAN info too. That’s all! I was pleased with the quick turnaround and support, and it didn’t turn out to be a hassle at all like I was expecting. Nice one! :)

Couple of DNS stuff

So CloudFlare announced the 1.1.1.1 DNS resolver service the other day. Funny, coz I had been looking into various DNS options for my home network recently. What I had noticed at home was that when I use the Google DNS or OpenDNS resolvers I get a different (and much closer!) result for google.com while with other DNS servers (e.g. Quad9, Yandex) I get a server that’s farther away.

I was aware that using 3rd party DNS resolvers like this could result in me getting less than ideal results, because the name server of the service I am querying would see my queries coming from this 3rd party resolver and hence give me a result from the region of that resolver (e.g. if Google.com has servers in the UAE and US, and I am based in the UAE, Google.com’s name servers will see that the request for www.google.com is coming from a server in the US and hence give me a result from the US, thinking that’s where I am located). But that didn’t explain why Google DNS and OpenDNS were actually giving me good results.

Reading about that I came across this performance page from the Google DNS team and learnt about the edns-client-subnet (ECS) option (also see this FAQ entry). This is an option that name servers can support wherein the client can send over its IP/ subnet along with the query and the name server will look at that and modify its response accordingly. And if the DNS resolver supports this, then it can send along this info to the name servers being queried and thus get better results. Turns out only Google DNS and OpenDNS support this, and Google actually queries the name servers it knows with ECS queries and caches the results to keep track of which name servers support ECS. This way it can send those servers the ECS option. That’s pretty cool, and a good reason to stick with Google DNS! (I don’t think CloudFlare DNS currently does this, because I get non-ideal results with it too).
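
(An easy way to see ECS in action, by the way, is dig’s +subnet option – send a query with a truncated client subnet and compare the answers you get back. The subnet below is just a documentation range, substitute your own:)

dig @8.8.8.8 www.google.com +subnet=203.0.113.0/24 +short
dig @1.1.1.1 www.google.com +short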

From this “how it works” page:

Today, if you’re using OpenDNS or Google Public DNS and visiting a website or using a service provided by one of the participating networks or CDNs in the Global Internet Speedup then a truncated version of your IP address will be added into the DNS request. The Internet service or CDN will use this truncated IP address to make a more informed decision in how it responds so that you can be connected to the most optimal server. With this more intelligent routing, customers will have a better Internet experience with lower latency and faster speeds. Best of all, this integration is being done using an open standard that is available for any company to integrate into their own platform.

While on DNS, I came across DNS Perf via the CloudFlare announcement. Didn’t know of such a service. Also useful, in case you didn’t know already, is this GRC tool.

Lastly, I came across Pi-Hole recently and that’s what I use at home nowadays. It’s an advertisement black hole. Got a good UI and all. It uses DNS (all clients point to the local Pi-Hole install for DNS) and is able to block advertisements and malware this way.

ADFS monitoring on NSX

Was looking at setting up monitoring of my ADFS servers on NSX.

I know what to monitor on the ADFS and WAP servers thanks to this article.

http://<Web Application Proxy name>/adfs/probe
http://<ADFS server name>/adfs/probe
http://<Web Application Proxy IP address>/adfs/probe
http://<ADFS IP address>/adfs/probe

Need to get an HTTP 200 response for these.
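
A quick manual check from any machine that can reach them (the hostnames below are placeholders for my actual ADFS/ WAP servers) – each of these should print 200:

curl -s -o /dev/null -w "%{http_code}\n" http://adfs01.mydomain.local/adfs/probe
curl -s -o /dev/null -w "%{http_code}\n" http://wap01.mydomain.local/adfs/probe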

So I created a service monitor in NSX along these lines:

And I associated it with my pool:

Bear in mind the monitor has to check port 80, even though my pool might be on port 443, so be sure to change the monitor port as above.

The “Show Pool Statistics” link on the “Pools” section quickly tells us whether the member servers are up or not:

The show service loadbalancer pool command can be used to see what the issue is in case the monitor appears down. Here’s an example when things aren’t working:

Here’s an example when all is well:

Thanks to this document for pointing me in the right troubleshooting direction. Quoting from that document, the list of error codes:

  • UNK: Unknown
  • INI: Initializing
  • SOCKERR: Socket error
  • L4OK: Check passed on layer 4, no upper layers testing enabled
  • L4TOUT: Layer 1-4 timeout
  • L4CON: Layer 1-4 connection problem. For example, “Connection refused” (tcp rst) or “No route to host” (icmp)
  • L6OK: Check passed on layer 6
  • L6TOUT: Layer 6 (SSL) timeout
  • L6RSP: Layer 6 invalid response – protocol error. May be caused because:
    • the backend server only supports “SSLv3” or “TLSv1.0”, or
    • the certificate of the backend server is invalid, or
    • the cipher negotiation failed, and so on
  • L7OK: Check passed on layer 7
  • L7OKC: Check conditionally passed on layer 7. For example, 404 with disable-on-404
  • L7TOUT: Layer 7 (HTTP/SMTP) timeout
  • L7RSP: Layer 7 invalid response – protocol error
  • L7STS: Layer 7 response error. For example, HTTP 5xx

Nice!

Quick note to self on NSX Load Balancing

Inline mode == Transparent mode (the latter is the terminology in the UI).

In this mode the load balancer is usually the default gateway for the servers it load balances. Traffic comes to the load balancer, which sends it to the appropriate server (after changing the destination IP of the packet – hence DNAT), and the replies come back to it as it is the default gateway for the server. Note that as far as the destination server is concerned the source IP address is not the load balancer but the client who made the request. Thus the destination server knows who is making the request.

When the load balancer replies to the client who made the request it changes the source IP of the reply from the selected server to its own IP (hence SNAT when replying only).

One-Armed mode == Proxy mode

In this mode the load balancer is not the default gateway. The servers it load balances don’t require any changes. The load balancer does a DNAT as before, but also changes the source IP to be itself rather than the client (hence SNAT). When the selected server replies this time, it thinks the source is the load balancer and so replies to it rather than to the client. Thus there are no changes required on the server side. Because of this though, the server doesn’t know who made the request. All requests appear to come from the load balancer (unless you use some headers to capture the info).

As before, when the load balancer replies to the client who made the request it changes the source IP of the reply from the selected server to its own IP (hence SNAT when replying too).

You set the inline/ transparent vs. one-armed/ proxy mode per pool.

To have load balancing in NSX you need to deploy an ESG (Edge Services Gateway). I don’t know why, but I always associated an ESG with just external routing, so it took me by surprise (and still does) that I need to deploy an ESG for load balancing, DHCP, and other edge sort of services (VPN, routing, etc). I guess the point to remember is that it’s not just a gateway – it’s an edge services gateway. :)

Anyways, feel free to deploy as many ESGs as you feel like. You can have one huge ESG that takes care of all your load balancing needs, or you can have multiple small ones and hand over control to the responsible teams.

This is a good starting point doc from VMware.

You can have L4 and L7 load balancing. If you need only L4 (i.e. TCP, UDP, port number) the UI calls it acceleration. It’s a global configuration, on the ESG instance itself, so bear that in mind.

If you enable acceleration on an ESG, you have to also enable it per virtual server.

L4 load balancing is packet based (obviously, coz it doesn’t need to worry about the application as such). L7 load balancing is socket based. Quoting from this doc (highlight mine):

Packet-based load balancing is implemented on the TCP and UDP layer. Packet-based load balancing does not stop the connection or buffer the whole request, it sends the packet directly to the selected server after manipulating the packet. TCP and UDP sessions are maintained in the load balancer so that packets for a single session are directed to the same server. You can select Acceleration Enable in both the global configuration and relevant virtual server configuration to enable packet-based load balancing.

Socket-based load balancing is implemented on top of the socket interface. Two connections are established for a single request, a client-facing connection and a server-facing connection. The server-facing connection is established after server selection. For HTTP socket-based implementation, the whole request is received before sending to the selected server with optional L7 manipulation. For HTTPS socket-based implementation, authentication information is exchanged either on the client-facing connection or on the server-facing connection. Socket-based load balancing is the default mode for TCP, HTTP, and HTTPS virtual servers.

Also worth noting:

The L4 VIP (“acceleration enabled” in the VIP configuration and no L7 setting such as AppProfile with cookie persistence or SSL-Offload) is processed before the edge firewall, and no edge firewall rule is required to reach the VIP. However, if the VIP is using a pool in non-transparent mode, the edge firewall must be enabled (to allow the auto-created SNAT rule).

The L7 HTTP/HTTPS VIPs (“acceleration disabled” or L7 setting such as AppProfile with cookie persistence or SSL-Offload) are processed after the edge firewall, and require an edge firewall allow rule to reach the VIP.

Application Profiles define common application behaviors such as client SSL, server SSL, x-forwarded-for, and persistence. They can be reused across virtual servers, and one is mandatory when defining a virtual server. This is also where you can do HTTP redirects.

NSX Firewall not working on Layer 3; OpenBSD VMware Tools; IP Discovery, etc.

I have two security groups: Network 1 VMs (a group that contains my VMs in the 192.168.1.0/24 network) and Network 2 VMs (similar, for the 192.168.2.0/24 network). 

Both are dynamic groups. I select members based on whether the VM name contains -n1 or -n2. (The whole exercise is just for fun/ getting to know this stuff). 

I have two firewall rules making use of these groups – one Layer 2 and one Layer 3. 

The Layer 2 rule works but the Layer 3 one does not! Weird. 

I decided to troubleshoot this via the command line. Figured it would be a good opportunity.

To troubleshoot I have to check the rules on the hosts (because remember, that’s where the firewall is; it’s a kernel module in each host). For that I need to get the host-id. For which I need to get the cluster-id. Sadly there’s no command to list all hosts (or at least I don’t know of any). 
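
From memory, the NSX central CLI commands for this are along these lines (double-check the exact syntax on your version):

  • show cluster all – lists the clusters and their ids.
  • show cluster <cluster-id> – lists the hosts (and host-ids) in that cluster.
  • show host <host-id> – lists the VMs (and vm-ids) on that host.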

So now I have my host-ids.

Let’s also take a look at my VMs (thankfully it’s a short list! I wonder how admins do this in real life):

We can see the filters applying to each VM.  To summarize:

And are these filters applying on the hosts themselves?

Hmm, that too looks fine. 

Next I picked up one of the rule sets and explored it further:

The Layer 3 & Layer 2 rules are in separate rule sets. I have marked the ones which I am interested in. One works, the other doesn’t. So I checked the address sets used by both:

Tada! And there we have the problem. The address set for the Layer 3 rule is empty. 

I checked this for the other rules too – same situation. I modified my Layer 3 rule to specifically target the subnets:

And the address set for that rule is not empty:

And because of this the firewall rules do work as expected. Hmm.

I modified this rule to be a group with my OpenBSD VMs from each network explicitly added to it (i.e. not dynamic membership in case that was causing an issue). But nope, same result – empty address set!

But the address set is still empty. :o)

So now I have an idea of the problem. I am not too surprised by this because I vaguely remember reading something about VMware Tools and IP detection inside a VM (i.e. NSX makes use of VMware Tools to know the IP address of a VM) and also because I am aware OpenBSD does not use the official VMware Tools package (it has its own and that only provides a subset of functions).

Googling a bit on this topic I came across the IP address Discovery section in the NSX Admin guide – prior to NSX 6.2, if VMware Tools wasn’t installed (or was stopped) NSX wouldn’t be able to detect the IP address of the VM. Post NSX 6.2 it can do DHCP & ARP snooping to work around a missing/ stopped VMware Tools. We configure the latter in the host installation page:

I am going to go ahead and enable both on all my clusters. 

That helped. But it needs time. Initially the address set was empty. I started pings from one VM to another and the source VM IP was discovered and put in the address set; but since the destination VM wasn’t in the list, traffic was still being allowed. I stopped pings, started pings, waited a while … tried again … and by then the second VM IP too was discovered and put in the address set – effectively blocking communication between them. 

Side by side I installed a Windows 8.1 VM with VMware Tools etc and tested to see if it was being automatically picked up (I did this before enabling the snooping above). It was. In fact its IPv6 address too was discovered via VMware Tools and added to the list:

Nice! Picked up something interesting today. 

Notes to self while installing NSX 6.3 (part 4)

Reading through the VMware NSX 6.3 Install Guide after having installed the DLR and ESG in my home lab. Continuing from the DLR section.

As I had mentioned earlier NSX provides routing via DLR or ESG.  

  • DLR == Distributed Logical Router.
  • ESG == Edge Services Gateway

DLR consists of an appliance that provides the control plane functionality. This appliance does not do any routing itself. The actual routing is done by the VIBs on the ESXi hosts. The appliance uses the NSX Controller to push out updates to the ESXi hosts. (Note: only the DLR. The ESG does not depend on the Controller to push out routes). Couple of points to keep in mind:

  • A DLR instance cannot connect to logical switches in different transport zones. 
  • A DLR cannot connect to a dvPortgroup with VLAN ID 0.
  • A DLR cannot connect to a dvPortgroup with VLAN ID if that DLR also connects to logical switches spanning more than one VDS. 
    • This confused me. Why would a logical switch span more than one VDS? I dunno. There are reasons probably, same way you could have multiple clusters in same data center having different VDSes instead of using the same one. 
  • If you have portgroups on different VDSes with the same VLAN ID, and these VDSes share some hosts, then DLR cannot connect these. 

I am not entirely clear with the above points. It’s more to enforce the transport zones and logical switches align correctly, but I haven’t entirely understood it so I am simply going to make note as above and move on …

In a DLR the firewall rules only apply to the uplink interface and are limited to traffic destined for the edge virtual appliance. In other words they don’t apply to traffic between the logical switches a DLR instance connects. (Note that this refers to the firewall settings found under the DLR section, not in the Firewall section of NSX). 

A DLR has many interfaces. The one exposed to VMs for routing is the Logical InterFace (LIF). Here’s a screenshot from the interfaces on my DLR. 

The ones of type ‘Internal’ are the LIFs. These are the interfaces that the DLR will route between. Each LIF connects to a separate network – in my case a logical switch each. The IP address assigned to this LIF will be the address you set as gateway for the devices in that network. So for example: one of the LIFs has an IP address 192.168.1.253 and connects to my 192.168.1.0/24 segment. All the VMs there will have 192.168.1.253 as their default gateway. Suppose we ignore the ‘Uplink’ interface for now (it’s optional, I created it for the external routing to work), and all our DLR had were the two ‘Internal’ LIFs, and VMs on each side had the respective IP address set as their default gateway, then our DLR will enable routing between these two networks. 

Unlike a physical router though, which exists outside the virtual network and which you can point to as “here’s my router”, there’s no such concept with DLRs. The DLR isn’t a VM which you can point to as your router. Nor is it a VM to which packets between these networks (logical switches) are sent to for routing. The DLR, as mentioned above, is simply your ESXi hosts. Each ESXi host that has logical switches which a DLR connects into has this LIF created in them with that LIF IP address assigned to it and a virtual MAC so VMs can send packets to it. The DLR is your ESXi host. (That is pretty cool, isn’t it! I shouldn’t be amazed because I had mentioned it earlier when reading about all this, but it is still cool to actually “see” it once I have implemented).

Above screenshot is from my two VMs on the same VXLAN but on different hosts. Note that the default gateway (192.168.1.253) MAC is the same for both. Each of their hosts will respond to this MAC entry. 

(Note to self: Need to explore the net-vdr command sometime. Came across it as I was Googling on how to find the MAC address table seen by the LIF on a host. Didn’t want to get side-tracked so didn’t explore too much. There’s something called a VDR (not encountered it yet in my readings).

  • net-vdr -I -l will list all the VDRs on a host.
  • net-vdr -L -l <vdrname> will list the LIFs.
  • net-vdr -N -l <vdrname> will list the MAC addresses (ARP info)

)

When creating a DLR it is possible to create it with or without the appliance. Remember that the appliance provides the control plane functionality. It is the appliance that learns of new routes etc. and pushes them to the DLR modules in the ESXi hosts. Without an appliance the DLR modules will do static routing (which might be more than enough, especially in a test environment like my nested lab for instance), so it is ok to skip it if your requirements are such. Adding an appliance means you get to (a) select if it is deployed in HA config (i.e. two appliances), (b) their locations etc, (c) IP address and such for the appliance, as well as enabling SSH. The appliance is connected to a different interface for HA and SSH – this is independent of the LIFs or Uplink interfaces. That interface isn’t used for any routing. 

Apart from the control plane, the appliance also controls the firewall on the DLR. If there’s no appliance you can’t make any firewall changes to the DLR – makes sense coz there’s nothing to change. You won’t be connecting to the DLR for SSH or anything coz you do that to the appliance on the HA interface. 

According to the docs you can’t add an appliance once a DLR instance is deployed. Not sure about that as I do see an option to deploy an appliance on my non-appliance DLR instance. Maybe it will fail when I actually try and create the appliance – I didn’t bother trying. 

Discovered this blog post while Googling for something. I’ve encountered & linked to his posts previously too. He has a lot of screenshots and step by step instructions. So worth a check out if you want to see some screenshots and much better explanation than me. :) Came across some commands from his blog which can be run on the NSX Controller to see the DLRs it is aware of and their interfaces. Pasting the output from my lab here for now, I will have to explore this later …
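
(Going from memory, the commands were along these lines – treat the exact syntax as approximate and check the docs for your version:)

  • show control-cluster logical-routers instance all – lists the DLR instances the controller knows of.
  • show control-cluster logical-routers interface-summary <lr-id> – lists the interfaces of a particular DLR instance.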

I have two DLRs. One has an appliance, other doesn’t. I made these two, and a bunch of logical switches to hook these to, to see if there’s any difference in functionality or options.

One thing I realized as part of this exercise is that a particular logical switch can only connect to one DLR. Initially I had one DLR which connected to 192.168.1.0/24 and 192.168.2.0/24. Its uplink was on logical switch 192.168.0.0/24, which is where the ESG too hooked into. Later when I made one more DLR with its own internal links and tried to connect its uplink to the 192.168.0.0/24 network used by the previous DLR, I saw that it didn’t even appear in the list of options. That’s when I realized it’s better to use a smaller range logical switch for the uplinks – like say a /30 network. This way each DLR instance connects to an ESG on its own /30 network logical switch (as in the output above). 

A DLR can have up to 8 uplink interfaces and 1000 internal interfaces.


Moving on to ESG. This is a virtual appliance. While a DLR provides East-West routing (i.e. within the virtual environment), an ESG provides North-South routing (i.e. out of the virtual environment). The ESG also provides services such as DHCP, NAT, VPN, and Load Balancing. (Note to self: DLR does not provide DHCP or Load Balancing as one might expect (at least I did! :p). DLR provides DHCP Relay though). 

The uplink of an ESG will be a VDS (Distributed Switch) as that’s what eventually connects an ESXi environment to the physical network. 

An ESG needs an appliance to be deployed. You can enable/ disable SSH into this appliance. If enabled you can SSH into the ESG appliance from the uplink address or from any of the internal link IP addresses. In contrast, you can only SSH into a DLR instance if it has an associated appliance. Even then, you cannot SSH into the appliance from the internal LIFs (coz these don’t really exist, remember … they are on each ESXi host). With a DLR we have to SSH into the interface used for HA (this can be used even if there’s only one appliance and hence no HA). 

When deploying an ESG appliance HA can be enabled. This deploys two appliances in an active/passive mode (and the two appliances will be on separate hosts). These two appliances will talk to each other to keep in sync via one of the internal interfaces (we can specify one, or NSX will just choose any). On this internal interface the appliances will have a link local IP address (a /30 subnet from 169.254.0.0/16) and communicate over that (doesn’t matter that there’s some other IP range actually used in that segment, as these are link local addresses and unlikely anyone’s going to actually use them). In contrast, if a DLR appliance is deployed with HA we need to specify a separate network from the networks that it be routing between. This can be a logical switch or a DVS, and as with ESG the two appliances will have link local IP addresses (a /30 subnet from 169.254.0.0/16) for communication. Optionally, we can specify an IP address in this network via which we can SSH into the DLR appliance (this IP address will not be used for HA, however).

After setting up all this, I also created two NAT rules just for kicks. 

And with that my basic setup of NSX is complete! (I skipped OSPF as I don’t think I will be using it any time soon in my immediate line of work; and if I ever need to I can come back to it later). Next I need to explore firewalls (micro-segmentation) and possibly load balancing etc … and generally fiddle around with this stuff. I’ve also got to start figuring out the troubleshooting and command-line stuff. But the base is done – I hope!

Yay! (VXLAN) contd. + Notes to self while installing NSX 6.3 (part 3)

Finally continuing with my NSX adventures … some two weeks have passed since my last post. During this time I moved everything from VMware Workstation to ESXi. 

Initially I tried doing a lift and shift from Workstation to ESXi. Actually, initially I went with ESXi 6.5 and that kept crashing. Then I learnt it’s because I was using the HPE customized version of ESXi 6.5 and since the server model I was using isn’t supported by ESXi 6.5 it has a tendency to PSOD. But strangely the non-HPE customized version has no issues. But after trying the HPE version and failing a couple of times, I gave up and went to ESXi 5.5. Set it up, tried exporting from VMware Workstation to ESXi 5.5, and that failed as the VM hardware level on Workstation was newer than ESXi. 

Not an issue – I fired up VMware Converter and converted each VM from Workstation to ESXi. 

Then I thought hmm, maybe the MAC addresses will change and that will cause an issue, so I SSH’ed into the ESXi host and manually changed the MAC addresses of all my VMs to whatever it was in Workstation. Also changed the adapters to VMXNet3 wherever it wasn’t. Reloaded the VMs in ESXi, created all the networks (portgroups) etc, hooked up the VMs to these, and fired them up. That failed coz the MAC address ranges were of VMware Workstation and ESXi refuses to work with those! *grr* Not a problem – change the config files again to add a parameter asking ESXi to ignore this MAC address problem – and finally it all loaded. 

But all my Windows VMs had their adapters reset to a default state. Not sure why – maybe the drivers are different? I don’t know. I had to reconfigure all of them again. Then I turned to OpnSense – that too had reset all its network settings, so I had to configure those too – and finally to nested ESXi hosts. For whatever reason none of them were reachable; and worse, my vCenter VM was just a pain in the a$$. The web client kept throwing some errors and simply refused to open. 

That was the final straw. So in frustration I deleted it all and decided to give up.

But then …

I decided to start afresh. 

Installed ESXi 6.5 (the VMware version, non-HPE) on the host. Created a bunch of nested ESXi VMs in that from scratch. Added a Windows Server 2012R2 as the shared iSCSI storage and router. Created all the switches and port groups etc, hooked them up. Ran into some funny business with the Windows Firewall (I wanted to assign some interfaces as Private, others as Public, and enable the firewall only on the Public ones – but after each reboot Windows kept resetting this). So I added OpnSense into the mix as my DMZ firewall.

So essentially you have my ESXi host -> which hooks into an internal vSwitch portgroup that has the OpnSense VM -> which hooks into another vSwitch portgroup where my Server 2012R2 is connected, and that in turn connects to another vSwitch portgroup (a couple of them actually) where my ESXi hosts are connected (I need a couple of portgroups as my ESXi hosts have to be in separate L3 networks so I can actually see a benefit of VXLANs). OpnSense provides NAT and firewalling so none of my VMs are exposed from the outside network, yet they can connect to the outside network if needed. (I really love OpnSense by the way! An amazing product). 

Then I got to the task of setting these all up. Create the clusters, shared storage, DVS networks, install my OpenBSD VMs inside these nested EXSi hosts. Then install NSX Manager, deploy controllers, configure the ESXi hosts for NSX, setup VXLANs, segment IDs, transport zones, and finally create the Logical Switches! :) I was pissed off initially at having to do all this again, but on the whole it was good as I am now comfortable setting these up. Practice makes perfect, and doing this all again was like revision. Ran into problems at each step – small niggles, but it was frustrating. Along the way I found that my (virtual) network still does not seem to support large MTU sizes – but then I realized it’s coz my Server 2012R2 VM (which is the router) wasn’t setup with the large MTU size. Changed that, and that took care of the MTU issue. Now both Web UI and CLI tests for VXLAN succeed. Finally!

Third time lucky hopefully. Above are my two OpenBSD VMs on the same VXLAN, able to ping each other. They are actually on separate L3 ESXi hosts so without NSX they won’t be able to see each other. 

Not sure why there are duplicate packets being received. 

Next I went ahead and set up a DLR so there’s communication between VXLANs. 

Yeah baby! :o)

Finally I spent some time setting up an ESG and got these OpenBSD VMs talking to my external network (and vice versa). 

The two command prompt windows are my Server 2012R2 on the LAN. It is able to ping the OpenBSD VMs and vice versa. This took a bit more time – not on the NSX side – as I forgot to add the routing info on the ESG for my two internal networks (192.168.1.0/24 and 192.168.2.0/24) as well as on the Server 2012R2 (192.168.0.0/16). Once I did that routing worked as above. 

I am aware this is more of a screenshots plus talking post rather than any techie details, but I wanted to post this here as a record for myself. I finally got this working! Yay! Now to read the docs and see what I missed out and what I can customize. Time to break some stuff finally (intentionally). 

:o)

Yay! (VXLAN) contd. + Notes to self while installing NSX 6.3 (part 2)

In my previous post I said the following (in gray). Here I’d like to add on:

  • A VDS uses VMKernel ports (vmk ports) to carry out the actual traffic. These are virtual ports bound to the physical NICs on an ESXi host, and there can be multiple vmk ports per VDS for various tasks (vMotion, FT, etc). Similar to this we need to create a new vmk port for the host to connect into the VTEP used by the VXLAN. 
    • Unlike regular vmk ports though we don’t create and assign IP addresses manually. Instead we either use DHCP or create an IP pool when configuring the VXLAN for a cluster. (It is possible to specify a static IP either via DHCP reservation or as mentioned in the install guide).
      • The number of vmk ports (and hence IP addresses) corresponds to the number of uplinks. So a host with 2 uplinks will have two VTEP vmk ports, hence two IP addresses taken from the pool. Bear that in mind when creating the pool.
    • Each cluster uses one VDS for its VXLAN traffic. This can be a pre-existing VDS – there’s nothing special about it just that you point to it when enabling VXLAN on a cluster; and the vmk port is created on this VDS. NSX automatically creates another portgroup, which is where the vmk port is assigned to.
    • VXLANs are created on this VDS – they are basically portgroups in the VDS. Each VXLAN has an ID – the VXLAN Network Identifier (VNI) – which NSX refers to as segment IDs. 
      • Before creating VXLANS we have to allocate a pool of segment IDs (the VNIs) taking into account any VNIs that may already be in use in the environment.
      • The number of segment IDs is also limited by the fact that a single vCenter only supports a maximum of 10,000 portgroups
      • The web UI only allows us to configure a single segment ID range, but multiple ranges can be configured via the NSX API
  • Logical Switch == VXLAN -> which has an ID (called segment ID or VNI) == Portgroup. All of this is in a VDS. 

While installing NSX I came across “Transport Zones”.

Remember ESXi hosts are part of a VDS. VXLANs are created on a VDS. Each VXLAN is a portgroup on this VDS. However, not all hosts need be part of the same VXLANs, but since all hosts are part of the same VDS and hence have visibility to all the VXLANs, we need some way of marking which hosts are part of a VXLAN. We also need some place to identify if a VXLAN is in unicast, multicast, or hybrid mode. This is where Transport Zones come in.

If all your VXLANs are going to behave the same way (multicast etc) and have the same hosts, then you just need one transport zone. Else you would create separate zones based on your requirement. (That said, when you create a Logical Switch/ VXLAN you have an option to specify the control plane mode (multicast mode etc). Am guessing that overrides the zone setting, so you don’t need to create separate zones just to specify different modes). 

Note: I keep saying hosts above (last two paragraphs) but that’s not correct. It’s actually clusters. I keep forgetting, so thought I should note it separately here rather the correct my mistake above. 1) VXLANs are configured on clusters, not hosts. 2) All hosts within a cluster must be connected to a common VDS (at least one common VDS, for VXLAN purposes). 3) NSX Controllers are optional and can be skipped if you are using multicast replication? 4) Transport Zones are made up of clusters (i.e. all hosts in a cluster; you cannot pick & choose just some hosts – this makes sense when you think that a cluster is for HA and DRS so naturally you wouldn’t want to exclude some hosts from where a VM can vMotion to as this would make things difficult). 

Worth keeping in mind: 1) A cluster can belong to multiple transport zones. 2) A logical switch can belong to only one transport zone. 3) A VM cannot be connected to logical switches in different transport zones. 4) A DLR (Distributed Logical Router) cannot connect to logical switches in multiple transport zones. Ditto for an ESG (Edge Services Gateway). 

After creating a transport zone, we can create a Logical Switch. This assigns a segment ID from the pool automatically and this (finally!!) is your VXLAN. Each logical switch creates yet another portgroup. Once you create a logical switch you can assign VMs to it – that basically changes their port group to the one created by the logical switch. Now your VMs will have connectivity to each other even if they are on hosts in separate L3 networks. 

Something I hadn’t realized: 1) Logical Switches are created on Transport Zones. 2) Transport Zones are made up of / can span clusters. 3) Within a cluster the logical switches (VXLANs) are created on the VDS that’s common to the cluster. 4) What I hadn’t realized was this: nowhere in the previous statements is it implied that transport zones are limited to a single VDS. So if a transport zone is made up of multiple clusters, each / some of which have their own common VDS, any logical switch I create will be created on all these VDSes.  

Sadly, I don’t feel like saying yay at this point, unlike before. I am too tired. :(

Which also brings me to the question of how I got this working with VMware Workstation. 

By default VMware Workstation emulates an e1000 NIC in the VMs and this doesn’t support an MTU larger than 1500 bytes. We can edit the .VMX file of a VM and replace “e1000” with “vmxnet3”, which swaps the emulated Intel 82545EM Gigabit Ethernet NIC for a paravirtual VMXNET3 NIC. This NIC supports an MTU larger than 1500 bytes and VXLAN will begin working. One thing though: a quick way of testing if the VTEP VMkernel NICs are able to talk to each other with a larger MTU is via a command such as ping ++netstack=vxlan -I vmk3 -d -s 1600 xxx.xxx.xxx.xxx. If you do this once you add a VMXNET3 NIC though, it crashes the ESXi host. I don’t know why. It only crashes when using the VXLAN network stack; the same command with any other VMkernel NIC works fine (so I know the MTU part is ok). Also, when testing the Logical Switch connectivity via the Web UI (see example here) there’s no crash with a VXLAN standard test packet – maybe that doesn’t use the VXLAN network stack? I spent a fair bit of time chasing after the ping ++netstack command until I realized that even though it was crashing my host the VXLAN was actually working!
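
For reference, the .vmx change is a one-liner per NIC – something like this, with ethernet0 being whichever NIC you want to swap:

ethernet0.virtualDev = "vmxnet3"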

Before I conclude a hat-tip to this post for the Web UI test method and also for generally posting how the author set up his NSX test lab. That’s an example of how to post something like this properly, instead of the stream of thoughts my few posts have been. :)

Update: Short lived happiness. Next step was to create an Edge Services Gateway (ESG) and there I bumped into the MTU issues. And this time when I ran the test via the Web UI it failed and crashed the hosts. Disappointed, I decided it was time to move on from VMware Workstation. :-/

Update 2: Continued here …