
TIL: Control Plane & Data Plane (networking)

Reading a bit of networking stuff, which is new to me, as I am trying to understand and appreciate NSX (instead of diving straight into it). Hence a few of these TIL posts like this one and the previous. 

Common terms I read in the context of NSX, and SDN (Software Defined Networking) in general, are “control plane” and “data plane” (a.k.a. the “forwarding” plane). 

This forum post is a good intro. Basically, when it comes to networking your network equipment does two sorts of things. One is the actual pushing of packets that come to it. The other is figuring out which packets need to go where. The latter is where various networking protocols like RIP and EIGRP come in. Control plane traffic is used to update a network device’s routing tables or configuration state, and its processing happens on the network device itself. Data plane traffic passes through the router. Control plane traffic determines what should be done with the data plane traffic. Another way of thinking about control and data planes is where the traffic originates from/ is destined to. Basically, control plane traffic is sent to/ from the network devices to control them (e.g. RIP, EIGRP); while data plane traffic is what passes through a network device.

( Control plane traffic doesn’t necessarily mean it’s traffic for controlling a network device. For example, SSH or Telnet could be used to connect to a network device and control it, but that’s not really in the control plane. These come more under a “management” plane – which may or may not be considered a separate plane. )

Once you think of network devices along these lines, you can see that a device’s actual work is in the data plane – how fast it can push packets through. Yes, it needs to know where to push packets to, but the two aren’t tied together. It’s sort of like how one might think of a computer as being hardware (CPU) + software (OS) tied together. If we imagine the two as tied together, then we limit how much each of them can be pushed. If improvements in the OS require improvements in the CPU then the two can only be improved in-step. But if OS improvements can happen independent of the underlying CPU (yes, a newer CPU might help the OS take advantage of newer features or perform better, but it isn’t a requirement) then OS developers can keep innovating irrespective of CPU manufacturers. In fact, OS developers can use any CPU as long as there are clearly defined interfaces between the OS and the CPU. Similarly, CPU manufacturers can innovate independent of the OS. Ultimately, if we think (very simply) of CPUs as having the function of quickly processing data, and of the OS as a platform that can make use of a CPU for various processing tasks, we can see that the two are independent and all that’s required is a set of interfaces between them. This is how things already are with computers, so what I just mentioned doesn’t sound grand or new, but this wasn’t always the case. 

With SDN we try to decouple the control and data planes. The data plane then is the physical layer comprising network devices or servers. These are programmable and expose a set of interfaces. The control plane now can be a VM or something else independent of the physical hardware of the data plane. It is no longer limited to what a single network device sees. The control plane is aware of the whole infrastructure and accordingly informs/ configures the data plane devices.

If you want a better explanation of what I was trying to convey above, this article might help. 

In the context of NSX, the data plane would be the VXLAN-based Logical Switches and the ESXi hosts that make them up, and the control plane would be the NSX Controllers. It’s the NSX Controllers that take care of knowing what to do with the network traffic. They work this out, inform the hosts that are part of the data plane accordingly, and let them do the needful. The NSX Controller VMs are deployed in odd numbers (preferably 3 or higher, though you could get away with 1 too) for HA and cluster quorum (that’s why odd numbers), but they are independent of the data plane. Even if all the NSX Controllers are down, the data flow would not be affected.


I saw a video from Scott Shenker on the future of networking and the past of protocols. Here’s a link to the slides, and here’s a link to the video on YouTube. I think the video is a must watch. Here are some of the salient points from the video+slides though – mainly as a reminder to myself (note: since I am not a networking person I am vague in many places as I don’t understand it all myself):

  • Layering is a useful thing. Layering is what made networking successful. The TCP/IP model, the OSI model. Basically you don’t try and think of the “networking problem” as a big composite thing, but you break it down into layers with each layer doing one task and the layer above it assuming that the layer below it has somehow solved that problem. It’s similar to Unix pipes and all that. Break the problem into discrete parts with interfaces, and each part does what it does best and assumes the part below it is taking care of what it needs to do. 
  • This layering was useful when it came to the data plane mentioned above. That’s what TCP/IP is all about anyways – getting stuff from one point to another. 
  • The control plane used to be simple. It was just about the L2 or L3 tables – where to send a frame to, or where to send a packet to. Then the control plane got complicated by way of ACLs and all that (I don’t know what all to be honest as I am not a networking person :)). There was no “academic” approach to solving this problem similar to how the data plane was tackled; so we just kept adding more and more protocols to the mix to solve each issue as it came along. This made things even more complicated, but that’s OK as the people who manage all these liked the complexity and it worked after all. 
  • A good quote (from Don Norman) – “The ability to master complexity is not the same as the ability to extract simplicity”. Well said! So simple and prescient. 
    • It’s OK if you are only good at mastering complexity. But be aware of that. Don’t be under a misconception that just because you are good at mastering the complexity you can also extract simplicity out of it. That’s the key thing. Don’t fool yourself. :)
  • In the context of the control plane, the thing is we have learnt to master its complexity but not learnt to extract simplicity from it. That’s the key problem. 
    • To give an analogy with programming, we no longer think of programming in terms of machine language or registers or memory spaces. All these are abstracted away. This abstraction means a programmer can focus on tackling the problem in a totally different way compared to how he/ she would have had to approach it if they had to take care of all the underlying issues and figure it out. Abstraction is a very useful tool. E.g. Object Oriented Programming, Garbage Collection. Extract simplicity! 
  • Another good quote (from Barbara Liskov) – “Modularity based on abstraction is the way things get done”.
    • Or put another way :) Abstractions -> Interfaces -> Modularity (you abstract away stuff; provide interfaces between them; and that leads to modularity). 
  • As mentioned earlier the data plane has good abstraction, interfaces, and modularity (the layers). Each layer has well-defined interfaces and the actual implementation of how a particular layer gets things done is down to the protocols used in that layer or their implementations. The layers above and below do not care. E.g. Layer 3 (IP) expects Layer 2 to somehow get its stuff done. The fact that it uses Ethernet and frames etc. is of no concern to IP. 
  • So, what are the control plane problems in networking? 
    • We need to be able to compute the configuration state of each network device. As in what ACLs it is supposed to be applying, what its forwarding tables are like …
    • We need to be able to do this while operating without communication guarantees. So we have to deal with communication delays or packet drops etc as changes are pushed out. 
    • We also need to be able to do this while operating within the limitations of the protocol we are using (e.g. IP). 
  • Anyone trying to master the control plane has to deal with all three. To give an analogy with programming, it is as though a programmer had to worry about where data is placed in RAM, take care of memory management and process communication etc. No one does that now. It is all magically taken care of by the underlying system (like the OS or the programming language itself). The programmer merely focuses on what they need to do. Something similar is required for the control plane. 
  • What is needed?
    • We need an abstraction for computing the configuration state of each device. [Specification Abstraction]
      • Instead of thinking of how to compute the configuration state of a device or how to change a configuration state, we just declare what we want and it is magically taken care of. You declare how things should be, and the underlying system takes care of making it so. 
      • We think in terms of specifications. If the intention is that Device A should not have access to Device B, we simply specify that in the language of our model without thinking of the how in terms of the underlying physical model. The shift in thinking here is that we view each thing as a layer and only focus on that. To implement a policy that Device A should not have access to Device B we do not need to think of the network structure or the devices in between – all that is just taken care of (by the Network Operating System, so to speak). 
      • This layer is Network Virtualization. We have a simplified model of the network that we work with, in which we specify how things should be, and the Network Virtualization layer takes care of actually implementing it. 
    • We need an abstraction that captures the lack of communication guarantees – i.e. the distributed state of the system. [Distributed State Abstraction]
      • Instead of thinking how to deal with the distributed network we abstract it away and assume that it is magically taken care of. 
      • Each device has access to an annotated network graph that it can query for whatever info it wants. A global network view, so to speak. 
      • There is some layer that gathers an overall picture of the network from all the devices and presents this global view to the devices. (We can think of this layer as being a central source of information, but it can be decentralized too. Point is that’s an implementation problem for whoever designs that layer). This layer is the Network Operating System, so to speak. 
    • We need an abstraction of the underlying protocol so we don’t have to deal with it directly.  [Forwarding Abstraction]
      • Network devices have a Management CPU and a Forwarding ASIC. We need an abstraction for both. 
      • The Management CPU abstraction can be anything. The ASIC abstraction is OpenFlow. 
      • This is the layer that is closest to the hardware. 
  • SDN abstracts these three things – distribution, forwarding, and configuration. 
    • You have a Control Program that configures an abstract network view based on the operator requirements (note: this doesn’t deal with the underlying hardware at all) ->
    • You have a Network Virtualization layer that takes this abstract network view and maps it to a global view based on the underlying physical hardware (the specification abstraction) ->
    • You have a Network OS that communicates this global network view to all the physical devices to make it happen (the distributed state abstraction (for disseminating the information) and the forwarding abstraction (for configuring the hardware)).
  • Very important: Each piece of the above architecture has a very limited job that doesn’t involve the overall picture. 

From this Whitepaper:

SDN has three layers: (1) an Application layer, (2) a Control layer (the Control Program mentioned above), and (3) an Infrastructure layer (the network devices). 

The Application layer is where business applications reside. These talk to the Control Program in the Control layer via APIs. This way applications can program their network requirements directly. 

OpenFlow (mentioned in Scott’s talk under the ASIC abstraction) is the interface between the control plane and the data/ forwarding plane. Rather than paraphrase, let me quote from that whitepaper for my own reference:

OpenFlow is the first standard communications interface defined between the control and forwarding layers of an SDN architecture. OpenFlow allows direct access to and manipulation of the forwarding plane of network devices such as switches and routers, both physical and virtual (hypervisor-based). It is the absence of an open interface to the forwarding plane that has led to the characterization of today’s networking devices as monolithic, closed, and mainframe-like. No other standard protocol does what OpenFlow does, and a protocol like OpenFlow is needed to move network control out of the networking switches to logically centralized control software.

OpenFlow can be compared to the instruction set of a CPU. The protocol specifies basic primitives that can be used by an external software application to program the forwarding plane of network devices, just like the instruction set of a CPU would program a computer system.

OpenFlow uses the concept of flows to identify network traffic based on pre-defined match rules that can be statically or dynamically programmed by the SDN control software. It also allows IT to define how traffic should flow through network devices based on parameters such as usage patterns, applications, and cloud resources. Since OpenFlow allows the network to be programmed on a per-flow basis, an OpenFlow-based SDN architecture provides extremely granular control, enabling the network to respond to real-time changes at the application, user, and session levels. Current IP-based routing does not provide this level of control, as all flows between two endpoints must follow the same path through the network, regardless of their different requirements.

I don’t think OpenFlow is used by NSX though. It is used by Open vSwitch and was used by NVP (Nicira Virtualization Platform – the predecessor of NSX).

Speaking of NVP and NSX: VMware acquired Nicira (a company founded by Martin Casado, Nick McKeown and Scott Shenker – the same Scott Shenker whose video I was watching above) and with it the product, which was called NVP back then and primarily ran on the Xen hypervisor. VMware renamed it to NSX, and it has two flavors. NSX-V is the version that runs on the VMware ESXi hypervisor, and is in active development. There’s also NSX-MH which is a “multi-hypervisor” version that’s supposed to be able to run on Xen, KVM, etc. but I couldn’t find much information on it. There are some presentation slides in case anyone’s interested. 

Before I conclude here are some more blog posts related to all this. They are in order of publishing so we get a feel of how things have progressed. I am starting to get a headache reading all this network stuff, most of which is going above my head, so I am going to take a break here and simply link to the articles (with minimal/ half info) and not go much into it. :)

  • This one talks about how the VXLAN specification doesn’t specify any control plane.
    • There is no way for hosts participating in a VXLAN network to know the MAC addresses of other hosts or VMs in the VXLAN so we need some way of achieving that. 
    • Nicira NVP uses OpenFlow as a control-plane protocol. 
  • This one talks about how OpenFlow is used by Nicira NVP. Some points of interest:
    • Each Open vSwitch (OVS) implementation has 1) a flow-based forwarding module loaded in the kernel; 2) an agent that communicates with the Controller; and 3) an OVS DB daemon that keeps track of the local configuration. 
    • NVP had clusters of 3 or 5 controllers. These used the OpenFlow protocol to download forwarding entries into the OVS, and OVSDB (a.k.a. ovsdb-daemon) to configure the OVS itself (creating/ deleting/ modifying bridges, interfaces, etc.). 
    • Read that post on how the forwarding tables and tunnel interfaces are modified as new devices join the overlay network. 
    • Broadcast traffic, unknown Unicast traffic, and Multicast traffic (a.k.a. BUM traffic) can be handled in two ways – either by sending these to an extra server that replicates these to all devices in the overlay network; or the source hypervisor/ physical device can encapsulate the BUM frame and send it as unicast to all the other devices in that overlay. 
  • This one talks about how Nicira NVP seems to be moving away from OpenFlow or supplementing it with something (I am not entirely clear).
    • This is a good read though just that I was lost by this point coz I have been doing this reading for nearly 2 days and it’s starting to get tiring. 

One more post from the author of the three posts above. It’s a good read. Kind of obvious stuff, but good to see in pictures. That author has some informative posts – wish I was more brainy! :)

Automatic Metric and Windows routing

IP routing involves metrics. A metric is the cost of a route. If there are multiple routes to a destination then the route with the lowest metric/ cost is chosen.

In the context of Windows OS there are two metrics that come into play.

[screenshot: automatic metric]

One is the metric of the interface/ NIC itself (that’s the “Automatic metric” checkbox above). By default it’s set to automatic, and this determines the cost of using that interface itself. For example, if both your wireless and wired connections can access the Internet, which one should the machine choose? The interface metric is used to make this decision. You can assign a value to this metric if you want to force a decision.

Each interface can have multiple gateways to various networks it knows of. Could be that it has more than one gateway to the same network – say, your wired connection can connect to the Internet from two different routers on your network, which one should it choose? Here’s where the gateway metric comes into play (circled in the screenshot above). By default when you add a gateway its metric is set to automatic, but here too you can assign a value.

[screenshot: gateway metric]

So far so good. Now how does all this come into play together?

The first thing to know is that gateway metrics have a value of 256 by default (when set to “Automatic metric”). So if you have more than one gateway to a particular destination, and the metric is set to automatic, then by default both gateways have a metric value of 256 and hence equal preference. Remember that.

The next thing to know is that interface metrics have a value ranging from 5 to 50 (when set to “Automatic metric”) based on the speed of the interface. Lower numbers are better than higher numbers. See this KB article for the numbers, here’s a screenshot from that article.

[screenshot: interface metrics]

So if you have two wired connections, for instance, one of speed 1 Gbps and the other of speed 10 Gbps, then the 1 Gbps interface has a metric of 10 and the 10 Gbps interface has a metric of 5 – thus making the latter preferred. The effective metric of a route is the sum of the interface metric and the gateway metric: a default route via the 10 Gbps interface would cost 5 + 256 = 261, while the same route via the 1 Gbps interface would cost 10 + 256 = 266, so the former wins.

To view the interface & gateway metrics assigned to your interfaces use the netsh interface ip show address command:
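Here’s illustrative output (the interfaces and values will differ on your machine – note the InterfaceMetric and Gateway Metric fields):

    C:\> netsh interface ip show address "Ethernet"

    Configuration for interface "Ethernet"
        DHCP enabled:                         No
        IP Address:                           10.0.0.10
        Subnet Prefix:                        10.0.0.0/24 (mask 255.255.255.0)
        Default Gateway:                      10.0.0.1
        Gateway Metric:                       256
        InterfaceMetric:                      10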

 

Notes on NLB, VMware, etc

Just some notes to myself so I am clear about it while reading about it. In the context of this VMware KB article – Microsoft NLB not working properly in Unicast mode.

Before I get to the article I’d better talk about a regular scenario. Say you have a switch and it’s got a couple of devices connected to it. A switch is a layer 2 device – meaning, it has no knowledge of IP addresses and networks etc. All devices connected to a switch are in the same network. The devices on a switch use MAC addresses to communicate with each other. Yes, the devices have IPv4 (or IPv6) addresses, but how they communicate with each other is via MAC addresses.

Say Server A (IPv4 address 10.136.21.12) wants to communicate with Server B (IPv4 address 10.136.21.22). Both are connected to the same switch, hence on the same LAN. Communication between them happens in layer 2. Here the machines identify each other via MAC addresses, so first Server A checks whether it knows the MAC address of Server B. If it knows (usually coz Server A has communicated with Server B recently and the MAC address is cached in its ARP table) then there’s nothing to do; but if it does not, then Server A finds the MAC address via something called ARP (Address Resolution Protocol). The way this works is that Server A broadcasts to the whole network that it wants the MAC address of the machine with IPv4 address 10.136.21.22 (the address of Server B). This message goes to the switch, the switch sends it to all the devices connected to it, Server B replies with its MAC address and that is sent to Server A. The two now communicate – I’ll come to that in a moment.
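To see this cached mapping you could dump Server A’s ARP table – illustrative output using the addresses above (the MAC address shown is made up):

    C:\> arp -a

    Interface: 10.136.21.12 --- 0xb
      Internet Address      Physical Address      Type
      10.136.21.22          00-eb-24-b2-05-ad     dynamic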

When it’s communication from devices in a different network to Server A or Server B, the idea is similar except that you have a router connected to the switch. The router receives traffic for a device on this network – it knows the IPv4 address – so it finds the MAC address similar to above and passes it to that device. Simple.

Now, how does the switch know which port a particular device is connected to? Say the switch gets traffic addressed to MAC address 00:eb:24:b2:05:ac – how does the switch know which port that is on? Here’s how that happens –

  • First the switch checks if it already has this information cached. Switches have a table called the CAM (Content Addressable Memory) table which holds this cached info.
  • Assuming the CAM table doesn’t have this info the switch will send the frame (containing the packets for the destination device) to all ports. Note, this is not like ARP where a question is sent asking for the device to respond; instead the frame is simply sent out of every port – it is flooded to the whole network.
  • When a switch receives frames from a port it notes the source MAC address and port and that’s how it keeps the CAM table up to date. Thus when Server A sends data to Server B, the MAC address and switch port of Server A are stored in the switch’s CAM table. This entry is only stored for a brief period.

Now let’s talk about NLB (Network Load Balancing).

Consider two machines – 10.136.21.11 with MAC address 00:eb:24:b2:05:ac and 10.136.21.12 with MAC address 00:eb:24:b2:05:ad. NLB is a form of load balancing wherein you create a Virtual IP (VIP) such as 10.136.21.10 such that any traffic to 10.136.21.10 is sent to either 10.136.21.11 or 10.136.21.12. Thus you have the traffic being load balanced between the two machines; and not only that, if one of the machines goes down nothing is affected because the other machine can continue handling the traffic.

But now we have a problem. If we want a VIP 10.136.21.10 that should send traffic to either host, how will this work when it comes to MAC addresses? That depends on the type of NLB. There are two sorts – Unicast and Multicast.

In Unicast mode the NIC that is used for clustering on each server has its MAC address changed to a new Unicast MAC address that’s the same for all hosts. Thus for example, the NICs that hold the NLB IP address 10.136.21.10 in the scenario above will have their MAC addresses changed from 00:eb:24:b2:05:ac and 00:eb:24:b2:05:ad respectively to (say) 00:eb:24:b2:05:af. Note that the MAC address is a Unicast MAC (which basically means the MAC address looks like a regular MAC address, such as one assigned to a single machine). Since this is a Unicast MAC address, and by definition it can only be assigned to one machine/ switch port, the NLB driver on each machine cheats a bit and changes the source MAC address on outgoing frames to whatever the original NIC MAC address was. That is to say –

  • Server IP 10.136.21.11
    • Has MAC address 00:eb:24:b2:05:ac
    • Which is changed to a MAC address of 00:eb:24:b2:05:af as part of the Unicast IP/ enabling NLB
    • However when traffic is sent out from this machine the MAC address is changed back to 00:eb:24:b2:05:ac
  • Same for Server 10.136.21.12

Why does this happen? This is because –

  • When a device wants to send data to the VIP address, it will try to find the MAC address using ARP. That is, it sends a broadcast over the network asking for the device with this IP address to respond. Since both servers now have the same MAC address for their NLB NIC, either server will respond with this common MAC address.
  • Now the switch receives frames for this MAC address. The switch does not have this in its CAM table so it will broadcast the frame to all ports – reaching either of the servers.
  • But why do the servers change the source MAC address on outgoing traffic? That’s because if outgoing frames carried the common MAC address, the switch would associate the common MAC address with that port – resulting in all future traffic to the common MAC address going to only one of the servers. By changing the outgoing frame’s MAC address back to the server’s original MAC address, the switch never gets to store the common MAC address in its CAM table, and frames for the common MAC address are always flooded.

In the context of VMware what this means is that (a) the port group to which the NLB NICs connect must allow changes to the MAC address and allow forged transmits; and (b) by default, when a VM is powered on, the port group notifies the physical switch of the VM’s MAC address – since this would expose the cluster MAC address to the switch, this notification too must be disabled. Without these changes NLB will not work in Unicast mode with VMware.

(This is a good post to read more about NLB).

Apart from Unicast NLB there’s also Multicast NLB. In this form the NLB NIC’s MAC address is not changed. Instead, a new Multicast MAC address is assigned to the NLB NIC. This is in addition to the regular MAC address of the NIC. The advantage of this method is that since each host retains its existing MAC address the communication between hosts is unaffected. However, since the new MAC address is a Multicast MAC address – and switches by default are set to ignore such addresses – some changes need to be done on the switch side to get Multicast NLB working.

One thing to keep in mind is that it’s important to add a default gateway address to your NLB NIC. At work, for instance, the NLB IPv4 address was reachable within the network but from across networks it wasn’t. Turns out that’s coz Windows 2008 onwards has a strong host model – traffic coming in via one NIC does not go out via a different NIC, even if both are in the same subnet and the second NIC has a default gateway set. In our case I added the same default gateway to the NLB NIC too and it was then reachable across networks. 

HP DL360 Gen9 with HP FlexFabric 534 adapter and HP Ethernet 530 adapter and ESXi

That’s a very vague subject line, I know, but I couldn’t think of anything concise. Just wanted to put some keywords so that if anyone else comes across the same problem and types something similar into Google hopefully they stumble upon this post.

At work we got some HP DL360 Gen9s to use as ESXi hosts. To these servers we added additional network cards –

  • HP FlexFabric 10Gb 2-port 534FLR-SFP+ Adapter; and
  • HP Ethernet 10Gb 2-port 530SFP+ Adapter.

Each of these adapters has two NICs. Here’s a picture of the adapters in the server and the vmnic numbers ESXi assigns to them.

In this picture –

  • vmnic5 & vmnic4 are the HP FlexFabric 10Gb 2-port 534FLR-SFP+ Adapter;
  • vmnic6 & vmnic7 are the HP Ethernet 10Gb 2-port 530SFP+ Adapter;
  • vmnic0 – vmnic3 are the HP Ethernet 1Gb 4-port 331i Adapter (built into the server); and
  • iLO is the iLO port (which I’ll ignore for now).

We didn’t want to use vmnic0 – vmnic3 as they are only 1Gb. So the idea was to use vmnic4 – vmnic7. Two NICs would be for Management+vMotion (connecting to two different switches); two NICs would be for iSCSI (again connecting to different switches).

We came across two issues. First, the FlexFabric NICs didn’t seem to support iSCSI. ESXi showed two iSCSI adapters but the NICs mapped to them were the regular Ethernet 10Gb ones, not the FlexFabric 10Gb ones. Second, we wanted to use vmnic4 and vmnic6 for Management+vMotion and vmnic5 and vmnic7 for iSCSI – basically a NIC from each adapter, so that even if an adapter were to fail there’s a NIC from the other adapter for resiliency. This didn’t work. The Ethernet 10Gb NICs weren’t “connecting” to the network switch for some reason. They would connect in the sense that the link status appears as connected and the LEDs on the switch and NICs blink, but something was missing. There was no real connectivity.

Here’s what we did to fix these.

But first, for both of these fixes you have to reboot the server and go into the System Utilities menu (press F9 during boot).

[screenshot: F9 System Utilities]

Change 1: Enable iSCSI on the FlexFabric adapter (vmnic4 and vmnic5)

Once in the System Utilities menu select “System Configuration”.

Select the first FlexFabric NIC (port 1).

Then select the Device Hardware Configuration menu.

You will see that the storage personality is FCoE.

That’s the problem. This is why the FlexFabric adapters don’t show up as iSCSI adapters. Select the FCoE entry and change it to iSCSI.

Now press Esc to go back to the previous menus (you will be prompted to save the changes – do so). Then repeat the above steps for the second FlexFabric NIC (port 2).

With this change the FlexFabric NICs will appear as iSCSI adapters. Now for the second change.

Change 2: Enable DCB for the Ethernet adapters

From the System Configuration menu now select the first Ethernet NIC (port 1).

Then select its Device Hardware Configuration menu.

Notice the entry for “DCB Protocol”. Most likely it is “Disabled” (which is why the NICs don’t work for you).

Change that to “Enabled” and now the NICs will work.

That’s it. Once again press Esc (choosing to save the changes when prompted) and then reboot the system. Now all the NICs will work as expected and appear as iSCSI adapters too.

I have no idea what DCB does. From what I can glean via Google it seems to be a set of extensions to Ethernet that provide “hardware-based bandwidth allocation to a specific type of traffic and enhances Ethernet transport reliability with the use of priority-based flow control” (via TechNet) (also check out this Cisco whitepaper for more info). I didn’t read much into it because I couldn’t find anything that mentioned why DCB mattered in this case – as in why were the NICs not working when DCB was disabled? The NICs are connected to an HP 5920AF switch but I couldn’t find anything that suggested the switch requires DCB enabled for the ports to work. This switch supports DCB but that doesn’t imply it requires DCB.

Anyhow, the FlexFabric adapters have DCB enabled by default which is probably why they worked. That’s how I got the idea to enable DCB on the Ethernet adapters to see if it makes a difference – and it did! The only thing I can think of is that DCB also seems to include a DCBX (Data Centre Bridging Exchange) protocol which is about discovering peers, discovering mismatched configuration etc – so maybe the fact that DCB was disabled on these adapters made the switch not “see” these NICs and soft-disable them somehow. That’s my guess at least.

CentOS NAT

My home lab setup is such that everything runs on my laptop within VMware Workstation (currently version 11) on a Windows 8.1 OS. Hyper-V might be a better choice here for performance but I am quite happy with VMware Workstation and it does what I want. Specifically – VMware Workstation allows for nested virtualization, so I can install a hypervisor such as ESXi or Hyper-V within VMware Workstation and run VMs in that! Cool, isn’t it?

VMware Workstation also natively supports installing ESXi within it as a VM. I couldn’t do that if I were using Hyper-V instead.

Finally, VMware Workstation has some laptop-lab-level benefits in that it easily lets you configure networking, such as having a NAT network up and running quickly. I can do the same with Hyper-V but that requires a bit more configuring (expected, because Hyper-V is meant for a different purpose – I should really be comparing Hyper-V with ESXi, but that’s a comparison I can’t even make here).

Within VMware Workstation I have a Windows environment setup where I play with AD, IPv6, DirectAccess, etc. This has a Corporate Network spread over two sites. I also have some CentOS VMs that act as routers – to provide routing between the two Corporate Network sites above, for instance – and also act as NAT routers for remote use. Yes, my Workstation network also has a fake Internet and some fake homes which is a mix of CentOS VMs acting as routers/ NAT and Server 2012 for DNS (root zones, Teredo, etc).

The CentOS VM that acts as a router between the Corporate Networks can also do real Internet routing. For this I have an interface that’s connected to the NAT network of VMware Workstation. I chose to NAT this interface because back when I created this lab I used to hook up the laptop to a network with static IP. I had only one IPv4 address to use so couldn’t afford to bridge this interface because I had no IPv4 address to assign it.

Because of the NAT interface the CentOS VM itself can access the real Internet. But what about other VMs that forward to this VM for their Internet routing? You would think that simply enabling packet forwarding on this VM is sufficient – but that won’t do. Packet forwarding works when forwarding between networks, but in this case the external network does not know anything about my virtual networks that are behind a NAT (of VMware Workstation), so simply forwarding won’t work. So what you need to do, apart from forwarding, is also set up NAT on the CentOS VM so that as far as the external network is concerned everything is coming from the CentOS VM (via its interface that is NAT’d with VMware Workstation).

I have done all this in the past but today I needed to revisit this after some time and forgot what exactly I had done. So this post is just a reminder to myself on what needs to be done.

First, enable packet forwarding in the OS. Add the following line to /etc/sysctl.conf:
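    # /etc/sysctl.conf – let the kernel forward packets between interfaces
    net.ipv4.ip_forward = 1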

Reboot the VM or run the following to load it now itself:
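    sysctl -p /etc/sysctl.conf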

Now modify the firewall rules to allow packet forwarding as well as NAT (aka MASQUERADE in iptables):
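Something along these lines (a minimal sketch – I am assuming eth0 is the interface NAT’d by VMware Workstation and eth1 faces the internal lab networks; adjust the names to suit):

    # allow forwarding from the internal network out via eth0, and replies back in
    iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
    iptables -A FORWARD -i eth0 -o eth1 -m state --state ESTABLISHED,RELATED -j ACCEPT

    # masquerade everything leaving via eth0 so it appears to come from this VM
    iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

    # save so the rules survive a reboot
    service iptables save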

The FORWARD and MASQUERADE rules are the relevant ones here. I have a few extra like allow ICMP and SSH but those don’t matter for what I am doing here.

Windows – View hidden network interface IP address

I was troubleshooting something in VMware and ended up copying the actual files of a VM from one datastore to another and re-adding it to a host (because the original host was stuck entering maintenance mode and I needed this VM to sort that out blah blah … doesn’t matter!). Problem is, when I did the re-adding and vCenter asked me if I copied or moved the VM, I said I copied. This resulted in all the network interfaces getting new MAC addresses (among other changes) and suddenly my VM was without any of the previously configured static IPs!

Damn.

The old interfaces are still there, just hidden.

I used PowerShell/ WMI to list all the network interfaces on the server (this shows the hidden ones too).
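Something along these lines (a sketch – the exact query I ran may have differed slightly):

    Get-WmiObject Win32_NetworkAdapterConfiguration |
      Select-Object Description, MACAddress, SettingID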

In the Network Connections GUI all I can see are the last two adapters, so everything else is hidden. If I just wanted to delete them I would have followed the instructions in this post. (When following the instructions in that post be sure to enter set devmgr_show_nonpresent_devices=1 on a line of its own).

The ones starting with “vmxnet3” are what’s of interest for me so let’s focus on those.

The reason I focused on SettingID is because if you check under HKLM\SYSTEM\ControlSet001\services\Tcpip\Parameters\Interfaces you’ll find entries with these GUIDs.

With a bit of PowerShell you can enumerate these keys and the IP address assigned to it:
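Roughly like this (quick and dirty):

    Get-ChildItem 'HKLM:\SYSTEM\ControlSet001\services\Tcpip\Parameters\Interfaces' |
      ForEach-Object {
        $props = Get-ItemProperty $_.PSPath
        [PSCustomObject]@{
          SettingID = $_.PSChildName
          IPAddress = $props.IPAddress -join ', '
        }
      }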

My PowerShell skills are getting worse by the day due to disuse, so the above is probably not the most elegant way of doing this. :)

Anyways, by comparing the two outputs I was able to identify the missing IP as either 10.136.37.36 or 10.134.203.2. The latter was of a different network so it looked like an older hidden adapter. I set the former, and as expected the network started working. Yay!

 

Load balancing in vCenter and ESXi

One of the things you can do with a portgroup is define teaming for the underlying physical NICs.

[screenshot: teaming settings]

If you don’t do anything here, the default setting of “Route based on originating virtual port” applies. What this does is quite obvious. Each virtual port on the virtual switch is mapped to a physical NIC behind the scenes; so all traffic to & from that virtual port goes & comes via that physical NIC. Since your virtual NIC connects to a virtual port this is equivalent to saying all traffic for that virtual NIC happens via a particular physical NIC.

In the screenshot above, for instance, I have two physical NICs dvUplink1 and dvUplink2. If I left teaming at the default setting and say I had 4 VMs connecting to 4 virtual ports, chances are two of these VMs will use dvUplink1 and two will use dvUplink2. They will continue using these mappings until one of the dvUplinks dies, in which case the other will take over – so that’s how you get failover.

This is pretty straightforward and easy to set up. And the only disadvantage, if any, is that you are limited to the bandwidth of a single physical NIC. If each of dvUplink1 & dvUplink2 were 1Gb NICs it isn’t as though the underlying VMs had 2Gb (2 NICs x 1Gb each) available to them. Since each VM is mapped to one uplink, 1Gb is all they get.

Moreover, if say two VMs were mapped to one uplink, and one of them was hogging all the bandwidth of that uplink while the other uplink was relatively free, the second VM on the busy uplink won’t automatically be mapped to the free uplink to make better use of resources. So that’s a bummer too.

A neat thing about “Route based on originating virtual port” is that the virtual port is fixed for the lifetime of the virtual machine so the host doesn’t have to calculate which physical NIC to use each time it receives traffic to & from the virtual machine. Only if the virtual machine is powered off, deleted, or moved to a different host does it get a new virtual port.

The other options are:

  • Route based on MAC hash
  • Route based on IP hash
  • Route based on physical NIC load
  • Explicit failover

We’ll ignore the last one for now – that just tells the host to use the first physical NIC in the list for all VMs.

“Route based on MAC hash” is similar to “Route based on originating virtual port” except that it uses the MAC address of the virtual NIC instead of the virtual port. I am not very clear on how this is better. Since the MAC address of a virtual machine is usually constant (unless it is changed or a different virtual NIC is used) all traffic from that MAC address will always use the same physical NIC. Moreover, there is additional overhead in that the host has to check each packet for the MAC address and decide which physical NIC to use. VMware documentation says it provides a more even distribution of traffic but I am not clear how.

“Route based on physical NIC load” is a good one. It starts off like “Route based on originating virtual port”, but if a physical NIC is loaded, then the virtual ports mapped to it are moved to a physical NIC with less load! This load balancing option is only available for distributed switches. Every 30s the distributed switch checks the physical NIC load, and if it exceeds 75% then the virtual port of the VM with the highest utilization is moved to a different physical NIC. So you have the advantages of “Route based on originating virtual port” with one of its major disadvantages removed.
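(If you use PowerCLI, switching a distributed portgroup to this policy is a one-liner – a sketch, with a made-up portgroup name:)

    Get-VDPortgroup -Name 'dvPortGroup-VMs' |
      Get-VDUplinkTeamingPolicy |
      Set-VDUplinkTeamingPolicy -LoadBalancingPolicy LoadBalanceLoadBased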

In fact, except for “Route based on IP hash” none of the other load balancing mechanisms have an option to utilize more than a single physical NIC bandwidth. And “Route based on IP hash” does not do this entirely as you would expect.

“Route based on IP hash”, as the name suggests, does load balancing based on the IP hash of the virtual machine and the remote end it is communicating with. Based on a hash of these two IP addresses all traffic for the communication between these two IPs is sent through one NIC. So if a virtual machine is communicating with two remote servers, it is quite likely that traffic to one server goes through one physical NIC while traffic to the other goes via another physical NIC – thus allowing the virtual machine to use more bandwidth than that of one physical NIC. However – and this is an often overlooked point – all traffic between the virtual server and one remote server is still constrained by the bandwidth of the physical NIC it happens via. Once traffic is mapped to a particular physical NIC, if more bandwidth is required or the physical NIC is loaded, it is not as though an additional physical NIC is used. This is a catch with “Route based on IP hash” that’s worth remembering.

If you select “Route based on IP hash” as a load balancing option you get two warnings:

  • With IP hash load balancing policy, all physical switch ports connected to the active uplinks must be in link aggregation mode.
  • IP hash load balancing should be set for all port groups using the same set of uplinks.

What this means is that unlike the other load balancing schemes where there was no additional configuration required on the physical NICs or the switch(es) they connect to, with “Route based on IP hash” we must combine/ bond/ aggregate the physical NICs as one. There’s a reason for this.

In all the other load balancing options the virtual NIC MAC is associated with one physical NIC (and hence one physical port on the physical switch). So incoming traffic for a VM knows which physical port/ physical NIC to go via. But with “Route based on IP hash” there is no such one to one mapping. This causes havoc with the physical switch. Here’s what happens:

  • Different outgoing traffic flows choose different physical NICs. With each of these packets the physical switch will keep updating its MAC address table with the port the packet was got from. So for instance, say the two physical NICs are connected to physical switch Port1 and Port2 and the virtual NIC MAC address is VMAC1. When an outgoing traffic packet goes via the first physical NIC, the switch will update its tables to reflect that VMAC1 is connected to Port1. Subsequent traffic flows might continue using the first physical NIC so all is well. Then say a traffic flow uses the second physical NIC. Now the switch will map VMAC1 to Port2; then a traffic flow could use Port1 so the mapping gets changed to Port1, and then Port2, and so on …
  • When incoming traffic hits the physical switch for MAC address VMAC1, the switch will look up its tables and decide which port to send traffic on. If the current mapping is Port1 traffic will go out via that; if the current mapping is Port2 traffic will go out via that. The important thing to note is that the incoming traffic flow port chosen is not based on the IP hash mapping – it is purely based on whatever physical port the switch currently has mapped for VMAC1.
  • So what’s required is a way of telling the physical switch that the two physical NICs are to be considered as bonded/ aggregated such that traffic from either of those NICs/ ports is to be treated accordingly. And that’s what EtherChannel does. It tells the physical switch that the two ports/ physical NICs are bonded and that it must route incoming traffic to these ports based on an IP hash (which we must tell EtherChannel to use while configuring it).
  • EtherChannel also helps with the MAC address table in that now there can be multiple ports mapped to the same MAC address. Thus in the above example there would now be two mappings VMAC1-Port1 and VMAC1-Port2 instead of them over-writing each other!

“Route based on IP hash” is a complicated load balancing option to implement because of EtherChannel. And as I mentioned above, while it does allow a virtual machine to use more bandwidth than a single physical NIC, an individual traffic flow is still limited to the bandwidth of a single physical NIC. Moreover there is more overhead on the host because it has to calculate the physical NIC used for each traffic flow (essentially each packet).

Prior to vCenter 5.1 only static EtherChannel was supported (unless you use a third party virtual switch such as the Cisco Nexus 1000V). Static EtherChannel means you explicitly bond the physical NICs. But from vCenter 5.1 onwards the inbuilt distributed switch supports LACP (Link Aggregation Control Protocol) which is a way of automatically bonding physical NICs. Enable LACP on both the physical switch and distributed switch and the physical NICs will automatically be bonded.

(To enable LACP on the physical NICs go to the uplink portgroup that these physical NICs are connected to and enable LACP).

That’s it for now!

Update

Came across this blog post which covers pretty much everything I covered above but in much greater detail. A must read!

Hyper-V NAT & virtual switches

Played with Hyper-V on my laptop after a long time today. For the past 6 months or so I’ve been exclusively with VMware Workstation on my primary laptop, and things are so much easier there if you just want some VMs running on your laptop. Installing VMs is a snap because most OSes will be automatically installed for you. Similarly, networking is straight-forward as VMware Workstation provides NAT out of the box. Hyper-V being more of an “enterprise” thingy (to be compared with VMware ESXi) you have to do these yourself.

For instance, NAT. Hyper-V offers three types of network switches to connect your VM to the outside world. These are:

  • External switch: This creates a virtual switch that is bound to your physical network adapter. What happens when you make such a switch is that: (1) a new virtual switch is created with the name you provide, (2) this switch is bound to your physical network adapter (think of it as though your physical network adapter has become a switch), (3) a new virtual network adapter is created on your physical machine with the name “Hyper-V Virtual Ethernet Adapter #X” (replace X with a number), and (4) this virtual adapter is hooked up to the aforementioned switch and gets the IP address etc that was assigned to your physical adapter.

    So your physical adapter has become a virtual switch. And your physical machine uses a virtual adapter – connected to this virtual switch – to connect to the network.

    Other VMs you create and connect to this External switch have virtual adapters in them that too connect to this virtual switch. So it’s as if all these VMs and your physical machine are connected directly to a switch that is connected to your physical network. That’s why it’s called an External switch – everything is connected directly to the external world.

  • Internal switch: This too creates a virtual switch, but this virtual switch is not bound to your physical adapter. It’s just a virtual switch in a virtual ether of its own! What happens when you create an Internal switch is quite similar to the steps for an External switch: (1) a new virtual switch is created with the name you provide, (2) a new virtual network adapter is created on your physical machine with the name “Hyper-V Virtual Ethernet Adapter #X” (replace X with a number), and (3) this virtual adapter is hooked up to the aforementioned switch.

    Notice your physical adapter is left as it is. The newly created virtual adapter and switch have nothing to do with it. They exist in a world of their own. When you create new VMs connecting to this Internal switch, all their virtual adapters too connect to this virtual switch. The VMs and your physical machine can thus talk to each other, but they can’t jump over to the physical network.

    If you want your VMs to be able to talk to the outside world in such a scenario, you must make provisions for the communication. Like install & enable a routing service between the virtual adapter on the physical machine that’s connected to the Internal switch, and the physical adapter that’s connected to the physical network. Or enable NAT if routing isn’t possible. Either way, the idea behind an Internal switch is that all your VMs are hooked up to that switch, and they can talk to your physical machine but they can’t talk to the outside world.

  • Private switch: Lastly we have a Private switch. This is just like an Internal switch, with the difference that there’s no virtual adapter created on the physical machine. So the physical machine has no connection to the Private switch. Meaning – all the VMs connected to the Private switch cannot contact the physical machine. They are in a bubble of their own.

So that’s that.

In my case I wanted all my VMs to be on an Internal switch and also have them talk to the outside world. I couldn’t enable routing due to my network configuration, so I decided to enable NAT. To do this I created an Internal switch as usual and then (1) went to “Control Panel” > “Network and Internet” > “Network Connections”, (2) right clicked on my physical adapter that’s connected to the outside world, (3) went to the Sharing tab, (4) ticked the box that said “Allow other network users to connect …” and selected the adapter connected to the Internal switch from the drop-down menu, and (5) clicked OK.

[screenshot: Internet Connection Sharing settings]

This enables NAT on my physical machine, as well as a DHCP server for the Internal switch network.

Here’s ipconfig for the virtual adapter connected to the Internal switch before I enable sharing:
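(Illustrative – before sharing, nothing assigns the adapter an address so it just has an APIPA one:)

    Ethernet adapter vEthernet (Internal):

       Connection-specific DNS Suffix  . :
       Autoconfiguration IPv4 Address. . : 169.254.80.80
       Subnet Mask . . . . . . . . . . . : 255.255.0.0
       Default Gateway . . . . . . . . . :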

And here’s the same once I enable sharing:
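(Illustrative again – ICS always puts the host-side adapter on 192.168.137.1:)

    Ethernet adapter vEthernet (Internal):

       Connection-specific DNS Suffix  . :
       IPv4 Address. . . . . . . . . . . : 192.168.137.1
       Subnet Mask . . . . . . . . . . . : 255.255.255.0
       Default Gateway . . . . . . . . . :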

Unfortunately you don’t get much control over the NAT in terms of choosing the IP address network (it’s always the 192.168.137.0/24 network). You do get to choose what ports are forwarded (click “Settings” in the previous screenshot) and that’s about it.

If I check the VM now it has got an IP address from this network as expected:
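(Illustrative – the address comes from the ICS DHCP service; mshome.net is the ICS default suffix:)

    Ethernet adapter Ethernet:

       Connection-specific DNS Suffix  . : mshome.net
       IPv4 Address. . . . . . . . . . . : 192.168.137.73
       Subnet Mask . . . . . . . . . . . : 255.255.255.0
       Default Gateway . . . . . . . . . : 192.168.137.1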

The DNS server for this interface is set to the virtual adapter on the physical machine (which will perform the query on behalf of the VM and get back). The VM now has connectivity to the outside world as expected.

That’s all for now! Hope this helps someone.

IPv6 through OpenVPN bridging

I spent the better part of today and yesterday trying to fix something that was broken because I was typing the wrong IPv4 address! Instead of connecting to (fake) address 36.51.55.1 I was typing 36.55.51.1. And since it wasn’t working I kept thinking it was a configuration error. Grr!

Here’s what I am trying to do:

  • I have a Linux server with a /64 IPv6 block assigned to it. The server has IPv6 address 2dcc:7c4e:3651:55::1/64, IPv4 address 36.51.55.1/24. (The server is a virtual machine on my laptop so these aren’t real public IPs).
  • I want a Windows client to connect to this server via OpenVPN, get/ assign itself a public IPv6 address from the /64 block, and have IPv6 connectivity to the Internet. Easy peasy?

I could use services like SixXS or Hurricane Electric to get a tunnel, but I am curious whether it will work over OpenVPN. The idea is to do it in a test lab now and replicate it in the real world later (if I feel like it). It’s been a while since I used OpenVPN so this is a good chance to play with it too.

Note

I want a single client to connect to the server so I don’t need a fancy server setup. What I am trying to achieve is a peer-to-peer setup (aka point-to-point setup). Below I refer to the client as OpenVPN client, the server as OpenVPN server, but as far as OpenVPN is concerned it’s a peer-to-peer setup.

Bridging & Routing

OpenVPN has two ways of connecting machines. One is called routing, wherein the remote machine is assigned an IP address from a different network range and so all remote machines are as though they are on a separate network. This means the OpenVPN server machine acts as a router between this network and the network the server itself is connected to. Hence the name “routing”.

The other method is called bridging, and here the OpenVPN server connects the remote machines to the same network it is on. Whereas in routing the OpenVPN server behaves like a router, in bridging the OpenVPN server behaves like a switch. All remote machines have IP addresses from the same network as the OpenVPN server. They are all on the same LAN essentially.

The advantage of bridging is that it works at the layer 2 level (because the remote machine is plugged in to the real network) and so any protocol in the upper levels can be used over it. In contrast routing works at layer 3 (because the remote machine is plugged into a separate network) and so only layer 3 protocols supported by OpenVPN can be used. Previously it was only IPv4 but now IPv6 support too exists. I could use either method in my case, but I am going ahead with bridging because 1) I want the remote machines to behave as though they are connected to the main network, and 2) I just have a /64 block and I don’t want to muck about with subnetting to create separate networks (plus blocks smaller than /64 could have other issues). Maybe I will try routing later, but for now I am going to stick with bridging.

To achieve bridging and routing OpenVPN makes use of two virtual devices – TUN (tunnel adapter) and TAP (network tap). Of these only TAP does bridging, so that’s the route I’ll take.

Before moving on I’ll point to a good explanation of Bridging vs Routing from GRC. The OpenVPN FAQ has an entry on the advantages & disadvantages of each.

OpenVPN: Quick setup on the Server side

A good intro to setting up OpenVPN can be found in the INSTALL file of OpenVPN. It’s not a comprehensive document, so it’s not for everyone, but I found it useful to get started with OpenVPN. The steps I am following are from the “Example running a point-to-point VPN” section with some inputs from the manpage. A point-to-point setup is what I want to achieve.

OpenVPN has various switches (as can be seen in the manpage). You can invoke OpenVPN with these switches, or put them in a config file and invoke the config file. For quick and dirty testing it’s best to run OpenVPN with the switches directly. Here’s how I launched the server:
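Something like this (a sketch from memory – the key bits are the TAP device and, for this first quick test, no encryption):

    # on the server – listen on the default UDP port 1194, using a TAP device
    openvpn --dev tap0

    # on the client – connect to the server
    openvpn --remote 36.51.55.1 --dev tap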

In practice, you shouldn’t do this as it launches the tunnel with NO encryption. You even get a warning from OpenVPN:
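The warning reads roughly like this (wording from memory):

    ******* WARNING *******: all encryption and authentication features
    disabled -- all data will be tunnelled as cleartext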

A better approach is to generate a shared key first and copy it to the client(s) too. A shared key can be generated with the --secret switch:
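    # generate a static shared key (the file name is arbitrary);
    # copy it to the client over a secure channel
    openvpn --genkey --secret static.key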

And OpenVPN can be launched with this shared key thus:
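    # server
    openvpn --dev tap0 --secret static.key

    # client
    openvpn --remote 36.51.55.1 --dev tap --secret static.key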

Note

Remember to copy the shared key to the client. If a client connects without the shared key it will fail:
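(errors along these lines – wording from memory:)

    Authenticate/Decrypt packet error: packet HMAC authentication failed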

Notice I am using the --dev switch to specify that I want to use a TAP device.

TAP devices

TAP and TUN are virtual devices. TAP is a layer 2 virtual device, TUN is a layer 3 virtual device.

Each OpenVPN instance on the server creates a TAP device (if it’s configured to use TAP) – names could be tap0, tap1, and so on. It’s important to note that each OpenVPN instance on the server uses a single TAP device – you don’t create TAP devices for each client. Similarly OpenVPN on the client side creates a TAP device when it’s launched. After creating the TAP device, the OpenVPN client contacts the OpenVPN server and the two set up communication between their respective TAP devices. This “communication” is basically an encrypted UDP connection (by default; though you could have non-encrypted UDP or TCP connections too) from the client to the server. The client OS and applications think its TAP device is connected to the server TAP device and vice versa, while the OpenVPN programs on both ends do the actual moving around of the packets/ frames.

In my case, since I am doing a peer-to-peer connection, the TAP device created on the server will be used by a single client. If I want to add another client, I’ll have to launch a new OpenVPN instance – listening on a separate port, with a separate TAP device, preferably with a separate secret.

If multiple clients try to use the same TAP device on the server, both clients won’t work and the server generates errors like these:

This is because packets from both clients are appearing on the same TAP device and confusing the OpenVPN server.

Creating a bridge

When OpenVPN on the server creates a TAP device it is not automatically connected to the main network. That is something additional we have to do. On Linux this is done using the bridge-utils package. Since I am using CentOS, the following will install it:
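
    yum install bridge-utils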

A bridge is the (virtual) network switch I was talking about above. It’s a virtual device, to which you connect the machine’s network adapter as well as the TAP adapters you create above. Keep in mind a bridge is a layer 2 device. It doesn’t know about IP addresses or networks. All it knows are MAC addresses and that it has (virtual) network adapters plugged in to it.

The bridge can have any name, but let’s be boring and call it br0. I create a bridge thus:
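
    brctl addbr br0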

Next I add the physical network card of my machine to the bridge. Think of it as me connecting the network card to the bridge.
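
    brctl addif br0 eth0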

Here’s a catch though. Your physical network adapter is already connected to a switch – the physical switch of your network! – so when you connect it to the virtual switch it is implicitly disconnected from the physical switch. There’s no warning given about this; you just lose connectivity. As far as the machine is concerned your physical network adapter is now connected to the virtual switch (bridge), and this virtual switch is connected to the physical switch.

What this means is that all the network settings you had so far configured on the physical adapter must now be put on the bridge. Bridges are not meant to have IP addresses (remember they are layer 2 devices) but this is something we have to do here as that’s the only way to connect the bridge to the physical network.

To make this change explicit let’s zero IP the physical interface. It’s not really needed (as the interface automatically loses its settings) but I think it’s worth doing so there’s no confusion.
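
    ifconfig eth0 0.0.0.0 up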

Next I assign the bridge interface the IP settings that were previously on the physical adapter:
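
The addresses here are placeholders – use whatever the physical adapter had:

    ifconfig br0 192.168.1.10 netmask 255.255.255.0 up
    route add default gw 192.168.1.1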

Since I am on CentOS and I’d like this to be automatically done each time I start the server I must make changes to the /etc/sysconfig/network-scripts/ifcfg-eth0 and /etc/sysconfig/network-scripts/ifcfg-br0 files. I just copied over the ifcfg-eth0 file to ifcfg-br0 and renamed some of the parameters in the file (such as the interface name, MAC address, etc).

Here’s my /etc/sysconfig/network-scripts/ifcfg-eth0 file. It’s mostly the defaults except for the last two lines:
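
Something along these lines (the MAC address is a placeholder):

    DEVICE=eth0
    HWADDR=00:0C:29:XX:XX:XX
    TYPE=Ethernet
    ONBOOT=yes
    NM_CONTROLLED=no
    BOOTPROTO=none
    BRIDGE=br0
    PROMISC=yes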

This Fedora page is a good source of information on setting up a bridge in CentOS/ RHEL/ Fedora. The PROMISC=yes line is added to set the physical adapter in promiscuous mode. This is required by OpenVPN. (By default network adapters only accept frames corresponding to their MAC address. Promiscuous mode tells the adapter to accept frames for any MAC address. Basically you are telling the adapter to be promiscuous!)

Next, here’s my /etc/sysconfig/network-scripts/ifcfg-br0 file:
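
Again with placeholder addresses:

    DEVICE=br0
    TYPE=Bridge
    DELAY=0
    ONBOOT=yes
    NM_CONTROLLED=no
    BOOTPROTO=none
    IPADDR=192.168.1.10
    NETMASK=255.255.255.0
    GATEWAY=192.168.1.1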

The first three lines are what matter here. They give the device a name, set it as type Bridge, and tell the kernel it needn’t delay in passing frames through the bridge (by default there’s a 30s delay). The other lines are from the original eth0 configuration file, modified for br0.

Once these two files are in place, even after a reboot the bridge interface will be automatically created, assigned an IP address connecting it to the physical switch, and the physical adapter will be put in promiscuous mode and connected to this bridge.

Creating TAP devices

Back to TAP devices. Once OpenVPN creates a TAP device it must be added to the bridge previously created. This is what connects the TAP device to the main network.

Typically a shell script is used that adds the dynamically created TAP device to the bridge. For instance OpenVPN’s Ethernet Bridging page has a BASH script. Similarly the OpenVPN INSTALL document gives another BASH script (which I like more and for some reason made more sense to me).

CentOS installs an OpenVPN wrapper script in /etc/init.d/openvpn which lets you start and stop OpenVPN as a service. This wrapper script automatically starts an OpenVPN process for each config file ending with the .conf extension in the /etc/openvpn directory. Moreover, if a /etc/openvpn/xxx.sh script exists for the xxx.conf file, it executes that script before launching the OpenVPN process. Thus CentOS already has a place to put the script that adds TAP interfaces to the bridge.

(CentOS also looks for two files /etc/openvpn/openvpn-start and /etc/openvpn/openvpn-shutdown which are run when the wrapper script starts and shuts down – basically they are run before any of the configuration files or scripts are run. I don’t think I’ll use these two files for now but am just mentioning them here as a future reference to myself).

The shell script can behave in two ways – depending on how TAP devices are created. TAP devices can be created in two ways:

  1. They can be created on the fly by OpenVPN – in which case you’ll need to invoke the script via the --up switch; the script will be passed a variable $dev referring to the TAP device name so it can be set in promiscuous mode and added to the bridge.

    To create TAP devices on the fly use the --dev tap switch – notice it doesn’t specify a specific TAP device such as tap0. OpenVPN will automatically create a tap0 (or whatever number is free) device.

  2. They can be pre-created as persistent TAP devices – these will stay until the machine is rebooted, and are preferred for the reason that you don’t need --up and --down switches to create them, and the devices stay on even if the connection resets. In this case the script can create TAP devices and configure and add them to the bridge – independent of OpenVPN.

    To use a specific TAP device use the --dev tap0 switch (where tap0 is the name of the device you created).

I will be using persistent TAP devices. These can be created using the --mktun switch:
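
    openvpn --mktun --dev tap0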

Next I bring the device up, zero its IP, and set it in promiscuous mode:
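
    ifconfig tap0 0.0.0.0 promisc up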

Lastly I add these to the bridge:
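
    brctl addif br0 tap0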

Assuming my configuration file is called /etc/openvpn/ipv6.conf I’ll create a file called /etc/openvpn/ipv6.sh and add the following:
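
Putting the above together, the script would look something like this:

    #!/bin/bash
    # create a persistent TAP device, bring it up in promiscuous
    # mode with no IP, and add it to the bridge
    openvpn --mktun --dev tap0
    ifconfig tap0 0.0.0.0 promisc up
    brctl addif br0 tap0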

Don’t forget to make the file executable:
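
    chmod +x /etc/openvpn/ipv6.sh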

Now whenever the OpenVPN service on CentOS starts, it will run the script above to create the TAP device, configure and add it to the bridge, then launch OpenVPN with a configuration file that will actually use this device. This configuration file will refer by name to the TAP device created in this script.

Firewall

The OpenVPN server listens on UDP port 1194 by default. So I added a rule to iptables to allow this port.

In CentOS I made the following change to /etc/sysconfig/iptables:
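
    -A INPUT -p udp -m udp --dport 1194 -j ACCEPT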

Then I reloaded the iptables service so it loads the new rule:
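
    service iptables restart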

Forwarding

Next I enabled IPv4 & IPv6 forwarding. I added the following lines to /etc/sysctl.conf:
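
    net.ipv4.ip_forward = 1
    net.ipv6.conf.all.forwarding = 1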

And got sysctl to read the file and load these settings:
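
    sysctl -p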

IPv4 forwarding is not needed in my case, but I enabled it anyways. Also note that I am enabling IPv6 forwarding for all interfaces. That’s not necessary, the forwarding can be enabled for specific interfaces only.

In addition to enabling forwarding at the kernel level, iptables too must be configured to allow forwarding. The IPv6 firewall is called ip6tables and its configuration file is /etc/sysconfig/ip6tables. The syntax is similar to /etc/sysconfig/iptables and in CentOS forwarding is disabled by default:
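
The default rules look roughly like this (quoting from memory) – the FORWARD rule, the second line, is the one that blocks forwarding:

    -A INPUT -j REJECT --reject-with icmp6-adm-prohibited
    -A FORWARD -j REJECT --reject-with icmp6-adm-prohibited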

I commented out the second line and reloaded the service.

Lastly, I added the following line to /etc/sysconfig/network:
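
    IPV6FORWARDING=yes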

This tells the initscripts bringing up the interfaces that IPv6 forwarding is to be enabled on all interfaces. Again, this can be specified for specific interfaces – in which case you’d put it in the /etc/sysconfig/network-scripts/ifcfg-xxx file for that interface – or enabled globally as above but disabled in the file of specific interfaces. For the curious, the initscripts-ipv6 homepage describes more options.

To summarize

To summarize what I’ve done so far:

  1. Create a bridge, add the physical adapter to the bridge, configure the bridge.
  2. Create persistent TAP devices – best to create a script like I did above so we can set it to run automatically later on.
  3. Create a secret key, copy the secret key to the client(s).
  4. Modify firewall, enable forwarding.
  5. Launch OpenVPN, telling it to use the secret key and the persistent TAP device created earlier (referred to by name, e.g. tap0).

Mind you, this is a quick and dirty setup. Using shared keys won’t scale, but they will do fine for me now. And rather than launch OpenVPN directly I must put it into a conf file with some logging and additional options … but that can be done later.

Troubleshooting

A very useful switch for troubleshooting purposes is --verb x, where x is a number from 0 to 11. 0 outputs nothing except fatal errors; 1 (the default) through 4 cover normal usage; 5 is more verbose and also outputs R and W characters to the console for each packet read and write (uppercase for TCP/UDP packets, lowercase for TUN/TAP packets); and 6 to 11 produce debug-range output.

I usually stick with --verb 3 but when troubleshooting I start with --verb 5.

Also, while UDP is a good protocol to use for VPN, during troubleshooting it’s worth enabling TCP to test connectivity. In my case, for instance, the client seemed to be connecting successfully – there were no error messages on either end at least – but I couldn’t test connectivity as I can’t simply telnet to port 1194 of the server to see if it’s reachable (since it uses UDP). So I enabled TCP temporarily, got OpenVPN on the server to listen on TCP port 1194, then did a telnet to that port from the client … and realized I had been typing the wrong IPv4 address so far!

Here’s how you launch OpenVPN server with TCP just in case:
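
Something like this, extending the earlier sketch:

    openvpn --dev tap0 --secret ipv6.key --proto tcp-server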

Be sure to open the firewall and use the corresponding switch --proto tcp-client on the client end too.

OpenVPN: Quick setup on the Client side

The server is running with the following command (remember to add --verb x if you want more verbosity):
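
(The same invocation sketched earlier.)

    openvpn --dev tap0 --secret ipv6.key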

I launch the client thus (again, remember to add --verb x if you want more verbosity):
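
(The server address below is a placeholder for the actual address.)

    openvpn --remote server.example.com --dev tap --secret ipv6.key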

The command is quite self-explanatory.

  1. The --remote switch specifies the address of the server.
  2. The --dev tap switch tells OpenVPN to use a TAP device. On Windows TAP devices are persistent by default so this will be a persistent TAP device.
  3. Port is 1194, the default, so I don’t need to specify it explicitly. Similarly the protocol is UDP, the default.
  4. The shared key is specified using the --secret switch.

And that’s it! Now my client’s connected to the server. It doesn’t have an IPv6 address yet because the server doesn’t have DHCPv6 or Router Advertisements turned on. I could turn these ON on the server, or assign the client a static IPv6 address and gateway. The address it gets will be a global IPv6 address from the /64 block so you should be able to connect to it from anywhere on the IPv6 Internet! (Of course, firewall rules on the client could still be blocking packets so be sure to check that if something doesn’t work).

Wrapping up

Here’s the final setup on the server and client. This is where I put the settings above into config files.

On the server side I created a file called /etc/openvpn/ipv6.conf:
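
A plausible reconstruction, matching the switches used so far:

    dev tap0
    proto udp
    port 1194
    secret /etc/openvpn/ipv6.key
    verb 3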

I renamed the previously generated key to /etc/openvpn/ipv6.key.

Created /etc/openvpn/ipv6.sh:
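
This is the same script shown earlier:

    #!/bin/bash
    openvpn --mktun --dev tap0
    ifconfig tap0 0.0.0.0 promisc up
    brctl addif br0 tap0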

Also created /etc/openvpn/openvpn-shutdown to remove the TAP devices when the OpenVPN service stops:
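
    #!/bin/bash
    # remove the persistent TAP device when the OpenVPN service stops
    openvpn --rmtun --dev tap0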

Apart from this I created /etc/sysconfig/network-scripts/ifcfg-br0 and /etc/sysconfig/network-scripts/ifcfg-eth0 as detailed earlier. And made changes to the firewall rules.

On the client side I created a file C:\Program Files\OpenVPN\config\ipv6.ovpn:
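
A plausible reconstruction – the server address is a placeholder, note the escaped key path and the short-name paths for the scripts (explained below), and newer OpenVPN versions may also need the script-security directive to run the .cmd files:

    remote server.example.com
    dev tap
    proto udp
    secret "C:\\Program Files\\OpenVPN\\config\\ipv6.key"
    # newer OpenVPN versions need this to run the up/down scripts
    script-security 2
    up C:\\PROGRA~1\\OpenVPN\\config\\ipv6-up.cmd
    down C:\\PROGRA~1\\OpenVPN\\config\\ipv6-down.cmd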

I copied the shared key to C:\Program Files\OpenVPN\config\ipv6.key. Note that it is important to escape the path name in the ovpn file on Windows.

I created two command files to be run when the client connects and disconnects. These are specified by the --up and --down switches.

Here’s ipv6-up.cmd:
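
The adapter name and addresses below are placeholders – use your TAP adapter’s name and an address from your /64 block:

    rem assign a static IPv6 address and default route to the TAP adapter
    netsh interface ipv6 add address "Local Area Connection 2" 2001:db8:1234::10
    netsh interface ipv6 add route ::/0 "Local Area Connection 2" fe80::1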

And here’s ipv6-down.cmd:
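
    rem undo the address and route added by ipv6-up.cmd
    netsh interface ipv6 delete route ::/0 "Local Area Connection 2" fe80::1
    netsh interface ipv6 delete address "Local Area Connection 2" 2001:db8:1234::10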

I should add code to set the DNS servers too, will do that later.

FYI

There’s a reason why I use the short name format to specify the path for these files. Initially I was using the full name escaped but the client refused to launch with the following error:

I noticed that it was trying to run the ipv6-up.cmd file without escaping the space between “Program Files”. Must be an OpenVPN bug, so to work around that I used the short name path.

To connect to the server I launch OpenVPN thus:
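
    openvpn --config "C:\Program Files\OpenVPN\config\ipv6.ovpn"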

OpenVPN for Windows also installs a GUI system tray icon (called OpenVPN GUI). If your config files end with the ovpn extension and are in the C:\Program Files\OpenVPN\config\ folder you can right click this tray icon and connect. Very handy!

Note

This post was updated to include the wrapping up section above. And some errors in my understanding of TAP devices were corrected.

This post was further updated to add the netsh files for the client and also to clarify that this is a peer-to-peer setup. It will only work with one client at a time. Multiple clients will fail unless you create separate OpenVPN instances (on different ports and TAP adapters) to service them.

ERROR: [ipv6_set_default_route]

The other day I was configuring IPv6 on a CentOS box and got the following error:
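
It was roughly along these lines (quoting from memory):

    ERROR    : [ipv6_set_default_route] Given IPv6 default gateway 'fe80::254' is link-local, but no scope or gateway device is specified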

I had set the following in my /etc/sysconfig/network-scripts/ifcfg-eth0 file:
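
The address here is a placeholder; the gateway is the important bit:

    IPV6INIT=yes
    IPV6ADDR=2001:db8:1234::10/64
    IPV6_DEFAULTGW=fe80::254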

The error message was telling me what’s wrong, but who bothers reading error messages!

The problem was this: I have specified the default gateway as fe80::254 but that’s a link-local address. How does the system know which link to use this with? I’d either have to specify a scope with the gateway address, or clarify which network interface to use. That’s what the error message is trying to say.

So the solution is one of the below.

Either specify the scope id:
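
    IPV6_DEFAULTGW=fe80::254%eth0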

Or specify the device to use:
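
    IPV6_DEFAULTGW=fe80::254
    IPV6_DEFAULTDEV=eth0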

Another thing, if your network router is sending Router Advertisements you can skip specifying the gateway altogether and ask the interface to auto configure.
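
That is:

    IPV6INIT=yes
    IPV6_AUTOCONF=yes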

This is how you’d configure your network on a VPS. The VPS provider will have allocated a /64 block to you. Unlike IPv4, where the provider has to use an IPv4 address from the same network as yours as the router for your network, with IPv6 they don’t need to do that since link-local addresses can be used. So they’ll either provide you with a link-local gateway address, or you can turn on autoconfiguration and pick it up yourself (provided it’s being advertised).

Lastly, I needn’t have put this configuration in the interface specific files. I could have put it in /etc/sysconfig/network too.

CentOS – initial configuration notes

Nothing new here, just stuff I can refer to later when I forget. I’ve usually worked with Debian but am now working with CentOS for a change, so some things are new to me.

  1. If you install CentOS in a machine with less than 1024 MB RAM you don’t get the graphical installation option. So best to install with 1024 MB RAM and then downgrade RAM if you want to. I have a CentOS VM with less than 512 MB RAM. I don’t necessarily need a graphical installation but I prefer it as it’s friendly. I get to choose the hostname for instance, select the packages I want to install, and so on. Without a graphical install you are given a minimal install only with no options to choose.
  2. Set the hostname in /etc/sysconfig/network.
  3. Network interfaces are managed by NetworkManager, which is useful if you have a GUI or like to use the nmcli command. I prefer old fashioned ifconfig so I manage my network devices via /etc/sysconfig/network-scripts/ifcfg-* files.
    • Set NM_CONTROLLED=no in this file to tell NetworkManager not to bother with this interface.
    • Set ONBOOT=yes to up this interface on boot up.
    • Use IPADDR=a.b.c.d, NETMASK=x.y.z.f, GATEWAY=p.q.r.s to set the IPv4 address, netmask, and gateway. Use BOOTPROTO=dhcp if you’d prefer DHCP instead.
    • Using IPV6INIT=no and IPV6_AUTOCONF=no is supposed to turn off IPv6 for that interface but it doesn’t seem to. A better way to turn off IPv6 (and/ or control its parameters) is via sysctl.
    • For more options this link from Oracle is a useful reference, as is the initscripts-ipv6 homepage.
  4. To disable IPv6 on an interface the following sysctl setting helps: net.ipv6.conf.eth0.disable_ipv6 = 1. Add entries like these to /etc/sysctl.conf.
  5. The following sysctl setting tells the OS to act as a router (i.e. forward packets): net.ipv4.ip_forward = 1
  6. Use /etc/resolv.conf to specify name servers, domain suffix (via the domain keyword), and list of search suffixes (via the search keyword with a space separated list of domains). Domain suffix is the domain name to be added to hosts without a domain name. Search suffixes are additional domain names that can be added to hosts when searching. Use only one of these. If both are specified the last one wins.
  7. It’s best to put the Intranet DNS servers first in /etc/resolv.conf followed by Internet DNS servers. In my setup the Intranet servers don’t have access to the Internet, so any Internet name queries to them time out. When this happens the DNS resolver automatically tries the next server in the list … until it reaches the Internet servers and gets an answer. To speed the process up use option timeout:1 to set a timeout of 1 second (see the sketch below). In contrast, if I put the Internet servers first they don’t time out – they try to resolve the non-existent domain and reply that it does not exist – so the Intranet servers aren’t queried.
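
For example, a hypothetical /etc/resolv.conf along these lines (addresses are placeholders):

    options timeout:1
    search example.local
    # Intranet DNS servers first
    nameserver 10.0.0.10
    # Internet DNS servers after
    nameserver 8.8.8.8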

Connecting VMware Workstation guests to the external world

This is just a writeup on what I am doing in my test lab setup. Mainly as a reminder to me.

My host laptop has an interface with a dynamically assigned private IP from my router. This laptop runs VMware Workstation, in which there are a couple of guests.

The easiest option would be to assign a NAT interface to each of these guests. That way they are all on a private network of their own, but with Internet access via VMware NAT.

A note on NAT. What is it? NAT is a way to hide private IP blocks from the outside world. Like in this instance, you don’t want to assign each of your guests public IPs as they are private machines and it’s a waste of public IPs. What you want is to assign these machines a private IP yet ensure they can connect to the outside world. So what you need is for one machine – the host in this case – to be on the Internet via a public IP or private IP (in which case the router that provides the private IP is hiding the host behind it) and all other machines to be hidden behind it.

The way NAT works is that it creates a private IP space on the host. The host is given an IP address from this private space, as are all the guests, and the guests use this private IP of the host as their router. The NAT service on the host receives packets from the guests and sends them out to the Internet, changing their source IP to the external IP of the host and keeping note of the translation. When it receives reply packets it reads them to identify which guest each one is directed to and passes it on.

In VMware NAT, it is optional to give the host an IP from the private space as the NAT service creates a separate private IP – hidden from the host too, but residing on the host – and the guests use this IP as their router IP. This IP also provides DNS services for the guests. Here’s a screenshot from the network settings page of VMware Workstation.

[Screenshot: VMware Workstation virtual network settings (nat-network)]

I have defined a virtual network called VMnet1 and assigned it as a NAT network. In this case I chose to have the host too connected to this network – and so my host has a VMnet1 interface with IP address 192.168.126.1 – and my VMs will have DHCP addresses assigned from this pool.

From one of my VMs:
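
The output was along these lines (the VM’s own address is illustrative):

    C:\> ipconfig

    Ethernet adapter Local Area Connection:

       IPv4 Address. . . . . . . . . . . : 192.168.126.128
       Subnet Mask . . . . . . . . . . . : 255.255.255.0
       Default Gateway . . . . . . . . . : 192.168.126.2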

Notice the default gateway is 192.168.126.2 and not 192.168.126.1. Just to confirm they are different interfaces, we can check the MAC address table:
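
Something like this – the MAC addresses are illustrative, but note the VMware 00-50-56 prefix:

    C:\> arp -a

    Interface: 192.168.126.128 --- 0xb
      Internet Address      Physical Address      Type
      192.168.126.1         00-50-56-c0-00-01     dynamic
      192.168.126.2         00-50-56-e8-01-02     dynamic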

Different MACs. Both virtual adapters (based on the 00-50-56 prefix, which belongs to VMware) but different adapters.

The VMs see the DHCP server too as another virtual adapter on the network. In my case the DHCP addresses are offered from 192.168.126.254 – yet another virtual adapter that resides on the host but isn’t visible to it.

With VMware Workstation, I don’t have to create a separate network for NAT. I could have assigned each interface of the VMs as NAT and that too would do the trick, but creating a separate network has its advantages. I can control the range of IP addresses assigned to the guests, and I can have the host too assigned an interface from this pool. I can also choose the gateway address to be something else – say to be the IP of the host.

(Originally this post was meant to have more details but I got busy with other stuff. Rather than let it go to waste I am publishing it anyways. Hence the abrupt ending).

Using netsh to set Static/ DHCP addresses

Place where I work, our desktops are locked down so regular users have no access to the network settings. And since I am always logged in as a regular user and only use my admin account via RunAs (because it’s best practice not to be logged in with your admin account) I don’t have access to the network settings window either. Of course I could logout-login, but who does that!?

Enter netsh (or PowerShell if you are on Windows 8, but I am on Windows 7 at work).

To show your currently assigned IPv4 addresses and interfaces:
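
    netsh interface ipv4 show addresses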

To set a Static IPv4 address for the “Local Area Connection” interface:
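
With placeholder address, mask, and gateway:

    netsh interface ipv4 set address "Local Area Connection" static 192.168.1.10 255.255.255.0 192.168.1.1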

And to set a DHCP IPv4 address for the “Local Area Connection” interface:
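
    netsh interface ipv4 set address "Local Area Connection" dhcp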

Easy peasy! (sort of)

A cool thing about netsh is that it’s an interactive shell so you can type netsh at the command prompt to enter the shell and then navigate around to get a list of commands and slowly figure your way out.

NIC teaming in Windows Server 2012

Windows Server 2012 includes NIC teaming as part of the OS now so there’s no need for additional software. I find that pretty cool. It’s quite easy to set up too. You can do it via the Server Manager GUI or use PowerShell. Just one PowerShell cmdlet, that’s it!

So what is NIC teaming?

Scenario 1: Say your server has one network card (NIC) and is connected to one switch port. If the NIC fails for any reason, your server is kaput – offline! Similarly if the switch fails for any reason, your server is kaput.

Scenario 2: Say your server has two 1GbE NICs. That is, two NICs that support 1Gbps bandwidth each. And say your server runs two network intensive applications. Both applications use 1Gbps bandwidth each. Even though you have two NICs of 1Gbps each, you can’t use them together and so there’s no way to balance your two applications over the two NICs. You are stuck with just 1Gbps bandwidth.

These are two areas where NIC teaming can help.

NIC teaming simply means you combine your NICs into one team. (Windows Server 2012 can combine 32 such NICs into one team. And these NICs can be from different hardware providers too!) Before Windows Server 2012 you needed third party software to do NIC teaming – often from the network card manufacturer. But if you have Windows Server 2012 this is built into the OS.

The two usage scenarios above are called Load Balancing (LB) and FailOver (FO) – hence another name for NIC teaming is LBFO.

Load Balancing means you team your NICs into one and traffic is balanced between the two NICs. If you had two 1GbE NICs, the net effect is not as though you had a 2GbE NIC. No, the NICs are still independent in terms of bandwidth, but since you can use both NICs you are able to get more bandwidth. If you have an application on your server communicating with some client, all traffic between them related to a particular “flow” goes over the same NIC. But traffic to other clients can go over the other NIC, and also traffic to the same client but unrelated to the prior flow can go over the other NIC. There is a reason for this. If traffic for a particular flow went over both NICs, you could have packets appearing out of order (maybe one NIC has a faster path than the other) and that leads to an overhead of reassembling these packets.

Failover means you team your NICs into one and if one NIC fails traffic isn’t interrupted as it goes through the other NIC. This also helps in cases where the switch that a NIC is connected to, or the cable between them, goes down. When that NIC goes offline, other NICs will take over.

Now, NIC teaming isn’t a one sided job. Although the NICs may be teamed, as far as the network switch(es) they connect to is(are) concerned, they are still disparate. The switch(es) see one IP address, associated with multiple MAC addresses – on different ports/ switch – and so you need to involve the switch(es) too in your plan of NIC teaming. That said, you may choose to not involve the switches too if you want, but that has its own implications.

If you decide to involve the switch, there are two ways of going about with the teaming. One is a static way in which your switch admin designates the ports that the multiple NICs connect to as part of a team, and then the switch will take care of things. This has a drawback in that if you move the NICs to any other switch port, teaming breaks unless the network admin reconfigures the new ports on the switch.

The other way is an automatic way, in which if your switch(es) support the Link Aggregation Control Protocol (LACP; usually referred to also as IEEE 802.3ad or 802.1ax) you can use any ports on the switch and the ports will automatically figure out they are part of a team. Of course, if you decide to use LACP you have to use LACP switches and also tell the bits in the OS responsible for teaming that the switch knows LACP so it can behave accordingly. LACP works by sending some frames (called LACPDUs) from the NIC to the switch to determine which ports are part of the same team and then combining these ports into a single link.

It’s important to note here that teaming which involves the switch, usually called “switch dependent” teaming, requires all the NICs to the be connected to the same switch (but don’t quote me on this as I am not a networks guy and this is just my understanding!).

If you decide not to involve the switch, usually called “switch independent” teaming, you can use multiple switches as the switches don’t know anything and all the action happens at the NIC end. This has some implications though, in that traffic from the switch to your teamed NICs will not be distributed as the switch doesn’t know of the multiple NICs. As far as the switch(es) knows your teamed NIC interface has one MAC address – that of one of the NICs – and so that NIC is the one which will get all the incoming traffic.

(As an aside, this is why if we decide to take a NIC out of the team and use it independently, it is a good idea to disable and re-enable the team. If the NIC we are taking out has the same MAC address as the team, it won’t work because the switch won’t be aware of it. Disabling and enabling the team will cause the teamed NIC to pick up a new MAC address from the remaining NICs).

When setting up NIC teaming in Windows Server 2012 you can choose from the options above.

Apart from deciding how the teaming is setup, we have to also select what algorithm is used for load balancing (the LB in LBFO). Windows Server 2012 gives you 2 options if you use the GUI, and 5 options if you use PowerShell. Let’s look at these now.

HyperVPort

  • This is used in case your Windows Server 2012 box functions as the host for a bunch of VMs. Each VM only sees one NIC – the teamed NIC – but internally, the host assigns each VM to a particular NIC. (Say you have two NICs in the team, and 5 VMs. VMs 1,3,5 could be on NIC 1, VMs 2,4 could be on NIC 2).
  • Obviously, selecting such a load balancing algorithm means any particular VM is always limited by the bandwidth of the NIC it is internally assigned to. It can’t aggregate over the other NICs to increase its bandwidth.
  • If you do switch independent teaming, the switch will send incoming traffic for each VM to the NIC assigned to it as the switch is aware these are different VMs. If you do switch dependent teaming, the incoming traffic is anyways distributed.

Address Hashing

  • This option creates a hash based on various components of the packet, and assigns all packets with that hash to a particular NIC. The components chosen for the hash vary. You can choose to use the source and destination IP addresses and TCP & UDP ports. You can choose to use the source and destination IP address. Or you can choose to use the source and destination MAC address. (If you choose the first option, for instance, and the packet is say an IPSec packet and hence has no TCP & UDP port, the second option is automatically used. Similarly if you choose the second option and that cannot be used for any reason, the last option is used).
  • If you do switch independent teaming, since the NIC used keeps varying the teamed NIC uses the MAC address of one of the NICs and that’s the MAC address the switch(es) the NICs connect to is aware of. So all incoming traffic from the switch to NICs is sent to that particular NIC whose MAC address is used. If you do switch dependent teaming, the incoming traffic is anyways distributed.
  • When using the GUI, the default option is the first one (use source and destination IP addresses and TCP & UDP ports).
  • When using PowerShell too, the default option is the first one (it is called TransportPorts). The second option is called IPAddresses and the third option is called MacAddresses.

Dynamic

I couldn’t find much information on this. This is an option available only in PowerShell (it is called Dynamic) and it seems to be an algorithm that automatically load balances between NICs depending on their load. When using switch independent teaming, incoming traffic is not distributed and is sent to a particular NIC; when using switch dependent teaming incoming traffic is distributed.

That’s all!

Initial Server 2012 Core configuration using PowerShell

Just for future reference to myself.

  1. Install Server Core 2012 as usual.
  2. Login. Change the IP address thus (commands for this and the later steps are sketched after the list):
  3. Specify DNS server addresses:
  4. Rename the computer:
  5. Restart the computer so the name change takes effect.
  6. Update: Rename the computer & join the domain (thanks to Daniel Streefkerf (check out his blog, if you like my blog posts you’ll surely enjoy his!) for pointing this out – since PowerShell 3.0 the Add-Computer cmdlet has a -NewName switch that lets you rename and add/ move a computer):
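
A sketch of the commands for steps 2–4 and 6 – the interface alias, addresses, and names are placeholders:

    # step 2: set a static IP address
    New-NetIPAddress -InterfaceAlias "Ethernet" -IPAddress 192.168.1.10 -PrefixLength 24 -DefaultGateway 192.168.1.1

    # step 3: specify DNS servers
    Set-DnsClientServerAddress -InterfaceAlias "Ethernet" -ServerAddresses "192.168.1.1","192.168.1.2"

    # step 4: rename the computer (restart afterwards, step 5)
    Rename-Computer -NewName SERVER01

    # step 6: or rename and join the domain in one go (PowerShell 3.0+)
    Add-Computer -DomainName example.com -NewName SERVER01 -Restart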

Update: If you want to do NIC teaming, do the following after step 1.
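
A sketch with placeholder team and NIC names:

    New-NetLbfoTeam -Name Team1 -TeamMembers "Ethernet","Ethernet 2"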

Now continue with assigning IP address etc, but use the newly created Team adapter instead.