Tailscale & WireGuard co-existing (or: I love policy based routing!)

I had Tailscale running on my Raspberry Pi already, and earlier today I also installed WireGuard on it. After a reboot though I couldn’t access my Tailscale devices any more, which got me looking into how Tailscale & WireGuard routing works. After another reboot the routing started working again (and seems to stick after multiple test reboots), but I thought I’d write a post anyway on what I learnt.

Let’s start with routing tables. Here’s my routing table currently:

I have both Tailscale and WireGuard running, but notice there’s no default route entry for either of them, as you might have expected there to be. I was hoping to see some entries for my various Tailscale IPs via the tailscale0 interface for instance, but nada. That’s because the above routing table is only half the story. It reflects the old way of doing things, routing based purely on the destination address; what we have nowadays is policy-based routing, wherein we route based not just on the destination address but also on other factors. The way to manage this on Linux is via the ip rule command.

Before getting to ip rule though, here’s a variant of ip route that shows all the routing tables. By default ip route shows only your main table, whereas ip route show table all shows every table.

Take a look at lines 3, 4, and 5: they have the word table in them, which tells you that these routes are part of a separate table. You can see the Tailscale routing is in table 52, while the WireGuard routing is in table 51820 (I use a config file from Mullvad, that’s why the interface name is mullvad-nl2 rather than wg0 as usual). So if via some policy you happen to come under table number 51820, all your traffic goes via the mullvad-nl2 interface as that’s the default gateway. This table doesn’t know anything about my Tailscale network; those routes are in table 52, where each of my Tailscale network devices is listed as going via the tailscale0 interface. Nice!
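To make the "table" keyword a bit more concrete, here is a minimal Python sketch that groups the text output of ip route show table all by table. Only the mullvad-nl2 line is quoted from this post; the other two sample lines (including the 192.168.1.1 gateway and eth0) are made up for illustration, and group_routes_by_table is just a helper name I picked.

```python
from collections import defaultdict

def group_routes_by_table(lines):
    """Group `ip route show table all` style lines by routing table name."""
    tables = defaultdict(list)
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        # Routes in the main table carry no "table" keyword in this output.
        table = "main"
        if "table" in tokens:
            idx = tokens.index("table")
            if idx + 1 < len(tokens):
                table = tokens[idx + 1]
                # Drop the "table <name>" pair so only the route itself remains.
                tokens = tokens[:idx] + tokens[idx + 2:]
        tables[table].append(" ".join(tokens))
    return dict(tables)

sample = [
    "default via 192.168.1.1 dev eth0",                # hypothetical main-table route
    "default dev mullvad-nl2 table 51820 scope link",  # quoted in this post
    "100.85.80.11 dev tailscale0 table 52",            # hypothetical line for a Tailscale peer
]

grouped = group_routes_by_table(sample)
for table, routes in grouped.items():
    print(table, routes)
```

On a real machine you would feed it the lines of the actual command output instead of the sample list.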

So how does one know which table to use? That’s where ip rule comes into the picture. Here’s the output on my device:

The first number (e.g. 5270) is the priority of the rule. The smaller the number, the higher its priority. Each entry is evaluated by the routing engine (from lowest to highest number) and if any entry matches and results in a route to take the subsequent entries are not checked. If an entry matches but doesn’t return with a route then subsequent entries are checked.

By default there are only 3 rules, with priority numbers 0, 32766, and 32767; everything else was added by something else on my system. These default rules do what you’d expect as default behaviour: rule 0 uses the local table for traffic to the machine’s own addresses (including 127.0.0.0/8) and broadcast addresses, rule 32766 uses the main table for everything else, and rule 32767 falls back to the default table, which is normally empty.

The format after the priority is a selector and an action to perform on a packet matching the selector. The selector is basically how we match a packet. For example, rule 0 has from all as its selector (so, match all packets) and its action is lookup local (i.e. look up a table called local). Here’s a list of all the routes on my machine in this local table:

Notice it’s got individual entries but no default gateway. So if I have a packet for, say, 8.8.8.8, it would match this selector, but since the table it looks up doesn’t have an entry for this address or a default route, the next rule will be checked, and so on until something matches (or doesn’t). In my case I have just two tables that have an entry for the default network: table 51820 (WireGuard) and the main table (which doesn’t have a table name in the output below).

The rule sending me to table 51820 has priority 32765, a lower number and hence higher priority than the rule sending me to the main table (32766), and so WireGuard ends up being the default route for all my packets.

What happens though if I am trying to access a Tailscale network IP address? Say 100.85.80.11. Looking at the results of ip rule one of the entries has priority 5270 (much higher than that of WireGuard) and sends me to table 52. What does that table contain?

There we go! An entry matching the IP I want to talk to. So any traffic for that IP will go via table 52 and out through the tailscale0 interface. Awesome!
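Here is a rough Python model of the walk described so far: rules are tried from the lowest priority number upwards, and the first table that actually has a matching route wins. It is a simplification under stated assumptions: it ignores selectors like fwmark and the suppress_prefixlength rule, and apart from the 100.85.80.11 entry and the default route in table 51820, the table contents (eth0, 192.168.1.0/24) are placeholders.

```python
import ipaddress

# Simplified tables. Only the Tailscale peer and the default route in 51820
# come from this post; the other entries are placeholders.
TABLES = {
    "local": {"127.0.0.0/8": "lo"},
    "52": {"100.85.80.11/32": "tailscale0"},
    "51820": {"0.0.0.0/0": "mullvad-nl2"},
    "main": {"0.0.0.0/0": "eth0", "192.168.1.0/24": "eth0"},
}

# (priority, table) pairs; selectors other than "from all" are ignored here.
RULES = [(0, "local"), (5270, "52"), (32765, "51820"), (32766, "main")]

def lookup_table(table, dst):
    """Return (network, dev) for the longest matching prefix, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, dev in TABLES[table].items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, dev)
    return best

def route(dst):
    """Try rules in ascending priority number; the first table with a hit wins."""
    for priority, table in sorted(RULES):
        hit = lookup_table(table, dst)
        if hit is not None:
            return priority, table, hit[1]
    return None

print(route("100.85.80.11"))  # (5270, '52', 'tailscale0')
print(route("8.8.8.8"))       # (32765, '51820', 'mullvad-nl2')
```

The Tailscale address never reaches the WireGuard table because table 52 answers first, while 8.8.8.8 falls through local and 52 before table 51820 catches it with its default route.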

So how does the traffic to WireGuard go? For that we need to look at these three rules:

Rule 32764 says match all traffic and look up the main table, but if the matching route has a prefix length of 0 or less, ignore it and move on to the next rule (that’s what suppress_prefixlength 0 does). What’s an example of something that has a 0 length prefix? Why, the default route of course, which is 0.0.0.0/0. So in effect this rule says use the main table, except for any default routes.
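The effect of suppress_prefixlength 0 can be sketched in a few lines of Python. This is only a model of the behaviour described above, not how the kernel implements it; the main-table contents (eth0, 192.168.1.0/24) and the function name are assumptions for illustration.

```python
import ipaddress

# Hypothetical main table: a LAN route plus a default route.
MAIN = {"0.0.0.0/0": "eth0", "192.168.1.0/24": "eth0"}

def lookup_with_suppress(routes, dst, suppress_prefixlength=None):
    """Longest-prefix lookup that discards results whose prefix is too short.

    Returning None means "this rule produced no route, try the next rule".
    """
    addr = ipaddress.ip_address(dst)
    matches = []
    for prefix, dev in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net, dev))
    if not matches:
        return None
    net, dev = max(matches, key=lambda m: m[0].prefixlen)
    # Suppress the result if its prefix length is at or below the threshold.
    if suppress_prefixlength is not None and net.prefixlen <= suppress_prefixlength:
        return None
    return str(net), dev

print(lookup_with_suppress(MAIN, "192.168.1.10", 0))  # ('192.168.1.0/24', 'eth0')
print(lookup_with_suppress(MAIN, "8.8.8.8", 0))       # None
print(lookup_with_suppress(MAIN, "8.8.8.8"))          # ('0.0.0.0/0', 'eth0')
```

So LAN traffic is still answered by the main table, while anything that would only match the default route is passed on to the next rule.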

The next rule, number 32765, says match any traffic not having an fwmark of 0xca6c and send it to table 51820; traffic that does carry the mark skips this rule and falls through to the main table. An fwmark is a way of marking packets. When WireGuard starts a tunnel it sets an fwmark on the encrypted packets it sends out (0xca6c is 51820 in decimal; see “Improved Rule-based Routing” in this WireGuard page). So this means the tunnel’s own encrypted packets will not hit table 51820, while all other traffic goes to it. And if table 51820 has a default route entry (which it does: default dev mullvad-nl2 table 51820 scope link in my case) that means all traffic goes out via the WireGuard tunnel.
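A tiny Python sketch of the fwmark selector, assuming just the two outcomes described above (table 51820 or the main table) and leaving every other rule out. The function names are mine, not anything from WireGuard.

```python
# The mark mentioned in the rule; 0xca6c is the same number as 51820.
WG_FWMARK = 0xCA6C

def rule_32765_matches(packet_fwmark):
    """`not fwmark 0xca6c`: match every packet except those carrying the mark."""
    return packet_fwmark != WG_FWMARK

def pick_table(packet_fwmark):
    """Marked packets skip table 51820 and fall through to the main table."""
    return "51820" if rule_32765_matches(packet_fwmark) else "main"

print(WG_FWMARK)              # 51820
print(pick_table(0))          # 51820 (ordinary, unmarked traffic)
print(pick_table(WG_FWMARK))  # main  (packets carrying the WireGuard mark)
```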

Why do we not want WireGuard’s own packets to match this table though? Because those encrypted packets have to go via our actual default gateway… that’s how we establish the tunnel and actually send traffic between the two endpoints. These packets will thus match rule 32766 and be sent to the main table, which has a default gateway entry, and thus be sent out to the Internet. Nice, huh!

So this is how WireGuard and Tailscale can co-exist and also how routing works for these applications. The key thing is that the Tailscale rule has a lower number/higher priority than the WireGuard one and so it always handles traffic for any of my Tailscale IPs. And that in turn brings me to why I think I couldn’t get them co-existing initially. I had checked out ip rule then and here’s the output:

Notice that the WireGuard rule has a higher priority than Tailscale… so all my Tailscale IPs would be matching rule 5209 and trying to go out via my WireGuard VPN tunnel.
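If you want to check for this ordering problem without eyeballing the output, something like the following Python sketch could work. It parses ip rule style lines and reports whether the rule for Tailscale’s table (52) is checked before the one for WireGuard’s table (51820). The sample lines are my reconstruction of the two situations using the priorities mentioned in this post, not the actual output, and tailscale_checked_first is a made-up helper name.

```python
import re

# Matches lines such as "5270:  from all lookup 52".
RULE_RE = re.compile(r"^(\d+):\s+(.*?)\blookup\s+(\S+)")

def parse_rules(lines):
    """Return sorted (priority, table) pairs from `ip rule` style lines."""
    rules = []
    for line in lines:
        match = RULE_RE.match(line.strip())
        if match:
            rules.append((int(match.group(1)), match.group(3)))
    return sorted(rules)

def tailscale_checked_first(lines, ts_table="52", wg_table="51820"):
    """True if the first rule for ts_table precedes the first rule for wg_table."""
    first = {}
    for priority, table in parse_rules(lines):
        first.setdefault(table, priority)  # sorted input, so this keeps the lowest
    if ts_table not in first or wg_table not in first:
        raise ValueError("expected rules for both tables")
    return first[ts_table] < first[wg_table]

# Reconstructed examples, not real output.
WORKING = [
    "0:\tfrom all lookup local",
    "5270:\tfrom all lookup 52",
    "32764:\tfrom all lookup main suppress_prefixlength 0",
    "32765:\tnot from all fwmark 0xca6c lookup 51820",
    "32766:\tfrom all lookup main",
    "32767:\tfrom all lookup default",
]
BROKEN = [
    "0:\tfrom all lookup local",
    "5209:\tnot from all fwmark 0xca6c lookup 51820",
    "5270:\tfrom all lookup 52",
    "32766:\tfrom all lookup main",
    "32767:\tfrom all lookup default",
]

print(tailscale_checked_first(WORKING))  # True
print(tailscale_checked_first(BROKEN))   # False
```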

Am not sure why the reboot (and subsequent test reboots) changed this order such that Tailscale has a higher priority. I wonder if it’s a case of which service comes up first? Maybe if the Tailscale daemon is started first it adds a lower number rule and then WireGuard adds a higher number one and so things are fine; while if the startup order is reversed they conflict? Looking at the output of systemctl status I see that the past few reboots tailscaled actually started a second before WireGuard… which is quite a close call actually, so maybe I’ve just been lucky?

Am going to leave things as they are for now but if I bump into this again I’ll just modify /lib/systemd/system/tailscaled.service to add something like Before=wg-quick@mullvad-nl2.service (update: see below) under the [Unit] section to ensure systemd brings Tailscale up before WireGuard.

Update

I got hit by this again, so I ran systemctl edit tailscaled (as root).

This opens up an empty file in your editor. To this I added a [Unit] section containing the line After=wg-quick@mullvad-nl2.service.

Then do a reload with systemctl daemon-reload (I think this step can be skipped as the edit also reloads, but I did it anyway).

After this I rebooted and this should ensure Tailscale only starts after WireGuard.

What I did above was add an override to the Tailscale service. Rather than edit the package-provided unit file at /lib/systemd/system/tailscaled.service, which could get changed during an update, I added an override file. The additions I made above can be seen at /etc/systemd/system/tailscaled.service.d/override.conf if you are interested.

Update 2

I realized 2 years later, when I stumbled upon this blog post again, that I had used the After keyword instead of Before by mistake. Turns out After is the right thing to do; Before breaks things. So Tailscale has to start after WireGuard.