I had Tailscale running on my Raspberry Pi already, and earlier today I also installed WireGuard on it. After a reboot though I couldn't access my Tailscale devices any more… which got me looking into how Tailscale & WireGuard routing works. After another reboot the routing started working again (and seems to stick after multiple test reboots), but I thought I'd write a post anyways on what I learnt.
Let’s start with routing tables. Here’s my routing table currently:
```
$ ip route
default via 192.168.1.1 dev eth0 onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.58
```
I have both Tailscale and WireGuard running, but notice there's no entry for either of them as you might expect. I was hoping to see some entries for my various Tailscale IPs via the `tailscale0` interface for instance, but nada. That's because the above routing table is only half the story. It comes under the old way of doing things, routing based purely on the destination address; what we have nowadays is policy-based routing, wherein we route not just based on the destination address but also on other factors. The way to manage this on Linux is via the `ip rule` command.
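To make that concrete, here's a minimal sketch of what policy-based routing looks like in practice. The table number and the source address here are made up for illustration; they're not from my setup:

```
# Hypothetical example: route traffic *from* one particular host via its own table.
# Table 100 and the address 192.168.1.200 are made up for illustration.
ip route add default via 192.168.1.1 dev eth0 table 100   # a default route that lives only in table 100
ip rule add from 192.168.1.200 lookup 100 priority 1000   # selector: source address; action: use table 100
ip rule                                                   # the new rule shows up alongside the defaults
```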
Before digging into `ip rule` on my system though, here's a variant of `ip route` that shows all the routing tables. By default `ip route` shows your `main` table, whereas adding `show table all` shows all the tables.
```
$ ip route show table all
100.84.97.108 dev tailscale0 table 52 scope link
100.85.80.11 dev tailscale0 table 52 scope link
default dev mullvad-nl2 table 51820 scope link
default via 192.168.1.1 dev eth0 onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.58
local 10.69.9.239 dev mullvad-nl2 table local proto kernel scope host src 10.69.9.239
local 100.109.195.28 dev tailscale0 table local proto kernel scope host src 100.109.195.28
...
```
Take a look at the first three output lines: they have a `table` keyword in them, which tells you that these routes are part of a separate table. You can see the Tailscale routing is in table 52, while the WireGuard routing is in table 51820 (I use a config file from Mullvad, which is why the interface name is `mullvad-nl2` rather than the usual `wg0`). So if, via some policy, you happen to come under table 51820, all your traffic goes via the `mullvad-nl2` interface as that's the default gateway there. This table doesn't know anything about my Tailscale network; those routes are in table 52, where each of my Tailscale devices is listed as going via the `tailscale0` interface. Nice!
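By the way, you don't have to dump everything and grep to look at one of these: `ip route` can show a single table by number (assuming the same table numbers as on my box):

```
$ ip route show table 52        # Tailscale's table
$ ip route show table 51820     # the wg-quick/Mullvad table
```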
So how does one know which table to use? That's where `ip rule` comes into the picture. Here's the output on my device:
```
$ ip rule
0:      from all lookup local
5210:   from all fwmark 0x80000 lookup main
5230:   from all fwmark 0x80000 lookup default
5250:   from all fwmark 0x80000 unreachable
5270:   from all lookup 52
32764:  from all lookup main suppress_prefixlength 0
32765:  not from all fwmark 0xca6c lookup 51820
32766:  from all lookup main
32767:  from all lookup default
```
The first number (e.g. 5270) is the priority of the rule. The smaller the number, the higher its priority. Each entry is evaluated by the routing engine (from lowest to highest number) and if an entry matches and results in a route to take, the subsequent entries are not checked. If an entry matches but doesn't return a route, the subsequent entries are checked.
By default there are only 3 rules – priority numbers 0, 32766, and 32767; everything else was added by something else on my system. These default rules do what you'd expect as default behaviour – they make use of the `local` table for local and broadcast addresses (such as the 127.0.0.0/8 network) and the `main` table for everything else.
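For comparison, on a box with no VPNs or other policy routing going on, `ip rule` typically shows just those three:

```
$ ip rule
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default
```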
The format after the priority is a selector and an action to perform on a packet matching the selector. The selector is basically how we match an entry. For example, rule 0 has `from all` as its selector – so, match all packets – and its action is `lookup local`, i.e. look up a table called `local`. Here's a list of all the routes on my machine in this `local` table:
```
$ ip route show table all | grep "table local"
local 10.69.9.239 dev mullvad-nl2 table local proto kernel scope host src 10.69.9.239
local 100.109.195.28 dev tailscale0 table local proto kernel scope host src 100.109.195.28
broadcast 127.0.0.0 dev lo table local proto kernel scope link src 127.0.0.1
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1
broadcast 172.17.0.0 dev docker0 table local proto kernel scope link src 172.17.0.1 linkdown
local 172.17.0.1 dev docker0 table local proto kernel scope host src 172.17.0.1
broadcast 172.17.255.255 dev docker0 table local proto kernel scope link src 172.17.0.1 linkdown
broadcast 192.168.1.0 dev eth0 table local proto kernel scope link src 192.168.1.58
local 192.168.1.58 dev eth0 table local proto kernel scope host src 192.168.1.58
broadcast 192.168.1.255 dev eth0 table local proto kernel scope link src 192.168.1.58
local ::1 dev lo table local proto kernel metric 0 pref medium
local fe80::64b1:df4d:6505:b4d5 dev tailscale0 table local proto kernel metric 0 pref medium
local fe80::786d:d0ff:fedd:c138 dev macvlan0 table local proto kernel metric 0 pref medium
local fe80::dea6:32ff:fec7:b083 dev eth0 table local proto kernel metric 0 pref medium
ff00::/8 dev mullvad-nl2 table local metric 256 pref medium
ff00::/8 dev tailscale0 table local metric 256 pref medium
ff00::/8 dev eth0 table local metric 256 pref medium
ff00::/8 dev macvlan0 table local metric 256 pref medium
```
Notice it's got individual entries but no default gateway. So if I have a packet for, say, 8.8.8.8 it would match this selector, but since the table it looks up doesn't have an entry for that subnet or a default route, the next rule will be checked… and so on until something matches (or doesn't). In my case I have just two tables that have an entry for the default network: table 51820 (WireGuard) and the `main` table (which doesn't have a table name in the output below).
```
$ ip route show table all | grep "default"
default dev mullvad-nl2 table 51820 scope link
default via 192.168.1.1 dev eth0 onlink
```
The rule sending me to table 51820 (priority 32765) has a lower number, and hence higher priority, than the rule sending me to the `main` table (32766), and so WireGuard ends up being the default route for all my packets.
What happens though if I am trying to access a Tailscale network IP address? Say 100.85.80.11. Looking at the results of `ip rule`, one of the entries has priority 5270 (a much higher priority than the WireGuard rules) and sends me to table 52. What does that table contain?
```
$ ip route show table all | grep "table 52"
100.84.97.108 dev tailscale0 table 52 scope link
100.85.80.11 dev tailscale0 table 52 scope link
```
There we go! An entry matching the IP I want to talk to. So any traffic for that IP will go via table 52 and out through the `tailscale0` interface. Awesome!
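A handy way to double-check all this without walking the rules by hand is `ip route get`, which asks the kernel to do the full policy lookup for a destination:

```
$ ip route get 100.85.80.11    # should resolve via table 52 / the tailscale0 interface
$ ip route get 8.8.8.8         # should fall through to the default route (more on that next)
```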
So how does the traffic to WireGuard go? For that we need to look at these three rules:
```
32764:  from all lookup main suppress_prefixlength 0
32765:  not from all fwmark 0xca6c lookup 51820
32766:  from all lookup main
```
Rule 32764 says match all traffic and look up the `main` table – but if the lookup returns a route with a prefix length of 0 or less, reject it (that's what `suppress_prefixlength 0` does). What's an example of something that has a 0-length prefix? Why, the default route of course, which is 0.0.0.0/0. So in effect this rule says: use the `main` table, except for any default routes.
The next rule, number 32765, says match any traffic not carrying an fwmark of 0xca6c and send it to table 51820, whereas all other traffic falls through to the `main` table. An fwmark is a firewall mark attached to packets. When WireGuard starts a tunnel it sets an fwmark on its WireGuard interface, so the packets that interface generates carry that mark (see “Improved Rule-based Routing” on the WireGuard website). So this means any traffic generated by that interface will not hit table 51820, while all other traffic will. And since table 51820 has a default route entry (which it does – `default dev mullvad-nl2 table 51820 scope link` in my case) that means all that traffic goes out via the WireGuard tunnel.
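Putting those pieces together, this is roughly what wg-quick sets up behind the scenes – a sketch based on the WireGuard “Improved Rule-based Routing” description rather than the exact commands it runs, using the interface name and fwmark from my system:

```
wg set mullvad-nl2 fwmark 0xca6c                    # mark packets generated by the tunnel itself
ip route add default dev mullvad-nl2 table 51820    # a default route that lives only in table 51820
ip rule add not fwmark 0xca6c table 51820           # everything *except* marked packets uses table 51820
ip rule add table main suppress_prefixlength 0      # main table still wins for non-default routes
```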
Why do we not want traffic from the WireGuard interface to match this table though? Because traffic from that interface has to go via our actual default gateway… that's how the tunnel is established and how traffic actually travels between the two endpoints. The traffic from the WireGuard interface will thus match rule 32766 instead, be sent to the `main` table, which has a default gateway entry, and go out to the Internet. Nice, huh!
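You can see this exception in action with `ip route get` again, this time asking the kernel to pretend the packet carries WireGuard's fwmark:

```
$ ip route get 8.8.8.8                 # unmarked traffic: should pick the tunnel (mullvad-nl2)
$ ip route get 8.8.8.8 mark 0xca6c     # marked like WireGuard's own packets: should pick eth0 via 192.168.1.1
```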
So this is how WireGuard and Tailscale can co-exist, and also how routing works for these applications. The key thing is that the Tailscale rule has a lower number (and hence higher priority) than the WireGuard one, so it always handles traffic for any of my Tailscale IPs. And that in turn brings me to why I think I couldn't get them co-existing initially. I had checked out `ip rule` then and here's the output:
```
$ ip rule
0:      from all lookup local
5208:   from all lookup main suppress_prefixlength 0
5209:   not from all fwmark 0xca6c lookup 51820
5210:   from all fwmark 0x80000 lookup main
5230:   from all fwmark 0x80000 lookup default
5250:   from all fwmark 0x80000 unreachable
5270:   from all lookup 52
32766:  from all lookup main
32767:  from all lookup default
```
Notice that the WireGuard rules now have a higher priority than the Tailscale one… so all my Tailscale IPs would match rule 5209 and try to go out via my WireGuard VPN tunnel.
Am not sure why the reboot (and subsequent test reboots) changed this order such that Tailscale now has the higher priority. I wonder if it's a case of which service comes up first? Maybe if the Tailscale daemon is started first it adds a lower-numbered rule and then WireGuard adds a higher-numbered one and so things are fine; while if the startup order is reversed they conflict? Looking at the output of `systemctl status` I see that for the past few reboots `tailscaled` actually started a second before WireGuard… which is quite a close call actually, so maybe I've just been lucky?
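If you want to check the startup order yourself, systemd can report exactly when each unit came up (the wg-quick instance name here is mine; adjust for yours):

```
$ systemctl show -p ActiveEnterTimestamp tailscaled.service
$ systemctl show -p ActiveEnterTimestamp wg-quick@mullvad-nl2.service
```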
Am going to leave things as they are for now, but if I bump into this again I'll just modify `/lib/systemd/system/tailscaled.service` to add something like `Before=wg-quick@mullvad-nl2.service` (update: see below) under the `[Unit]` section to ensure systemd brings Tailscale up before WireGuard.
Update
I got hit by this again so I did the following:
```
sudo systemctl edit tailscaled.service
```
This opens up an empty file in your editor. To this I added:
```
[Unit]
After=wg-quick@<replace>.service
```
Then do a reload (I think this step can be skipped as `systemctl edit` also does a reload, but I did it anyways):
```
sudo systemctl daemon-reload
```
After this I rebooted and this should ensure Tailscale only starts after WireGuard.
What I did above was add an override to the Tailscale service. Rather than edit the package-provided unit file at `/lib/systemd/system/tailscaled.service`, which could get changed during an update, I added an override file. The additions I made above can be seen in `/etc/systemd/system/tailscaled.service.d/override.conf` if you are interested.
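To confirm the override is being picked up, `systemctl cat` shows the shipped unit file together with any drop-ins:

```
$ systemctl cat tailscaled.service
```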
Update 2
I realized 2 years later, when I stumbled upon this blog post again, that I had mistakenly used the `After` keyword instead of `Before`. Turns out that is the right thing to do; `Before` breaks things. So Tailscale has to start after WireGuard.