I had Tailscale running on my Raspberry Pi already, and earlier today I also installed WireGuard on it. After a reboot though I couldn't access my Tailscale devices any more… which got me looking into how Tailscale & WireGuard routing works. After another reboot the routing started working again (and seems to stick after multiple test reboots), but I thought I'd write a post anyways on what I learnt.
Let’s start with routing tables. Here’s my routing table currently:
```
$ ip route
default via 192.168.1.1 dev eth0 onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.58
```
I have both Tailscale and WireGuard running, but notice there's no entry for either of them as you might expect. I was hoping to see some entries for my various Tailscale IPs via the `tailscale0` interface for instance, but nada. That's because the above routing table is only half the story. It comes under the old way of doing things, routing based purely on the destination address; what we have nowadays is policy-based routing, wherein we route not just based on the destination address but also on other factors. The way to manage this on Linux is via the `ip rule` command.
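To make that concrete, here's a minimal sketch of what policy-based routing looks like in practice. The table number and the source address here are made up for illustration; they're not from my setup:

```
# Hypothetical example: route traffic *from* one particular host via its own table.
# Table 100 and the address 192.168.1.200 are made up for illustration.
ip route add default via 192.168.1.1 dev eth0 table 100   # a default route that lives only in table 100
ip rule add from 192.168.1.200 lookup 100 priority 1000   # selector: source address; action: use table 100
ip rule                                                   # the new rule shows up alongside the defaults
```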
Before digging into `ip rule` on my system though, here's a variant of `ip route` that shows all the routing tables. By default `ip route` shows your `main` table, whereas adding `show table all` shows all the tables.
```
$ ip route show table all
100.84.97.108 dev tailscale0 table 52 scope link
100.85.80.11 dev tailscale0 table 52 scope link
default dev mullvad-nl2 table 51820 scope link
default via 192.168.1.1 dev eth0 onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.58
local 10.69.9.239 dev mullvad-nl2 table local proto kernel scope host src 10.69.9.239
local 100.109.195.28 dev tailscale0 table local proto kernel scope host src 100.109.195.28
...
```
Take a look at the first three output lines: they have a `table` keyword in them, which tells you that these routes are part of a separate table. You can see the Tailscale routing is in table 52, while the WireGuard routing is in table 51820 (I use a config file from Mullvad, which is why the interface name is `mullvad-nl2` rather than the usual `wg0`). So if, via some policy, you happen to come under table 51820, all your traffic goes via the `mullvad-nl2` interface as that's the default gateway there. This table doesn't know anything about my Tailscale network; those routes are in table 52, where each of my Tailscale devices is listed as going via the `tailscale0` interface. Nice!
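By the way, you don't have to dump everything and grep to look at one of these: `ip route` can show a single table by number (assuming the same table numbers as on my box):

```
$ ip route show table 52        # Tailscale's table
$ ip route show table 51820     # the wg-quick/Mullvad table
```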
So how does one know which table to use? That's where `ip rule` comes into the picture. Here's the output on my device:
```
$ ip rule
0:      from all lookup local
5210:   from all fwmark 0x80000 lookup main
5230:   from all fwmark 0x80000 lookup default
5250:   from all fwmark 0x80000 unreachable
5270:   from all lookup 52
32764:  from all lookup main suppress_prefixlength 0
32765:  not from all fwmark 0xca6c lookup 51820
32766:  from all lookup main
32767:  from all lookup default
```
The first number (e.g. 5270) is the priority of the rule. The smaller the number, the higher its priority. Each entry is evaluated by the routing engine (from lowest to highest number) and if an entry matches and results in a route to take, the subsequent entries are not checked. If an entry matches but doesn't return a route, the subsequent entries are checked.
By default there are only 3 rules – priority numbers 0, 32766, and 32767; everything else was added by something else on my system. These default rules do what you'd expect as default behaviour – they make use of the `local` table for local and broadcast addresses (such as the 127.0.0.0/8 network) and the `main` table for everything else.
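For comparison, on a box with no VPNs or other policy routing going on, `ip rule` typically shows just those three:

```
$ ip rule
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default
```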
The format after the priority is a selector and an action to perform on a packet matching the selector. The selector is basically how we match an entry. For example, rule 0 has `from all` as its selector – so, match all packets – and its action is `lookup local`, i.e. look up a table called `local`. Here's a list of all the routes on my machine in this `local` table:
```
$ ip route show table all | grep "table local"
local 10.69.9.239 dev mullvad-nl2 table local proto kernel scope host src 10.69.9.239
local 100.109.195.28 dev tailscale0 table local proto kernel scope host src 100.109.195.28
broadcast 127.0.0.0 dev lo table local proto kernel scope link src 127.0.0.1
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1
broadcast 172.17.0.0 dev docker0 table local proto kernel scope link src 172.17.0.1 linkdown
local 172.17.0.1 dev docker0 table local proto kernel scope host src 172.17.0.1
broadcast 172.17.255.255 dev docker0 table local proto kernel scope link src 172.17.0.1 linkdown
broadcast 192.168.1.0 dev eth0 table local proto kernel scope link src 192.168.1.58
local 192.168.1.58 dev eth0 table local proto kernel scope host src 192.168.1.58
broadcast 192.168.1.255 dev eth0 table local proto kernel scope link src 192.168.1.58
local ::1 dev lo table local proto kernel metric 0 pref medium
local fe80::64b1:df4d:6505:b4d5 dev tailscale0 table local proto kernel metric 0 pref medium
local fe80::786d:d0ff:fedd:c138 dev macvlan0 table local proto kernel metric 0 pref medium
local fe80::dea6:32ff:fec7:b083 dev eth0 table local proto kernel metric 0 pref medium
ff00::/8 dev mullvad-nl2 table local metric 256 pref medium
ff00::/8 dev tailscale0 table local metric 256 pref medium
ff00::/8 dev eth0 table local metric 256 pref medium
ff00::/8 dev macvlan0 table local metric 256 pref medium
```
Notice it's got individual entries but no default gateway. So if I have a packet for, say, 8.8.8.8 it would match this selector, but since the table it looks up doesn't have an entry for that subnet or a default route, the next rule will be checked… and so on until something matches (or doesn't). In my case I have just two tables that have an entry for the default network: table 51820 (WireGuard) and the `main` table (which doesn't have a table name in the output below).
```
$ ip route show table all | grep "default"
default dev mullvad-nl2 table 51820 scope link
default via 192.168.1.1 dev eth0 onlink
```
The rule sending me to table 51820 (priority 32765) has a lower number, and hence higher priority, than the rule sending me to the `main` table (32766), and so WireGuard ends up being the default route for all my packets.
What happens though if I am trying to access a Tailscale network IP address? Say 100.85.80.11. Looking at the results of `ip rule`, one of the entries has priority 5270 (a much higher priority than the WireGuard rules) and sends me to table 52. What does that table contain?
```
$ ip route show table all | grep "table 52"
100.84.97.108 dev tailscale0 table 52 scope link
100.85.80.11 dev tailscale0 table 52 scope link
```
There we go! An entry matching the IP I want to talk to. So any traffic for that IP will go via table 52 and out through the `tailscale0` interface. Awesome!
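A handy way to double-check all this without walking the rules by hand is `ip route get`, which asks the kernel to do the full policy lookup for a destination:

```
$ ip route get 100.85.80.11    # should resolve via table 52 / the tailscale0 interface
$ ip route get 8.8.8.8         # should fall through to the default route (more on that next)
```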
So how does the traffic to WireGuard go? For that we need to look at these three rules:
```
32764:  from all lookup main suppress_prefixlength 0
32765:  not from all fwmark 0xca6c lookup 51820
32766:  from all lookup main
```
Rule 32764 says match all traffic and look up the `main` table – but if the lookup returns a route with a prefix length of 0 or less, reject it (that's what `suppress_prefixlength 0` does). What's an example of something that has a 0-length prefix? Why, the default route of course, which is 0.0.0.0/0. So in effect this rule says: use the `main` table, except for any default routes.
The next rule, number 32765, says match any traffic not carrying an fwmark of 0xca6c and send it to table 51820, whereas all other traffic falls through to the `main` table. An fwmark is a firewall mark attached to packets. When WireGuard starts a tunnel it sets an fwmark on its WireGuard interface, so the packets that interface generates carry that mark (see “Improved Rule-based Routing” on the WireGuard website). So this means any traffic generated by that interface will not hit table 51820, while all other traffic will. And since table 51820 has a default route entry (which it does – `default dev mullvad-nl2 table 51820 scope link` in my case) that means all that traffic goes out via the WireGuard tunnel.
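Putting those pieces together, this is roughly what wg-quick sets up behind the scenes – a sketch based on the WireGuard “Improved Rule-based Routing” description rather than the exact commands it runs, using the interface name and fwmark from my system:

```
wg set mullvad-nl2 fwmark 0xca6c                    # mark packets generated by the tunnel itself
ip route add default dev mullvad-nl2 table 51820    # a default route that lives only in table 51820
ip rule add not fwmark 0xca6c table 51820           # everything *except* marked packets uses table 51820
ip rule add table main suppress_prefixlength 0      # main table still wins for non-default routes
```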
Why do we not want traffic from the WireGuard interface to match this table though? Because traffic from that interface has to go via our actual default gateway… that's how the tunnel is established and how traffic actually travels between the two endpoints. The traffic from the WireGuard interface will thus match rule 32766 instead, be sent to the `main` table, which has a default gateway entry, and go out to the Internet. Nice, huh!
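You can see this exception in action with `ip route get` again, this time asking the kernel to pretend the packet carries WireGuard's fwmark:

```
$ ip route get 8.8.8.8                 # unmarked traffic: should pick the tunnel (mullvad-nl2)
$ ip route get 8.8.8.8 mark 0xca6c     # marked like WireGuard's own packets: should pick eth0 via 192.168.1.1
```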
So this is how WireGuard and Tailscale can co-exist, and also how routing works for these applications. The key thing is that the Tailscale rule has a lower number (and hence higher priority) than the WireGuard one, so it always handles traffic for any of my Tailscale IPs. And that in turn brings me to why I think I couldn't get them co-existing initially. I had checked out `ip rule` then and here's the output:
```
$ ip rule
0:      from all lookup local
5208:   from all lookup main suppress_prefixlength 0
5209:   not from all fwmark 0xca6c lookup 51820
5210:   from all fwmark 0x80000 lookup main
5230:   from all fwmark 0x80000 lookup default
5250:   from all fwmark 0x80000 unreachable
5270:   from all lookup 52
32766:  from all lookup main
32767:  from all lookup default
```
Notice that the WireGuard rules now have a higher priority than the Tailscale one… so all my Tailscale IPs would match rule 5209 and try to go out via my WireGuard VPN tunnel.
Am not sure why the reboot (and subsequent test reboots) changed this order such that Tailscale now has the higher priority. I wonder if it's a case of which service comes up first? Maybe if the Tailscale daemon is started first it adds a lower-numbered rule and then WireGuard adds a higher-numbered one and so things are fine; while if the startup order is reversed they conflict? Looking at the output of `systemctl status` I see that for the past few reboots `tailscaled` actually started a second before WireGuard… which is quite a close call actually, so maybe I've just been lucky?
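If you want to check the startup order yourself, systemd can report exactly when each unit came up (the wg-quick instance name here is mine; adjust for yours):

```
$ systemctl show -p ActiveEnterTimestamp tailscaled.service
$ systemctl show -p ActiveEnterTimestamp wg-quick@mullvad-nl2.service
```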
Am going to leave things as they are for now, but if I bump into this again I'll just modify `/lib/systemd/system/tailscaled.service` to add something like `Before=wg-quick@mullvad-nl2.service` (update: see below) under the `[Unit]` section to ensure systemd brings Tailscale up before WireGuard.
Update
I got hit by this again so I did the following:
```
sudo systemctl edit tailscaled.service
```
This opens up an empty file in your editor. To this I added:
```
[Unit]
After=wg-quick@<replace>.service
```
Then do a reload (I think this step can be skipped as `systemctl edit` also does a reload, but I did it anyways):
```
sudo systemctl daemon-reload
```
After this I rebooted and this should ensure Tailscale only starts after WireGuard.
What I did above was add an override to the Tailscale service. Rather than edit the package-provided unit file at `/lib/systemd/system/tailscaled.service`, which could get changed during an update, I added an override file. The additions I made above can be seen in `/etc/systemd/system/tailscaled.service.d/override.conf` if you are interested.
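To confirm the override is being picked up, `systemctl cat` shows the shipped unit file together with any drop-ins:

```
$ systemctl cat tailscaled.service
```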
Update 2
I realized 2 years later, when I stumbled upon this blog post again, that I had mistakenly used the `After` keyword instead of `Before`. Turns out that is the right thing to do; `Before` breaks things. So Tailscale has to start after WireGuard.