Iptables packet flow (and various other bits and bobs)

I made a diagram of the packet flow through the various Iptables tables and chains. I keep Googling on what table to use as there’s so many of them and it’s a bit confusing to me so I figured it’s time I put it all down somewhere.

You can find a “live” version of this diagram at this URL from Lucidchart. Here’s a picture of the same:

I don’t want to take too much credit for creating this. It’s based on what I learnt from the excellent Iptables tutorial.

Now for the various bits and bobs:

Raw table

This is used for just one thing: to set a mark on packets to say they should not be handled by the connection tracking system (i.e. don’t keep note of their state; don’t act as a stateful firewall on these packets). This is done using the NOTRACK target. The Raw table has only PREROUTING and OUTPUT chains, as it’s only used just as a packet enters the host or is generated by the host to leave it.
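
As a rough sketch of what that could look like, the rules below exempt DNS traffic from connection tracking. The choice of DNS (UDP port 53) is purely an assumption for illustration, and the `run` helper and `notrack_dns` function are made-up names: the helper prints each command instead of executing it unless APPLY=1 is set, since the real thing needs root.

```shell
# Dry run by default: print each command. Set APPLY=1 (as root) to execute instead.
run() {
    if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$@"; fi
}

# Hypothetical example: exempt DNS (UDP port 53) from connection tracking.
notrack_dns() {
    # Queries arriving at this host, before the connection tracker sees them
    run iptables -t raw -A PREROUTING -p udp --dport 53 -j NOTRACK
    # Replies generated by this host
    run iptables -t raw -A OUTPUT -p udp --sport 53 -j NOTRACK
}

notrack_dns
```

Untracked packets never become ESTABLISHED, so any filter rule that relies on connection state will not match them; they need explicit rules of their own.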

Mangle table

This table is used to mangle (modify) packets. It is primarily used for changing the header fields and marks listed below, and is not to be used for filtering or any sort of NATing. It has the following actions/targets:

TOS (Type of Service)
TTL (Time To Live)
MARK (Can be used by the iproute2 programs; remember my earlier post on WireGuard & Tailscale?)
- CONNMARK (see below)
SECMARK (Security Context Marks; used by SELinux)
CONNSECMARK (used to copy the Security Context from/ to a single packet to/ from the whole connection; again, used by SELinux)

This table has chains everywhere. You can mangle during PREROUTING (as a packet enters the host), OUTPUT (a packet generated by the host), INPUT (a packet destined for this host), FORWARD (a packet being forwarded via this host), and POSTROUTING (as a packet leaves the host).

Example:

## mark packets to port 22
iptables -t mangle -A PREROUTING -p tcp --dport 22 -j MARK --set-mark 2
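
A mark does nothing by itself; something else has to act on it. Here is a minimal sketch of the iproute2 side, assuming the mark 2 set above should steer traffic through a separate routing table. The table number 100, the gateway 192.0.2.1, and the interface eth1 are placeholders of mine, as are the `run` and `route_marked` names; the helper only prints the commands unless APPLY=1 is set.

```shell
# Dry run by default: print each command. Set APPLY=1 (as root) to execute instead.
run() {
    if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$@"; fi
}

# Hypothetical example: policy-route packets that carry firewall mark 2.
route_marked() {
    # Look up packets carrying mark 2 in routing table 100
    run ip rule add fwmark 2 table 100
    # Give table 100 its own default route (placeholder gateway and interface)
    run ip route add default via 192.0.2.1 dev eth1 table 100
}

route_marked
```

This works with a mark set in PREROUTING because the Mangle table runs before the routing decision. For packets generated by the host itself, the mark would be set in the Mangle table's OUTPUT chain instead.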

Note: There’s a CONNMARK action too, which is not limited to the Mangle table; it is available in all tables. CONNMARK associates marks with the connection rather than a single packet, and it sounds like when in doubt one should use CONNMARK. Similar to the CONNSECMARK action, it can set a mark on a connection, save a packet’s mark to the connection, or restore the connection’s mark onto a packet. An example from this blog post:

## restore the packet's mark from that of the connection
iptables -A POSTROUTING -t mangle -j CONNMARK --restore-mark

## if the packet's mark is not 0 accept it
iptables -A POSTROUTING -t mangle -m mark ! --mark 0 -j ACCEPT

## once accepted, set a mark of 1 or 2 on the packet depending on the destination port
iptables -A POSTROUTING -p tcp --dport 21 -t mangle -j MARK --set-mark 1
iptables -A POSTROUTING -p tcp --dport 80 -t mangle -j MARK --set-mark 2

## now save this mark to the connection itself
iptables -A POSTROUTING -t mangle -j CONNMARK --save-mark

I am not 100% clear on the difference between MARK and CONNMARK. I think 1) the MARK action sets a mark on an individual packet (kept by the kernel alongside the packet, not written into the packet itself), while CONNMARK is an association within the kernel between a connection and its mark (which can be explicitly set or copied from its packets), and that’s why you’d use MARK to actually set a mark; and 2) rather than MARK each packet of a connection individually, it is easier to MARK the first one and then use CONNMARK to apply the same mark to all packets in the connection. I got this impression from this StackOverflow post.

NAT table

This table is only used to perform NAT operations. Only the first packet in a stream of packets hits this table, and all subsequent packets of that stream follow whatever was determined for the first packet. You can do the following:

SNAT (what we typically think of with NAT: you have a public IP, behind which is a network of private IPs. You want the machines in this private network to be able to access the public Internet, but since their IPs are private they are not routable, so you hide them all behind the public IP. As far as the outside world is concerned, all traffic of all the machines in your private network originates from/is destined to this public IP) (SNAT == Source NAT; the source being your private network, where traffic originates and whose real source IP gets changed by this table)
- MASQUERADE (a special case of SNAT wherein you don’t need to specify the public IP, Iptables will use the source address of the outgoing interface; useful when your public IP could change and you don’t want to specify it)
DNAT (the opposite of the above; you have some server in your private IP space and since this is not routable on the Internet you want requests hitting your public IP to be sent to this server instead, so you are rewriting the destination address whereas with SNAT you are rewriting the source address; this is not the same as port forwarding but you can use DNAT to do port forwarding too)
- REDIRECT (this was new to me and perhaps something I could have used previously; this is a special case of DNAT in that it redirects packets and streams to the localhost itself, to the same port or to a different one)

Some examples might be useful (most of these are from the HOWTO so it’s mainly for the reference of my future self):

SNAT & MASQUERADE

## Change source addresses to 1.2.3.4.
iptables -t nat -A POSTROUTING -o eth0 -j SNAT --to 1.2.3.4

## Change source addresses to 1.2.3.4, 1.2.3.5 or 1.2.3.6
iptables -t nat -A POSTROUTING -o eth0 -j SNAT --to 1.2.3.4-1.2.3.6

## Change source addresses to 1.2.3.4, ports 1-1023
iptables -t nat -A POSTROUTING -p tcp -o eth0 -j SNAT --to 1.2.3.4:1-1023

## Masquerade everything out ppp0.
iptables -t nat -A POSTROUTING -o ppp0 -j MASQUERADE

Note that SNAT & MASQUERADE are always in the POSTROUTING chain, because source NAT happens as the packets are exiting the machine; it is the last step before the packets exit.

DNAT & REDIRECT

## Change destination addresses to 5.6.7.8
iptables -t nat -A PREROUTING -i eth0 -j DNAT --to 5.6.7.8

## Change destination addresses to 5.6.7.8, 5.6.7.9 or 5.6.7.10.
iptables -t nat -A PREROUTING -i eth0 -j DNAT --to 5.6.7.8-5.6.7.10

## Change destination addresses of web traffic to 5.6.7.8, port 8080.
iptables -t nat -A PREROUTING -p tcp --dport 80 -i eth0 \
        -j DNAT --to 5.6.7.8:8080

## Send incoming port-80 web traffic to our squid (transparent) proxy
iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 \
        -j REDIRECT --to-port 3128

## Send incoming SSH traffic to a specific IP to a different port
iptables -t nat -A PREROUTING -p tcp -d 192.168.17.21 --dport 22 \
        -j REDIRECT --to-port 2222

Note that DNAT & REDIRECT happen in the PREROUTING chain, before any filtering but after any actions by the Raw and Mangle tables.

Filter table

This is what you’d typically expect to do with a firewall – filter packets. ACCEPT, DROP, or REJECT them depending on our filters.

You can filter on the INPUT chain (packets destined for this host), OUTPUT chain (packets generated by this host), or FORWARD chain (packets being routed via this host).
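
To make that concrete, here is a minimal sketch of a stateful INPUT rule set. The allowed service (SSH on port 22) is an assumption for illustration, and `run` and `basic_filter` are made-up names; the helper only prints the commands unless APPLY=1 is set.

```shell
# Dry run by default: print each command. Set APPLY=1 (as root) to execute instead.
run() {
    if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$@"; fi
}

# Hypothetical example: a small stateful rule set for packets destined for this host.
basic_filter() {
    # Accept packets that belong to, or are related to, connections already seen
    run iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    # Accept everything on the loopback interface
    run iptables -A INPUT -i lo -j ACCEPT
    # Accept new SSH connections
    run iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW -j ACCEPT
    # Reject whatever is left (REJECT sends an error back; DROP would stay silent)
    run iptables -A INPUT -j REJECT
}

basic_filter
```

Rules are evaluated top to bottom and the first terminating match wins, which is why the catch-all REJECT goes last.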

User defined chains

Although I kept referring to the standard chains above, it is also possible to have user-defined chains. For example: Docker adds all its rules to a DOCKER chain (rules created by Docker) and a DOCKER-USER chain (Docker related rules a user can add).
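
A user-defined chain only sees packets when a rule in another chain jumps to it. A small sketch, where the chain name SSH-IN and the subnet 192.0.2.0/24 are hypothetical values of mine (as are the `run` and `ssh_chain` names); the helper only prints the commands unless APPLY=1 is set.

```shell
# Dry run by default: print each command. Set APPLY=1 (as root) to execute instead.
run() {
    if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$@"; fi
}

# Hypothetical example: collect all SSH rules in a chain of their own.
ssh_chain() {
    # Create the chain in the (default) filter table
    run iptables -N SSH-IN
    # Allow SSH from one subnet, drop it from everywhere else
    run iptables -A SSH-IN -s 192.0.2.0/24 -j ACCEPT
    run iptables -A SSH-IN -j DROP
    # Nothing reaches SSH-IN until a built-in chain jumps to it
    run iptables -A INPUT -p tcp --dport 22 -j SSH-IN
}

ssh_chain
```

A packet that reaches the end of a user-defined chain without hitting a terminating target returns to the calling chain and carries on from the next rule there.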

Syntax of Iptables

iptables [-t table] command [match] [target/jump]

We know what a table is from above. Here’s a list of the available commands:

Rule commands:
- -A, --append: to append a rule to the end of the chain (e.g. -A INPUT)
- -I, --insert: to insert a rule at the specified position (e.g. -I INPUT 1 -p tcp --dport 80 -j ACCEPT)
- -D, --delete: to delete a rule (either by specifying the whole rule, or by its number, with the first rule in the chain being number 1)
- -R, --replace: to replace the rule at the specified position
- -L, --list: to list all rules in the specified chain; or if no chain is specified list all rules in that table (default being the filter table; specify a table with the -t switch as in the syntax)
- -F, --flush: to flush/delete all rules in the specified chain
Chain commands:
- -N, --new-chain: to create a new chain of the specified name
- -X, --delete-chain: to delete the specified chain (this requires the chain to be empty and no other rules to be referring to it)
- -E, --rename-chain: to rename the specified chain
- -P, --policy: to set the default policy for that chain (could be DROP or ACCEPT)

Note: Some of these commands take options of their own which I am skipping here.
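
A few of these commands strung together, as a sketch for my future self. The ports and the order of operations are arbitrary choices for illustration, and `run` and `manage_rules` are made-up names; the helper only prints the commands unless APPLY=1 is set.

```shell
# Dry run by default: print each command. Set APPLY=1 (as root) to execute instead.
run() {
    if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$@"; fi
}

# Hypothetical walk-through of the rule and chain commands.
manage_rules() {
    # List the INPUT chain with rule numbers, counters, and numeric addresses
    run iptables -L INPUT -n -v --line-numbers
    # Insert a rule at position 1 of the INPUT chain
    run iptables -I INPUT 1 -p tcp --dport 80 -j ACCEPT
    # Replace rule number 1 with a different rule
    run iptables -R INPUT 1 -p tcp --dport 443 -j ACCEPT
    # Delete rule number 1
    run iptables -D INPUT 1
    # List every chain of another table by naming it with -t
    run iptables -t nat -L -n
    # Set the default policy of the FORWARD chain
    run iptables -P FORWARD DROP
}

manage_rules
```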

I am not going into the details of the matches or targets/jumps either, as that’s a bucketful in itself. I would like to add though that with matches one has the usual suspects like matching based on protocol (-p, --protocol), source address (-s, --src, --source), destination address (-d, --dst, --destination), incoming interface (-i, --in-interface), outgoing interface (-o, --out-interface), fragments (-f, --fragment); or matching based on various Iptables extensions (this is a shorter list). The latter makes Iptables very powerful, as for instance one can match all packets generated by a process run under a specific uid/gid:

## match packets generated by processes running as uid 500
iptables -A OUTPUT -m owner --uid-owner 500

Modules might not be well documented and can have some non-obvious limitations (e.g. the owner module above only matches locally generated packets, so it is only valid in the OUTPUT and POSTROUTING chains).