Nftables - Netfilter and VPN/IPsec packet flow
In this article I explain what the packet flow through the Netfilter hooks looks like on a host which works as a VPN gateway based on IPsec (Strongswan, tunnel-mode, IKEv2, ESP). I'll focus on Nftables rather than the older Iptables, and regarding Strongswan I'll focus on the newer vici interface (using swanctl) rather than the older stroke interface.
See also my other article which covers packet flow through Netfilter hooks in general.
IPsec short recap
A comprehensive recap of IPsec would require a whole book. I'll merely provide a very short recap here, based on protocols and ports, to put my actual topic into context.
An IPsec-based VPN has a “management channel”, which is the IKE protocol (in this case IKEv2). It is responsible for bringing up and tearing down the VPN tunnel connection between two VPN endpoint hosts and for negotiating the Security Associations (SAs) and thereby the security credentials and encryption keys. It is encapsulated in UDP and uses UDP port 500; in case of NAT-traversal 1) it dynamically switches to UDP port 4500 during the IKE handshake. Its encapsulation looks like this:
    |eth|ip|udp|ikev2|
IKEv2 packet on UDP/500 or UDP/4500.
After the IKE handshake has finished successfully, the VPN tunnel between both endpoints is “up” and packets can travel through it. In tunnel-mode, the IP packets which shall travel through the VPN tunnel are encrypted and then encapsulated in packets of the so-called ESP protocol 2). The whole thing is then further encapsulated into another (an “outer”) IP packet. The reason is that the VPN tunnel itself is merely a point-to-point connection between two VPN endpoints (thus one source and one destination IP address), but those endpoints are in this case VPN gateways which connect entire subnets on both ends of the tunnel. Thus the source and destination IP addresses of the “payload” packets which travel through the VPN gateway need to be kept independent from the source and destination IP addresses of the “outer” IP packet. Encapsulation then looks like this (example for a TCP connection):
    |eth|      |ip|tcp|payload|  3)
    |eth|ip|esp|ip|tcp|payload|  4)
A “normal” packet which shall travel through the VPN tunnel is encrypted and encapsulated like this while traversing the VPN gateway.
If NAT-traversal is active, then ESP is additionally encapsulated in UDP:
    |eth|          |ip|tcp|payload|
    |eth|ip|udp|esp|ip|tcp|payload|
In case of NAT-traversal: additional encapsulation in UDP (the same port 4500 as for IKE is then used here).
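The layering described above can be sketched in a few lines of Python. This is only an illustration of the header order in the simplified notation of this article, not of how the kernel builds packets; the function names `encapsulate` and `render` are my own, and details like the ESP trailer and integrity check value are left out, just as in the diagrams.

```python
def encapsulate(inner, nat_traversal=False):
    """Return the header stack of a tunnel-mode ESP packet wrapping `inner`.

    `inner` is the header list of the original packet without its
    Ethernet header, e.g. ["ip", "tcp", "payload"].
    """
    outer = ["eth", "ip"]      # new Ethernet frame and the "outer" IP header
    if nat_traversal:
        outer.append("udp")    # UDP/4500 wrapper used with NAT-traversal
    outer.append("esp")        # ESP header; the encrypted inner packet follows
    return outer + list(inner)


def render(headers):
    """Format a header stack in the |a|b|c| notation used in this article."""
    return "|" + "|".join(headers) + "|"


if __name__ == "__main__":
    inner = ["ip", "tcp", "payload"]
    print(render(["eth"] + inner))                        # before the gateway
    print(render(encapsulate(inner)))                     # plain ESP
    print(render(encapsulate(inner, nat_traversal=True))) # ESP in UDP
```

The inner `ip` header is passed through untouched, which reflects the point made above: the addresses of the “payload” packet stay independent of the addresses in the “outer” IP header.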
IPsec Linux implementation
The IPsec implementation in Linux consists of a userspace part and a kernel part. Nowadays the userspace part is represented by the Strongswan 5) suite and the kernel part by the xfrm framework, which is sometimes called the Netkey stack and has been present in the kernel since v2.6. The following image shows these components and how they interact in a simple block diagram style.
Strongswan
The essential part of Strongswan is the userspace daemon charon, which implements IKEv1/IKEv2 and acts as the central “orchestrator” (the main active component) of IPsec-based VPNs on the system. It provides an interface to the user/administrator for configuring IPsec on the system. More precisely, it provides two different interfaces to do that:
- One is the so-called Stroke interface. It provides means to configure IPsec via two main config files, /etc/ipsec.conf and /etc/ipsec.secrets. This is the older of the two interfaces and it can be considered deprecated (however, it is still supported).
- The other, newer one is the so-called Vici interface. It is an IPC mechanism, which means the charon daemon listens on a Unix-domain socket and client tools like Strongswan's own cmdline tool swanctl 6) can connect to it to configure IPsec. This way of configuration is more powerful than the Stroke interface, because it makes it easier for other tools to provide and adjust configuration dynamically and event-driven at any time.
However, in many common IPsec setups the configuration is still simply supplied via config files. When using Vici, the only difference is that the config file(s) (mainly /etc/swanctl/swanctl.conf) are not interpreted by the charon daemon directly, but by the cmdline tool swanctl, which then feeds this config into the charon daemon via the Vici IPC interface.
The charon daemon uses a Netlink socket as its communication channel into the kernel.
The xfrm framework
The so-called xfrm framework is a component within the Linux kernel. While the userspace part (Strongswan) handles the overall IPsec orchestration and runs the IKEv1/IKEv2 protocol to build up and tear down VPNs, the kernel part handles everything that can be considered the “VPN payload”. It implements the Security Association Database (SAD) and the Security Policy Database (SPD). This means the userspace daemon charon feeds the actual IPsec Security Association (SA) instances and Security Policy (SP) instances, which result from the configuration and from the IKEv1/IKEv2 handshake, into the kernel, and the kernel maintains and uses them to encrypt and decrypt the actual “payload” network packets of the VPN.
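To make the role of the two databases a bit more tangible, here is a strongly simplified Python model of a lookup: a policy in the SPD selects traffic by source and destination subnet and direction, and points to the tunnel endpoints for which an SA is then searched in the SAD. This is a sketch under assumptions and not the kernel's actual algorithm (the kernel links policies to SAs via templates carrying more attributes, such as protocol, mode and reqid). The class and function names, the /24 subnets and the SPI values are made up for illustration; the host and gateway addresses are the ones used in the traversal tables of this article.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Policy:
    """Simplified Security Policy: traffic selector plus tunnel endpoints."""
    src_net: str
    dst_net: str
    direction: str      # "out", "in" or "fwd"
    tunnel_src: str
    tunnel_dst: str

    def matches(self, saddr, daddr, direction):
        return (direction == self.direction
                and ip_address(saddr) in ip_network(self.src_net)
                and ip_address(daddr) in ip_network(self.dst_net))


@dataclass(frozen=True)
class SA:
    """Simplified Security Association: one direction of the tunnel."""
    src: str
    dst: str
    spi: int            # hypothetical value; really negotiated via IKE


def lookup(spd, sad, saddr, daddr, direction):
    """Return the SA to use for a packet, or None if no policy matches."""
    for policy in spd:
        if policy.matches(saddr, daddr, direction):
            for sa in sad:
                if (sa.src, sa.dst) == (policy.tunnel_src, policy.tunnel_dst):
                    return sa
    return None


# Assumption: the subnets behind both gateways are /24 networks.
SPD = [Policy("10.0.1.0/24", "10.0.2.0/24", "out", "3.0.0.1", "5.0.0.1")]
SAD = [SA("3.0.0.1", "5.0.0.1", 0xC0FFEE01),
       SA("5.0.0.1", "3.0.0.1", 0xC0FFEE02)]

if __name__ == "__main__":
    print(lookup(SPD, SAD, "10.0.1.100", "10.0.2.100", "out"))  # matching SA
    print(lookup(SPD, SAD, "10.0.1.100", "192.0.2.7", "out"))   # None
```

Note that there is one SA per direction, so a working tunnel always shows up as (at least) two entries in the SAD.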
You can use the iproute2 tool ip as a low-level admin tool to show the SAs and SPs which are currently configured in the databases inside the kernel:
    # list SAs which are currently configured in the kernel
    ip xfrm state
    # list SPs which are currently configured in the kernel
    ip xfrm policy
The ip tool uses the same means (a Netlink socket) to communicate with the kernel. You could also use it as a low-level config tool to create/edit/delete SAs and SPs in the kernel; however, in practice you leave those duties to Strongswan.
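If you want to evaluate the SAs in a script, the text output of `ip xfrm state` can be parsed. The following Python sketch only looks at the first two lines of each entry (addresses, protocol, SPI, reqid, mode). The sample text is hand-written for illustration with made-up SPI values, and I assume the usual output layout here; it may differ in detail between iproute2 versions. The function names are my own.

```python
import subprocess


def read_xfrm_state():
    """Run `ip xfrm state` (needs root privileges) and return its output."""
    result = subprocess.run(["ip", "xfrm", "state"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_xfrm_state(text):
    """Parse SA entries into dicts; unknown lines are ignored."""
    sas = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if not line[0].isspace() and tokens[0] == "src":
            # Start of a new SA: "src <addr> dst <addr>"
            sas.append(dict(zip(tokens[0::2], tokens[1::2])))
        elif sas and tokens[0] == "proto":
            # Indented line: "proto esp spi 0x... reqid 1 mode tunnel"
            sas[-1].update(zip(tokens[0::2], tokens[1::2]))
    return sas


# Hand-written sample (assumed layout, hypothetical SPI values).
SAMPLE = """\
src 3.0.0.1 dst 5.0.0.1
\tproto esp spi 0xc0ffee01 reqid 1 mode tunnel
\treplay-window 0 flag af-unspec
src 5.0.0.1 dst 3.0.0.1
\tproto esp spi 0xc0ffee02 reqid 1 mode tunnel
\treplay-window 32 flag af-unspec
"""

if __name__ == "__main__":
    for sa in parse_xfrm_state(SAMPLE):
        print(sa["src"], "->", sa["dst"], sa["proto"], sa["spi"], sa["mode"])
```

On a real gateway you would pass `read_xfrm_state()` instead of `SAMPLE` to the parser.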
hooks
Table
ICMP echo-request h1 → h2, r1 traversal
    step  netfilter hook / xfrm  encapsulation         iif   oif   ip saddr    ip daddr
    ----  ---------------------  --------------------  ----  ----  ----------  ----------
    1     prerouting             |eth|      |ip|icmp|  eth0        10.0.1.100  10.0.2.100
    2     forward                |eth|      |ip|icmp|  eth0  eth1  10.0.1.100  10.0.2.100
    3     postrouting            |eth|      |ip|icmp|        eth1  10.0.1.100  10.0.2.100
    4     xfrm lookup            |eth|      |ip|icmp|
    5     xfrm encode            |eth|......|ip|icmp|
    6     output                 |eth|ip|esp|ip|icmp|        eth1  3.0.0.1     5.0.0.1
    7     postrouting            |eth|ip|esp|ip|icmp|        eth1  3.0.0.1     5.0.0.1
ICMP echo-reply h2 → h1, r1 traversal
    step  netfilter hook / xfrm  encapsulation         iif   oif   ip saddr    ip daddr
    ----  ---------------------  --------------------  ----  ----  ----------  ----------
    1     prerouting             |eth|ip|esp|ip|icmp|  eth1        5.0.0.1     3.0.0.1
    2     input                  |eth|ip|esp|ip|icmp|  eth1        5.0.0.1     3.0.0.1
    3     xfrm/socket lookup     |eth|ip|esp|ip|icmp|
    4     xfrm decode            |eth|......|ip|icmp|
    5     prerouting             |eth|      |ip|icmp|  eth1        10.0.2.100  10.0.1.100
    6     forward                |eth|      |ip|icmp|  eth1  eth0  10.0.2.100  10.0.1.100
    7     postrouting            |eth|      |ip|icmp|        eth0  10.0.2.100  10.0.1.100
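The two tables above can also be written down as data, which makes it easy to answer the practical question “on which hooks does a rule see the plain packet, and on which the ESP packet?”. The following Python sketch merely restates the table rows; the names are my own and nothing here talks to Netfilter. The transitional `......` rows are stored with the plain header stack, which does not matter because xfrm steps are filtered out.

```python
# Netfilter hooks (as opposed to the xfrm steps listed in the tables).
NETFILTER_HOOKS = {"prerouting", "input", "forward", "output", "postrouting"}

PLAIN = ["ip", "icmp"]
ESP = ["ip", "esp", "ip", "icmp"]

# (step, headers after the Ethernet header), copied from the two tables.
ECHO_REQUEST_H1_TO_H2 = [
    ("prerouting", PLAIN),
    ("forward", PLAIN),
    ("postrouting", PLAIN),
    ("xfrm lookup", PLAIN),
    ("xfrm encode", PLAIN),
    ("output", ESP),
    ("postrouting", ESP),
]

ECHO_REPLY_H2_TO_H1 = [
    ("prerouting", ESP),
    ("input", ESP),
    ("xfrm/socket lookup", ESP),
    ("xfrm decode", PLAIN),
    ("prerouting", PLAIN),
    ("forward", PLAIN),
    ("postrouting", PLAIN),
]


def hooks_seeing(flow, encrypted):
    """List the Netfilter hooks which see the packet as ESP (encrypted=True)
    or as plain inner packet (encrypted=False), in traversal order."""
    return [step for step, headers in flow
            if step in NETFILTER_HOOKS and ("esp" in headers) == encrypted]


if __name__ == "__main__":
    for name, flow in (("echo-request", ECHO_REQUEST_H1_TO_H2),
                       ("echo-reply", ECHO_REPLY_H2_TO_H1)):
        print(name, "plain:", hooks_seeing(flow, encrypted=False))
        print(name, "esp:  ", hooks_seeing(flow, encrypted=True))
```

What stands out is that postrouting is traversed twice on the way into the tunnel and prerouting twice on the way out of it, once with the plain packet and once with the ESP packet.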
Context
The described behavior and implementation has been observed on a Debian 10 (buster) system using Debian backports on the amd64 architecture.
- kernel: 5.4.19-1~bpo10+1
- nftables: 0.9.3-2~bpo10+1
- libnftnl: 1.1.5-1~bpo10+1
- strongswan: 5.7.2-1
Feedback
Feedback to this article is very welcome!
2) […] AH protocol, which can be used as an alternative to ESP, but this protocol only provides authentication and data integrity and no encryption. Thus it is not very relevant in practice.
3) 4) […] eth (ethernet) and the ip header. I just show it like this here to emphasize WHERE in the packet the esp header and the outer ip header are being inserted.
6) […] nhrpd of the FRR routing protocol engine, which is used in DMVPN setups.