Thermalcircle.de

climbing the thermals

Nftables - Netfilter and VPN/IPsec packet flow

In this article I explain what the packet flow through Netfilter hooks looks like on a host which works as a VPN gateway based on IPsec (Strongswan, tunnel-mode, IKEv2, ESP). I focus on Nftables rather than the older Iptables. Regarding Strongswan, I focus on the newer vici interface (using swanctl) rather than the older stroke interface.

See also my other article which covers packet flow through Netfilter hooks in general.

IPsec short recap

A comprehensive recap of IPsec would fill a whole book. I merely provide a very short recap here, based on protocols and ports, to put my actual topic into context.

An IPsec based VPN has a “management channel”, which is the IKE protocol (here IKEv2). It is responsible for bringing up and tearing down the VPN tunnel between two VPN endpoint hosts and for negotiating the Security Associations (SAs), and thereby the security credentials and encryption keys. IKE is encapsulated in UDP and uses UDP port 500. In case of NAT-traversal1) it dynamically switches to UDP port 4500 during the IKE handshake. Its encapsulation looks like this:

|eth|ip|udp|ikev2| IKEv2 packet on UDP/500 or UDP/4500.
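
To make the `ikev2` part of this diagram a bit more concrete, here is a minimal Python sketch which decodes the fixed 28-byte IKE header that starts every IKEv2 message on UDP/500. The field layout and the exchange type numbers follow RFC 7296. The function name `parse_ike_header` and the hand-built sample bytes are my own illustration and are not taken from Strongswan.

```python
import struct

# Fixed 28-byte IKE header (RFC 7296, section 3.1):
# initiator SPI, responder SPI, next payload, version,
# exchange type, flags, message ID, total length.
IKE_HEADER = struct.Struct("!8s8sBBBBII")

# Exchange type numbers defined for IKEv2 in RFC 7296.
EXCHANGE_TYPES = {
    34: "IKE_SA_INIT",
    35: "IKE_AUTH",
    36: "CREATE_CHILD_SA",
    37: "INFORMATIONAL",
}


def parse_ike_header(udp_payload: bytes) -> dict:
    """Decode the fixed IKE header at the start of a UDP/500 payload."""
    if len(udp_payload) < IKE_HEADER.size:
        raise ValueError("too short for an IKE header")
    (init_spi, resp_spi, _next_payload, version,
     exchange, flags, message_id, length) = IKE_HEADER.unpack_from(udp_payload)
    return {
        "initiator_spi": init_spi.hex(),
        "responder_spi": resp_spi.hex(),
        "major_version": version >> 4,      # upper 4 bits
        "minor_version": version & 0x0F,    # lower 4 bits
        "exchange": EXCHANGE_TYPES.get(exchange, f"unknown({exchange})"),
        "is_response": bool(flags & 0x20),     # R flag
        "from_initiator": bool(flags & 0x08),  # I flag
        "message_id": message_id,
        "length": length,
    }


# Hypothetical sample: header of a first IKE_SA_INIT request without payloads.
sample = IKE_HEADER.pack(bytes(range(1, 9)), bytes(8), 33, 0x20, 34, 0x08, 0, 28)
header = parse_ike_header(sample)
print(header["major_version"], header["exchange"], header["from_initiator"])
```

On UDP/4500 the same header is preceded by four zero bytes, so those would have to be stripped before calling the function.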

After the IKE handshake has finished successfully, the VPN tunnel between both endpoints is “up” and packets can travel through it. In tunnel-mode the IP packets which shall travel through the VPN tunnel are encrypted and then encapsulated in packets of the so-called ESP protocol2). The whole thing is then further encapsulated into another (an “outer”) IP packet. The reason is that the VPN tunnel itself is merely a point-to-point connection between two VPN endpoints (thus one source and one destination IP address), but those endpoints are in this case VPN gateways which connect entire subnets on both ends of the tunnel. Thus the source and destination IP addresses of the “payload” packets which travel through the VPN gateway need to be kept independent from the source and destination IP addresses of the “outer” IP packet. Encapsulation then looks like this (example for a TCP connection):

|eth|      |ip|tcp|payload|3)
|eth|ip|esp|ip|tcp|payload|4)
A “normal” packet which shall travel through the VPN tunnel is encrypted and encapsulated like this while traversing the VPN gateway.

If NAT-traversal is active, then ESP is additionally encapsulated in UDP:

|eth|          |ip|tcp|payload|
|eth|ip|udp|esp|ip|tcp|payload|
In case of NAT-traversal: additional encapsulation in UDP (the same port 4500 as for IKE is used here).
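
Since IKE and ESP then share UDP port 4500, a receiver has to tell them apart by looking at the first bytes of the UDP payload. RFC 3948 defines this: IKE messages are prefixed with four zero bytes (the “non-ESP marker”), ESP packets start with their non-zero SPI, and a single `0xFF` byte is a NAT-keepalive. The following Python sketch shows this demultiplexing. The function name `classify_udp4500` and the sample payloads are made up for illustration.

```python
import struct

# RFC 3948: four zero bytes precede every IKE message on UDP/4500.
NON_ESP_MARKER = b"\x00\x00\x00\x00"


def classify_udp4500(payload: bytes) -> str:
    """Tell apart the three kinds of payload which share UDP port 4500."""
    if payload == b"\xff":
        return "NAT-keepalive"
    if len(payload) < 8:
        return "too short"
    if payload[:4] == NON_ESP_MARKER:
        return "IKE"
    # ESP header: 32-bit SPI followed by a 32-bit sequence number.
    spi, seq = struct.unpack_from("!II", payload)
    return f"ESP spi=0x{spi:08x} seq={seq}"


# Hypothetical payloads, just to exercise the three branches.
esp_payload = struct.pack("!II", 0xC0FFEE01, 7) + bytes(16)
ike_payload = NON_ESP_MARKER + bytes(28)

print(classify_udp4500(esp_payload))
print(classify_udp4500(ike_payload))
print(classify_udp4500(b"\xff"))
```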

IPsec Linux implementation

The IPsec implementation in Linux consists of a userspace part and a kernel part. Nowadays the userspace part is usually provided by the Strongswan5) suite, and the kernel part is the Xfrm framework, which is sometimes called the Netkey stack and has been present in the kernel since v2.6. The following image shows these components and how they interact in a simple block diagram style.

Strongswan

The essential part of Strongswan is the userspace daemon charon, which implements IKEv1/IKEv2 and acts as the central “orchestrator” of IPsec-based VPNs (the main active component) on the system.

It provides an interface to the user/administrator for configuring IPsec on the system. More precisely, it provides two different interfaces to do that. One is the so-called Stroke interface. It provides means to configure IPsec via the two main config files /etc/ipsec.conf and /etc/ipsec.secrets. This is the older of the two interfaces and it can be considered deprecated (however, it is still supported).

The other and newer one is the so-called Vici interface. It is an IPC mechanism, which means the charon daemon listens on a Unix-domain socket, and client tools like Strongswan's own command-line tool swanctl6) can connect to it to configure IPsec. This way of configuration is more powerful than the Stroke interface, because it makes it easier for other tools to provide and adjust configuration dynamically and event-driven at any time.

However, in many common IPsec setups the configuration is still simply supplied via config files. When using Vici, the only difference is that the config files (mainly /etc/swanctl/swanctl.conf) are not interpreted by the charon daemon directly. Instead they are interpreted by the command-line tool swanctl, which then feeds this config into the charon daemon via the Vici IPC interface.

The charon daemon uses a Netlink socket as a communication channel into the kernel.

The xfrm framework

The so-called xfrm framework is a component within the Linux kernel. While the userspace part (Strongswan) handles the overall IPsec orchestration and runs the IKEv1/IKEv2 protocol to build up and tear down VPNs, the kernel part handles everything that can be considered the “VPN payload”. It implements the Security Association Database (SAD) and the Security Policy Database (SPD).

This means the userspace daemon charon feeds the actual IPsec Security Association (SA) instances and Security Policy (SP) instances, which result from the configuration and from the IKEv1/IKEv2 handshake, into the kernel. The kernel maintains those and uses them to encrypt and decrypt the actual “payload” network packets of the VPN.
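
To illustrate the idea behind the SPD, here is a strongly simplified Python sketch of a policy lookup: each policy has a traffic selector (source and destination subnet), a direction (`out`, `in` or `fwd`, as shown by `ip xfrm policy`) and the tunnel endpoints. This is only a model and not how the kernel implements it; for example it ignores policy priorities, protocols and ports. The `Policy` class, the `lookup` function and the /24 subnets are assumptions I made for this example; only the host and gateway addresses are the ones used in the tables of this article.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """Simplified Security Policy: traffic selector plus tunnel endpoints."""
    src: str         # selector: source subnet
    dst: str         # selector: destination subnet
    direction: str   # "out", "in" or "fwd"
    tunnel_src: str  # outer source address
    tunnel_dst: str  # outer destination address


# Assumed policies on gateway r1 (the /24 prefixes are illustrative).
SPD = [
    Policy("10.0.1.0/24", "10.0.2.0/24", "out", "3.0.0.1", "5.0.0.1"),
    Policy("10.0.2.0/24", "10.0.1.0/24", "fwd", "5.0.0.1", "3.0.0.1"),
    Policy("10.0.2.0/24", "10.0.1.0/24", "in", "5.0.0.1", "3.0.0.1"),
]


def lookup(saddr: str, daddr: str, direction: str) -> Optional[Policy]:
    """Return the first policy whose selector matches, or None."""
    for policy in SPD:
        if (policy.direction == direction
                and ip_address(saddr) in ip_network(policy.src)
                and ip_address(daddr) in ip_network(policy.dst)):
            return policy
    return None


# A packet from h1 to h2 matches the "out" policy and gets tunneled.
match = lookup("10.0.1.100", "10.0.2.100", "out")
print(match.tunnel_src, "->", match.tunnel_dst)

# A packet to some other destination matches no policy and stays untouched.
print(lookup("10.0.1.100", "192.0.2.1", "out"))
```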

You can use the iproute2 tool ip as low-level admin tool to show the SAs and SPs which are currently configured in the databases inside the kernel:

#list SAs which are currently configured in the kernel
ip xfrm state
 
#list SPs which are currently configured in the kernel
ip xfrm policy

The ip tool uses the same means as charon (a Netlink socket) to communicate with the kernel. You could also use it as a low-level config tool to create, edit and delete SAs and SPs in the kernel. In practice, however, you leave those duties to Strongswan.

hooks

Table

ICMP echo-request h1 → h2, r1 traversal

step  netfilter hook / xfrm  encapsulation         iif   oif   ip saddr    ip daddr
1     prerouting             |eth|      |ip|icmp|  eth0  -     10.0.1.100  10.0.2.100
2     forward                |eth|      |ip|icmp|  eth0  eth1  10.0.1.100  10.0.2.100
3     postrouting            |eth|      |ip|icmp|  -     eth1  10.0.1.100  10.0.2.100
4     xfrm lookup            |eth|      |ip|icmp|  -     -     -           -
5     xfrm encode            |eth|......|ip|icmp|  -     -     -           -
6     output                 |eth|ip|esp|ip|icmp|  -     eth1  3.0.0.1     5.0.0.1
7     postrouting            |eth|ip|esp|ip|icmp|  -     eth1  3.0.0.1     5.0.0.1

ICMP echo-reply h2 → h1, r1 traversal

step  netfilter hook / xfrm  encapsulation         iif   oif   ip saddr    ip daddr
1     prerouting             |eth|ip|esp|ip|icmp|  eth1  -     5.0.0.1     3.0.0.1
2     input                  |eth|ip|esp|ip|icmp|  eth1  -     5.0.0.1     3.0.0.1
3     xfrm/socket lookup     |eth|ip|esp|ip|icmp|  -     -     -           -
4     xfrm decode            |eth|......|ip|icmp|  -     -     -           -
5     prerouting             |eth|      |ip|icmp|  eth1  -     10.0.2.100  10.0.1.100
6     forward                |eth|      |ip|icmp|  eth1  eth0  10.0.2.100  10.0.1.100
7     postrouting            |eth|      |ip|icmp|  -     eth0  10.0.2.100  10.0.1.100
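
When writing rules it helps to know which hooks see a packet in its plain form and which hooks see it in its ESP-encapsulated form. The following Python sketch merely encodes the two tables above as data and reads that off. The names `REQUEST`, `REPLY` and `hooks_seeing` are my own; the code is just a way of looking at the tables and models nothing beyond them.

```python
# (hook or xfrm stage, packet form at that stage), copied from the two tables.
REQUEST = [
    ("prerouting", "plain"),
    ("forward", "plain"),
    ("postrouting", "plain"),
    ("xfrm lookup", "plain"),
    ("xfrm encode", "plain"),
    ("output", "esp"),
    ("postrouting", "esp"),
]

REPLY = [
    ("prerouting", "esp"),
    ("input", "esp"),
    ("xfrm/socket lookup", "esp"),
    ("xfrm decode", "esp"),
    ("prerouting", "plain"),
    ("forward", "plain"),
    ("postrouting", "plain"),
]

# Only these stages are Netfilter hooks; the rest are xfrm processing steps.
NETFILTER_HOOKS = {"prerouting", "input", "forward", "output", "postrouting"}


def hooks_seeing(path, form):
    """List the Netfilter hooks which see the packet in the given form."""
    return [stage for stage, seen in path
            if seen == form and stage in NETFILTER_HOOKS]


print("request, plain:", hooks_seeing(REQUEST, "plain"))
print("request, esp:  ", hooks_seeing(REQUEST, "esp"))
print("reply, esp:    ", hooks_seeing(REPLY, "esp"))
print("reply, plain:  ", hooks_seeing(REPLY, "plain"))
```

Note that the prerouting and postrouting hooks are each traversed twice per direction on r1: once with the inner addresses and once with the outer ones.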

Context

The described behavior and implementation have been observed on a Debian 10 (buster) system using Debian backports on the amd64 architecture.

  • kernel: 5.4.19-1~bpo10+1
  • nftables: 0.9.3-2~bpo10+1
  • libnftnl: 1.1.5-1~bpo10+1
  • strongswan: 5.7.2-1

Feedback

Feedback on this article is very welcome!

1)
If a NAT router is detected between both endpoints during the IKE handshake.
2)
Yes, there is also the AH protocol, which can be used as alternative to ESP, but this protocol only provides authentication and data integrity and no encryption. Thus it is not very relevant in practice.
3)
Of course, in the actual packet there is no “blank space” between the eth (ethernet) and the ip header. I just show it like this here to emphasize WHERE in the packet the esp header and the outer ip header are being inserted.
4)
In the original illustration a darker grey background marks the part of the whole packet which gets encrypted.
5)
There have been predecessors like Openswan and FreeS/WAN.
6)
Other tools use this interface as well, e.g. the NHRP daemon nhrpd of the FRR routing protocol engine, which is used in DMVPN setups.
Last modified: 2020-06-01 by Andrej Stender