blog:linux:nftables_ipsec_packet_flow
Differences
This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
blog:linux:nftables_ipsec_packet_flow [2020-06-01] – [Table] Andrej Stender | blog:linux:nftables_ipsec_packet_flow [2022-08-14] (current) – added details about xfrm bundle Andrej Stender | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | {{tag> | ||
====== Nftables - Netfilter and VPN/IPsec packet flow ====== | ====== Nftables - Netfilter and VPN/IPsec packet flow ====== | ||
~~META: | ~~META: | ||
Line 5: | Line 6: | ||
In this article I would like to explain what the packet flow through | In this article I would like to explain what the packet flow through | ||
- | //Netfilter// hooks looks like on a host which works as a VPN gateway | + | Netfilter hooks looks like on a host which works as an IPsec-based |
- | based on IPsec (Strongswan, tunnel-mode, | + | Obviously network packets which are to be sent through a VPN tunnel are encrypted+encapsulated |
- | in favor of the older //Iptables// and regarding Strongswan | + | this exactly happen and which packet traverses which Netfilter hook in which sequence and in which form (encrypted/ |
- | //vici// interface | + | I'll do a short recap of IPsec in general, explain the IPsec implementation on Linux as it is commonly used today (Strongswan |
- | See also [[nftables_packet_flow_netfilter_hooks_detail|my other article]] which covers packet flow through // | ||
+ | ===== IPsec short recap ===== | ||
+ | A comprehensive recap of IPsec would require an entire book. I'll merely | ||
+ | provide a very short one here, focused on protocols and ports, to put my actual topic into context. | ||
- | ===== IPsec recap ===== | + | ==== IKE protocol |
- | A comprehensive recap on the topic IPsec would require | + | An IPsec based VPN possesses |
- | provide | + | (usually session-based keys). IKE is encapsulated in UDP and uses UDP port 500. |
+ | In case of // | ||
- | ===== IPsec implementation in Linux ===== | + | | '' |
- | IPsec implementation in Linux consists of a userspace part and a kernel part. | + | |
- | Nowadays the userspace part is represented by the [[wp> | + | |
- | and the kernel part is represented by the //Xfrm framework//, which is sometimes called the //Netkey stack// and is present in the kernel since v2.6. With the following image I like to show these components and how they interact in a simple block diagram style. | + | |
- | {{:wiki: | + | |
+ | |||
+ | ==== SAs and SPs ==== | ||
+ | The algorithms and keys negotiated during the IKE handshake are organized in | ||
+ | so-called //Security Associations// | ||
+ | |||
+ | In addition to the SAs, IPsec also introduces the concept of the so-called //Security Policies// (SPs), which are also created during IKE handshake. Those are either defined by the IPsec tunnel configuration provided by the admin/user and/or (depending on case) can also at least partly result from dynamic IKE negotiation. The purpose of the SPs is to act as " | ||
+ | |||
+ | Be aware that both SAs and SPs are merely volatile, not persistent, data. Their lifetime is defined by the lifetime of the VPN tunnel connection. It might even be shorter because of key re-negotiations / " | ||
+ | ==== ESP protocol, tunnel-mode ==== | ||
+ | After the initial IKE handshake has been successfully finished, the VPN tunnel between both endpoints thereby is " | ||
+ | packets can travel through it. In case of // | ||
+ | |||
+ | | < | ||
+ | | < | ||
+ | |||
+ | If // | ||
+ | |||
+ | | < | ||
+ | | < | ||
+ | |||
+ | |||
+ | ===== IPsec Linux implementation ===== | ||
+ | IPsec implementation in Linux consists of a userspace part and a kernel part. | ||
+ | Several implementations have been created over the years. Nowadays the most commonly used implementation of the userspace part seems to be [[wp> | ||
+ | |||
+ | <figure linuxipsecimpl1> | ||
+ | {{: | ||
+ | < | ||
+ | Block diagram showing userspace and kernel part of the IPsec implementation | ||
+ | on Linux (StrongSwan and Xfrm framework) and interfaces between both | ||
+ | </ | ||
+ | </ | ||
==== Strongswan ==== | ==== Strongswan ==== | ||
The essential part of Strongswan is the userspace daemon //charon// which implements | The essential part of Strongswan is the userspace daemon //charon// which implements | ||
- | IKEv1/IKEv2 and acts as the central " | + | IKEv1/IKEv2 and acts as the central " |
- | on the system. | + | on each VPN endpoint host. |
- | It provides an interface to the user/administrator to configure | + | It provides an interface to the user/admin for configuration of IPsec on the system. |
Actually, more precisely, it provides two different interfaces to do that: | Actually, more precisely, it provides two different interfaces to do that: | ||
One is the so-called //Stroke// interface. It provides means to configure IPsec | One is the so-called //Stroke// interface. It provides means to configure IPsec | ||
via two main config files ''/ | via two main config files ''/ | ||
- | This is the older of the two interfaces and it can be considered deprecated (however | + | This is the older of the two interfaces and it can be considered deprecated |
- | it is still supported). | + | by now (however it is still supported). |
- | The other and newer one is the so-called //Vici// interface. | + | |
- | which means the //charon// daemon | + | The other and newer one is the so-called //Vici// interface. |
- | (like Strongswan's own cmdline tool '' | + | which means the //charon// daemon | ||
- | NHRP daemon of the FRR routing protocol engine, which is used in DMVPN setups) | + | like Strongswan's own cmdline tool '' | ||
+ | //NHRP// daemon | ||
can connect to it to configure IPsec. | can connect to it to configure IPsec. | ||
This way of configuration is more powerful than the //Stroke// interface, because it | This way of configuration is more powerful than the //Stroke// interface, because it | ||
makes it easier for other tools to provide and adjust configuration dynamically | makes it easier for other tools to provide and adjust configuration dynamically | ||
and event-driven at any time. | and event-driven at any time. | ||
- | |||
However, in many common IPsec setups the configuration is still simply being | However, in many common IPsec setups the configuration is still simply being | ||
- | supplied via a config files. When using //Vici//, the difference is merely, | + | supplied via config files. When using //Vici//, the difference |
the config file(s) (mainly the file ''/ | the config file(s) (mainly the file ''/ | ||
by the //charon// daemon directly, but instead are interpreted by the cmdline tool '' | by the //charon// daemon directly, but instead are interpreted by the cmdline tool '' | ||
which then feeds this config into the //charon// daemon via the //Vici// IPC interface. | which then feeds this config into the //charon// daemon via the //Vici// IPC interface. | ||
+ | Further, the syntax of '' | ||
+ | of '' | ||
- | The //charon// daemon uses a //Netlink// socket as a communication channel | + | An additional config file '' |
- | into the kernel. | + | |
- | ==== The xfrm framework ==== | + | So let's say you created a '' |
- | The so-called | + | |
- | the userspace part (Strongswan) handles | + | |
- | runs the IKEv1/IKEv2 protocol to buildup/teardown VPNs, the kernel part | + | |
- | handles | + | |
- | //Security Association Database// ('' | + | |
- | Database// ('' | + | |
- | This means the userspace | + | ==== The Xfrm framework ==== |
- | daemon charon feeds the actual IPsec //Security Association// ('' | + | The so-called //Xfrm framework// is a component within |
- | instances and //Security Policy// ('' | + | As one of the //iproute2// man pages(('' |
- | configuration | + | it is an //"IP framework for transforming packets |
- | maintains | + | (such as encrypting their payloads)"//. Thus, " |
- | packets of the VPN. | + | While the userspace part (Strongswan) handles the overall IPsec orchestration |
+ | runs the IKEv1/ | ||
+ | the kernel | ||
+ | network packets which travel through the VPN tunnel | ||
+ | go through | ||
+ | instances which define | ||
+ | Only then can it decide which packets shall be encrypted/ | ||
+ | and which shall not, and which encryption algorithms and keys to use. | ||
- | You can use the // | + | The Xfrm framework implements the so-called //Security Association Database// (SAD) |
- | the '' | + | and the //Security Policy Database// (SPD) for holding SA and SP instances. |
- | databases | + | An SA is represented by '' |
- | #list SAs which are currently configured in the kernel | + | Userspace components (like // |
- | ip xfrm state | + | |
- | #list SPs which are currently | + | * command '' |
- | ip xfrm policy | + | * command '' |
+ | |||
+ | You can further use '' | ||
+ | |||
+ | SP instances can be created for three different "data directions": | ||
+ | ^ security policy | ||
+ | | " | ||
+ | | "input policy" | ||
+ | | " | ||
+ | |||
+ | If you are working with Nftables or Iptables, then you probably are | ||
+ | familiar with the widely used | ||
+ | [[: | ||
+ | which illustrates the packet flow through the Netfilter hooks and | ||
+ | Iptables //chains//. One great thing about this image is that it also covers | ||
+ | the Xfrm framework. It illustrates four distinct "Xfrm actions" | ||
+ | in the network packet flow path, named // | ||
+ | //xfrm lookup// and //xfrm encode//. However, this illustration is | ||
+ | kind of a bird's eye view. These four " | ||
+ | actual Xfrm implementation very closely. The actual framework | ||
+ | works a little differently, which means that there actually are | ||
+ | more than four points within the packet flow path where Xfrm takes | ||
+ | action, and the locations of those are also a little different. | ||
+ | Figure {{ref> | ||
+ | with main focus on Netfilter and the Xfrm framework. | ||
+ | It shows the Netfilter hooks in blue color and the locations where | ||
+ | the Xfrm framework takes action in magenta color. | ||
+ | If you are not yet familiar with the Netfilter hooks and their relation | ||
+ | to Nftables/ | ||
+ | Netfilter hooks in detail]] before proceeding here. | ||
+ | |||
+ | <figure nfhooksxfrm1> | ||
+ | {{: | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | | < | ||
+ | It is an instance of two combined structs, the outer '' | ||
+ | | < | ||
+ | </ | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | they must match the "input policy" | ||
+ | " | ||
+ | ESP packets anyway circumvent this action, as you can see in Figure {{ref> | ||
+ | | < | ||
+ | they must match the " | ||
+ | |||
+ | |||
+ | The Xfrm framework implementation does NOT use virtual network interfaces to distinguish between VPN and non-VPN traffic. This is a relevant difference compared to other implementations like the older //KLIPS// IPsec stack which was used in kernel v2.4 and earlier. Why is this relevant? It is true that virtual network interfaces are not required, because the concept of the SPs does all the distinction | ||
+ | |||
+ | It is obvious that an Nftables rule would be easy to write if all VPN traffic goes through a virtual network interface e.g. called '' | ||
+ | by default. Additional features have been developed over the years to address this problem. Some of them | ||
+ | re-introduce the concept of virtual network interfaces "on top" of the Xfrm framework, but those | ||
+ | are optional to use and never became the default. The Strongswan documentation calls VPN setups based on those virtual network interfaces [[https:// | ||
+ | |||
+ | <figure xfrm_dst> | ||
+ | {{: | ||
+ | < | ||
+ | (click to enlarge). In IPsec tunnel-mode, | ||
+ | references to IPsec SA and SP and function pointers to lead the packet | ||
+ | on the Xfrm encrypt+encapsulate path. Compare it to a normal | ||
+ | //routing decision// object, which I described in my | ||
+ | [[routing_decisions_in_the_linux_kernel_1_lookup_packet_flow# | ||
+ | </ | ||
+ | </ | ||
+ | |||
+ | ===== Example Site-to-site VPN ===== | ||
+ | <figure ipsecsstopo1> | ||
+ | {{ : | ||
+ | < | ||
+ | </ | ||
+ | </ | ||
+ | |||
+ | It is better to have a practical example as basis for further diving into the topic. Here I will use a site-to-site VPN setup, which is created between two VPN gateways '' | ||
+ | It can be roughly compared to the [[https:// | ||
+ | The VPN tunnel will connect the local subnets behind '' | ||
+ | |||
+ | * [[: | ||
+ | |||
+ | <figure swanctl_conf_r1_r2> | ||
+ | <WRAP group>< | ||
+ | <code json r1: swanctl.conf> | ||
+ | connections { | ||
+ | gw-gw { | ||
+ | local_addrs | ||
+ | remote_addrs = 9.0.0.1 | ||
+ | local { | ||
+ | auth = psk | ||
+ | id = r1 | ||
+ | } | ||
+ | remote { | ||
+ | auth = psk | ||
+ | id = r2 | ||
+ | } | ||
+ | children { | ||
+ | net-net { | ||
+ | mode = tunnel | ||
+ | local_ts | ||
+ | remote_ts = 192.168.2.0/ | ||
+ | esp_proposals = aes128gcm128 | ||
+ | } | ||
+ | } | ||
+ | version = 2 | ||
+ | mobike = no | ||
+ | reauth_time = 10800 | ||
+ | proposals = aes128-sha256-modp3072 | ||
+ | } | ||
+ | } | ||
+ | secrets { | ||
+ | ike-1 { | ||
+ | id-1 = r1 | ||
+ | id-2 = r2 | ||
+ | secret = " | ||
+ | } | ||
+ | }</ | ||
+ | <code json r2: swanctl.conf> | ||
+ | connections { | ||
+ | gw-gw { | ||
+ | local_addrs | ||
+ | remote_addrs = 8.0.0.1 | ||
+ | local { | ||
+ | auth = psk | ||
+ | id = r2 | ||
+ | } | ||
+ | remote { | ||
+ | auth = psk | ||
+ | id = r1 | ||
+ | } | ||
+ | children { | ||
+ | net-net { | ||
+ | mode = tunnel | ||
+ | local_ts | ||
+ | remote_ts = 192.168.1.0/ | ||
+ | esp_proposals = aes128gcm128 | ||
+ | } | ||
+ | } | ||
+ | version = 2 | ||
+ | mobike = no | ||
+ | reauth_time = 10800 | ||
+ | proposals = aes128-sha256-modp3072 | ||
+ | } | ||
+ | } | ||
+ | secrets { | ||
+ | ike-1 { | ||
+ | id-1 = r1 | ||
+ | id-2 = r2 | ||
+ | secret = " | ||
+ | } | ||
+ | }</ | ||
+ | < | ||
+ | Strongswan configuration on '' | ||
+ | </ | ||
+ | </ | ||
+ | |||
+ | Execute command '' | ||
+ | databases inside | ||
+ | |||
+ | <figure r1_ip_xfrm_state> | ||
+ | <code bash> | ||
+ | root@r1: | ||
+ | src 8.0.0.1 dst 9.0.0.1 | ||
+ | proto esp spi 0xc5400599 reqid 1 mode tunnel | ||
+ | replay-window 0 flag af-unspec | ||
+ | aead rfc4106(gcm(aes)) 0x8849c107d9f6972da27a5faef554a68b10f3b938 128 | ||
+ | anti-replay context: seq 0x0, oseq 0x9, bitmap 0x00000000 | ||
+ | src 9.0.0.1 dst 8.0.0.1 | ||
+ | proto esp spi 0xcd7dff80 reqid 1 mode tunnel | ||
+ | replay-window 32 flag af-unspec | ||
+ | aead rfc4106(gcm(aes)) 0x3c0497d489904175bdb446f3e09ae4c3acaf5d45 128 | ||
+ | anti-replay context: seq 0x9, oseq 0x0, bitmap 0x000001ff | ||
</ | </ | ||
+ | < | ||
+ | </ | ||
- | The '' | + | <figure r1_ip_xfrm_policy> |
- | the kernel. You could also use it as a low-level config tool to | + | <code bash> |
- | create/edit/delete | + | root@r1: |
- | you leave those duties to Strongswan. | + | src 192.168.1.0/24 dst 192.168.2.0/24 |
+ | dir out priority 375423 ptype main | ||
+ | tmpl src 8.0.0.1 dst 9.0.0.1 | ||
+ | proto esp spi 0xc5400599 reqid 1 mode tunnel | ||
+ | src 192.168.2.0/24 dst 192.168.1.0/24 | ||
+ | dir fwd priority 375423 ptype main | ||
+ | tmpl src 9.0.0.1 dst 8.0.0.1 | ||
+ | proto esp reqid 1 mode tunnel | ||
+ | src 192.168.2.0/ | ||
+ | dir in priority 375423 ptype main | ||
+ | tmpl src 9.0.0.1 dst 8.0.0.1 | ||
+ | proto esp reqid 1 mode tunnel | ||
+ | src 0.0.0.0/0 dst 0.0.0.0/0 | ||
+ | | ||
+ | src 0.0.0.0/0 dst 0.0.0.0/ | ||
+ | | ||
+ | src 0.0.0.0/0 dst 0.0.0.0/0 | ||
+ | socket in priority 0 ptype main | ||
+ | src 0.0.0.0/0 dst 0.0.0.0/0 | ||
+ | socket out priority 0 ptype main | ||
+ | src ::/0 dst ::/0 | ||
+ | socket in priority 0 ptype main | ||
+ | src ::/0 dst ::/0 | ||
+ | socket out priority 0 ptype main | ||
+ | src ::/0 dst ::/0 | ||
+ | socket in priority 0 ptype main | ||
+ | src ::/0 dst ::/0 | ||
+ | socket out priority 0 ptype main | ||
+ | </ | ||
+ | < | ||
+ | </ | ||
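The policy listing above is rather verbose. As a minimal sketch, the following shell function condenses such output to one line per policy (direction plus selector) and skips the per-socket policies. The function name ''xfrm_policy_summary'' is my own invention for illustration and not part of //iproute2//; the sketch assumes the two-line ''src ... dst ...'' / ''dir ...'' layout shown in the listing.

```shell
# Condense `ip xfrm policy` output read from stdin into lines of the form
#   <dir> <src selector> <dst selector>
# Per-socket policies ("socket in" / "socket out") have no "dir" line
# and are therefore skipped. (Illustrative helper, not an iproute2 tool.)
xfrm_policy_summary() {
    awk '
        # Selector line: remember source and destination prefixes.
        $1 == "src" && $3 == "dst" { src = $2; dst = $4; next }
        # Direction line: emit one summary line for this policy.
        $1 == "dir"                { print $2, src, dst }
    '
}
```

On a live gateway you would feed it real output, e.g. ''ip xfrm policy | xfrm_policy_summary'' (needs root). The ''tmpl'' lines are ignored on purpose, because only the selector and the direction decide whether a packet matches a policy.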
- | ==== hooks ==== | + | Just for completeness: To tear the VPN tunnel down again, you would need to execute command | ||
- | {{:wiki: | + | '' |
+ | ===== Packet Flow ===== | ||
+ | Let's say the VPN tunnel in the example described above is now up and running. To start simple, let's also assume that we have not yet configured any Nftables //ruleset// on the VPN gateways '' | ||
- | ===== Table ===== | + | This means that packets traversing one of the VPN gateways '' | ||
- | **'' | + | <figure r1_icmp> |
+ | {{: | ||
+ | \\ | ||
+ | <code bash> | ||
+ | h1$ ping -c1 192.168.2.100 | ||
+ | # | ||
+ | # 192.168.1.100 -> ICMP echo-request | ||
+ | # 192.168.1.100 <- ICMP echo-reply | ||
+ | </ | ||
+ | < | ||
+ | </ | ||
- | {{: | + | The following Figures |
+ | {{ref> | ||
+ | and the corresponding ICMP //echo-reply// traverse the kernel network stack on | ||
+ | '' | ||
+ | result of me doing a lot of experimenting and reading in the kernel source code. | ||
+ | I used the Nftables '' | ||
+ | chain traversal, and thereby Netfilter hook traversal, visible. Further, I | ||
+ | used '' | ||
+ | takes through the kernel network stack while being encrypted/ | ||
+ | On some occasions I used '' | ||
+ | breakpoints within the kernel of a Linux virtual machine and thereby | ||
+ | observe the content of data structures involved with a traversing packet. | ||
+ | While reading source code, the book [[https:// | ||
+ | help to me in finding my way around the kernel network stack. I hope | ||
+ | this gives you a head start in case you intend to dive deep into that topic | ||
+ | yourself, too. | ||
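If you want to reproduce this kind of observation yourself, a minimal sketch could look as follows. It assumes the rule tracing facility of Nftables (''meta nftrace set 1'' together with ''nft monitor trace''); the table name ''tracedemo'', the chain name ''pre'' and the priority value are arbitrary choices of mine for illustration. The function only prints a ruleset; nothing is applied until you pipe it into ''nft -f -''.

```shell
# Print a small nft ruleset which flags ICMP and ESP packets for tracing.
# Table name, chain name and priority are illustrative assumptions.
trace_ruleset() {
    cat <<'EOF'
table ip tracedemo {
    chain pre {
        type filter hook prerouting priority -350; policy accept;
        ip protocol { icmp, esp } meta nftrace set 1
    }
}
EOF
}

# On a test gateway (needs root):
#   trace_ruleset | nft -f -
#   nft monitor trace
```

Keep in mind that trace events are only generated for chains which actually exist, so hooks without a registered chain stay invisible with this approach.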
- | ^ step ^ netfilter hook / xfrm ^ encapsulation | + | <figure echo_request_r1_traversal> |
- | | 1 | '' | + | {{:linux:packet-flow-ipsec-tunnel-encrypt.png?direct&700|}} |
- | | 2 | '' | + | <caption> ICMP echo-request |
- | | 3 | '' | + | </figure> |
- | | 4 | '' | + | |
- | | 5 | '' | + | |
- | | 6 | '' | + | |
- | | 7 | '' | + | |
- | **'' | + | ^ Step ^^ Encapsulation |
+ | | 1 | **eth0** | ||
+ | | 2 | < | ||
+ | | 3 | < | ||
+ | | 4 | < | ||
+ | | 5 | < | ||
+ | | 6 | **Routing** | ||
+ | | 7 | < | ||
+ | | 8 | < | ||
+ | | 9 | < | ||
+ | | 10 | < | ||
+ | | 11 | < | ||
+ | | 12 | < | ||
+ | | 13 | < | ||
+ | | 14 | < | ||
+ | | 15 | < | ||
+ | | 16 | < | ||
+ | | 17 | **eth1** | < | ||
- | {{: | ||
- | ^ step ^ netfilter | + | __Steps (1) (2) (3) (4) (5):__ The //ICMP echo-request// |
- | | 1 | '' | + | |
- | | 2 | + | __Step (6):__ The routing lookup is performed. It determines that this packet |
- | | 3 | + | needs to be forwarded and sent out on '' |
- | | 4 | + | decision to the packet. |
- | | 5 | + | |
- | | 6 | + | __Step (7):__ The Xfrm framework performs a lookup into the IPsec SPD, |
- | | 7 | + | searching for a matching //forward policy// ('' |
+ | found and the packet passes. | ||
+ | |||
+ | __Step (8):__ The Xfrm framework performs a lookup into the IPsec SPD, | ||
+ | searching for a matching //output policy// ('' | ||
+ | source and destination IP addresses '' | ||
+ | matches, see Figure {{ref> | ||
+ | An IPsec SA is resolved (see Figure {{ref> | ||
+ | the matching SP. The Xfrm framework detects that tunnel-mode is configured in | ||
+ | this SA. Thus, it now performs yet another routing lookup, this time for the (future) outer | ||
+ | IPv4 packet, which will later encapsulate the current packet. A “bundle” of | ||
+ | transformation instructions for this packet is assembled, which contains the | ||
+ | original routing decision from step (6), the SP, the SA, the routing decision | ||
+ | for the future outer IP packet and more. It is attached to the packet, | ||
+ | replacing the attached routing decision from step (6). | ||
+ | |||
+ | __Steps (9) (10):__ The packet traverses the Netfilter //Forward// and | ||
+ | // | ||
+ | |||
+ | __Step (11):__ The Xfrm framework transforms the packet | ||
+ | according to the instructions in the attached “bundle”. In this case this | ||
+ | means encapsulating the IP packet into a new outer IP packet | ||
+ | with source IP address '' | ||
+ | and then encapsulating the inner IP packet into ESP protocol and encrypting it and its | ||
+ | payload. After that the transformation instructions are removed from the | ||
+ | “bundle”, | ||
+ | attached to the packet. | ||
+ | |||
+ | __Steps (12) (13) (14) (15) (16) (17):__ The packet is re-inserted into the local | ||
+ | output path. It traverses the Netfilter // | ||
+ | Netfilter | ||
+ | subsystem// which in this case resolves the next-hop gateway IP address | ||
+ | '' | ||
+ | (by doing ARP lookup, if address not yet in cache). | ||
+ | Finally, it traverses the //egress// queueing discipline (network packet | ||
+ | scheduler, '' | ||
+ | then is sent out on '' | ||
+ | |||
+ | The output interface '' | ||
+ | |||
+ | <figure echo_reply_r1_traversal> | ||
+ | {{: | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | ^ Step ^^ Encapsulation | ||
+ | | 1 | **eth1** | < | ||
+ | | 2 | < | ||
+ | | 3 | < | ||
+ | | 4 | < | ||
+ | | 5 | < | ||
+ | | 6 | **Routing** | ||
+ | | 7 | < | ||
+ | | 8 | < | ||
+ | | 9 | **eth1** | ||
+ | | 10 | < | ||
+ | | 11 | < | ||
+ | | 12 | < | ||
+ | | 13 | < | ||
+ | | 14 | **Routing** | ||
+ | | 15 | < | ||
+ | | 16 | < | ||
+ | | 17 | < | ||
+ | | 18 | < | ||
+ | | 19 | < | ||
+ | | 20 | < | ||
+ | | 21 | < | ||
+ | | 22 | **eth0** | < | ||
+ | |||
+ | |||
+ | __Steps (1) (2) (3) (4) (5):__ The //ICMP echo-reply// | ||
+ | |||
+ | __Steps (6) (7):__ The routing lookup is performed. In this case, the routing subsystem determines that this packet's destination IP '' | ||
+ | |||
+ | __Steps (8) (9):__ The Xfrm framework has a layer 4 receive handler waiting for incoming ESP | ||
+ | packets at this point. It parses the SPI value from the ESP header of the | ||
+ | packet and performs a lookup into the SAD for a matching IPsec SA (lookup | ||
+ | based on SPI and destination IP address). A matching SA is found, see Figure | ||
+ | {{ref> | ||
+ | specifies the further steps to take here for this packet. It is decrypted and | ||
+ | the ESP encapsulation is removed. Now the inner IP packet becomes visible. | ||
+ | The SA specifies tunnel-mode, | ||
+ | of packet meta data is changed here, e.g. the attached routing decision (of | ||
+ | the outer IP packet, which is now removed) is stripped away, the reference to | ||
+ | connection tracking is removed, and a pointer to the SA which has been used | ||
+ | here to transform the packet is attached (via skb extension | ||
+ | '' | ||
+ | recognize that this packet has been decrypted by Xfrm. Finally, the packet is | ||
+ | now re-inserted into the OSI layer 2 receive path of '' | ||
+ | |||
+ | __Steps (10) (11) (12) (13):__ Now history repeats... the packet once again traverses | ||
+ | //taps//, the //ingress// queueing discipline and the Netfilter //Ingress// hook of | ||
+ | '' | ||
+ | |||
+ | __Step (14):__ The routing lookup is performed. It determines that this | ||
+ | packet needs to be forwarded and sent out on '' | ||
+ | to the packet. | ||
+ | |||
+ | __Step (15):__ The Xfrm framework recognizes that this packet has been transformed | ||
+ | according to the SA, whose pointer is still attached to the packet | ||
+ | It checks if a '' | ||
+ | a match is found here, see Figure {{ref> | ||
+ | |||
+ | __Step (16):__ The Xfrm framework performs a lookup into the IPsec SPD, searching for a matching | ||
+ | //output policy// ('' | ||
+ | The idea of the //output policy// is to detect packets which shall be encrypted with IPsec. | ||
+ | Packets which do not match simply pass. | ||
+ | |||
+ | __Steps (17) (18) (19) (20) (21) (22):__ The packet traverses the Netfilter | ||
+ | //Forward// and // | ||
+ | subsystem// which resolves the destination IP address, which is now | ||
+ | '' | ||
+ | in cache). Finally, it traverses the //egress// queueing discipline, //taps// | ||
+ | and then is sent out on '' | ||
+ | |||
+ | The input interface '' | ||
+ | This is what it means for the Xfrm framework not to use virtual network interfaces. If virtual network interfaces were used here instead (e.g. a '' | ||
+ | ===== SNAT, Nftables ===== | ||
+ | Now to add the SNAT behavior to '' | ||
+ | Nftables //ruleset// on '' | ||
+ | |||
+ | <code bash> | ||
+ | nft add table nat | ||
+ | nft add chain nat postrouting { type nat hook postrouting priority 100\; } | ||
+ | nft add rule nat postrouting oif eth1 masquerade | ||
+ | nft add table filter | ||
+ | nft add chain filter forward { type filter hook forward priority 0\; policy drop\; } | ||
+ | nft add rule filter forward iif eth0 oif eth1 accept | ||
+ | nft add rule filter forward iif eth1 oif eth0 ct state established, | ||
+ | </ | ||
+ | |||
+ | This //ruleset// is identical on both hosts. | ||
+ | What is the resulting change in behavior? Well, for non-VPN traffic now all works as intended. Hosts '' | ||
+ | and '' | ||
+ | |||
+ | However, let's take another look at the example from Figure {{ref> | ||
+ | Resulting from that, this packet now does not match the IPsec //output policy// anymore. Thus, it won't get encrypted+encapsulated! Obviously that is not our intended behavior, but let's first dig deeper to understand what actually happens here: In step (8) this packet still had its original source and destination IP addresses '' | ||
+ | |||
+ | Ok, now we understand it ... the ping is natted, but then sent out plain and unencrypted. That is not what we want. Further, this ping is doomed to fail anyway, because '' | ||
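The effect can be illustrated with a small sketch which mimics only the selector comparison of an SP lookup in plain shell arithmetic. It is of course not how the kernel implements the lookup, and all helper names are made up for this illustration. The addresses are the ones from the example; I assume here that masquerading on ''r1'' rewrites the source address to ''8.0.0.1''.

```shell
# Convert a dotted-quad IPv4 address into a 32-bit integer.
ip4_to_int() {
    old_ifs="$IFS"; IFS=.
    set -- $1            # split the address into its four octets
    IFS="$old_ifs"
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Return 0 if address $1 lies within CIDR prefix $2 (e.g. 192.168.1.0/24).
ip4_in_cidr() {
    addr=$(ip4_to_int "$1")
    net=$(ip4_to_int "${2%/*}")
    len=${2#*/}
    mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Return 0 if packet (src $1, dst $2) matches selector (src $3, dst $4).
selector_matches() {
    ip4_in_cidr "$1" "$3" && ip4_in_cidr "$2" "$4"
}

# Original ping, matches the output policy selector:
#   selector_matches 192.168.1.100 192.168.2.100 192.168.1.0/24 192.168.2.0/24
# Same ping after masquerading (assumed source 8.0.0.1), no match:
#   selector_matches 8.0.0.1 192.168.2.100 192.168.1.0/24 192.168.2.0/24
```

The first call returns exit status 0, the second one a non-zero status, which corresponds to the natted packet no longer being selected for encryption.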
+ | |||
+ | How to fix that? It is our intended behavior that network packets from subnet | ||
+ | '' | ||
+ | VPN tunnel and shall not be natted. Also, it shall be possible to establish connections | ||
+ | with connection-oriented protocols (e.g. TCP((Connection tracking in Linux handles a ping (ICMP | ||
+ | echo-request + echo-reply) as a connection-oriented protocol, thus the same applies to ping here.))) | ||
+ | in both directions through the VPN tunnel. One simple way to achieve this behavior is to add these two rules on '' | ||
+ | <code bash r1> | ||
+ | nft insert rule nat postrouting oif eth1 ip daddr 192.168.2.0/ | ||
+ | nft add rule filter forward iif eth1 oif eth0 ip saddr 192.168.2.0/ | ||
+ | </ | ||
+ | For the first rule I used '' | ||
+ | rule of the // | ||
+ | need to do the equivalent (but not identical!) thing on '' | ||
+ | <code bash r2> | ||
+ | nft insert rule nat postrouting oif eth1 ip daddr 192.168.1.0/ | ||
+ | nft add rule filter forward iif eth1 oif eth0 ip saddr 192.168.1.0/ | ||
+ | </ | ||
+ | |||
+ | The complete rulesets then look like this: [[: | ||
+ | |||
+ | Let's look at the example from Figure {{ref> | ||
+ | |||
+ | When the ICMP echo-request packet is received by and traverses '' | ||
+ | |||
+ | ===== Distinguish VPN/non-VPN traffic ===== | ||
+ | No matter if your idea is to do NAT or your intentions are other kinds of packet manipulation or packet filtering, it all boils down to distinguishing between VPN and non-VPN traffic. | ||
+ | The Nftables rulesets I applied to '' | ||
+ | As I mentioned above, things would be easier if the Xfrm framework used virtual network interfaces, because those could then serve as a basis for making this distinction. | ||
+ | |||
+ | Several means have been implemented to address these kinds of issues: | ||
+ | * Strongswan provides an optional '' | ||
+ | * Nftables offers IPSEC EXPRESSIONS (syntax '' | ||
+ | * Nftables offers " | ||
+ | * So-called '' | ||
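As a hedged sketch of the Nftables-based options in this list: assuming a reasonably recent nftables and kernel, the ''rt ipsec'' and ''meta ipsec'' expressions documented in the //nft// man page can take over the job of the subnet-based matching. The rules below are my own illustration and not the rulesets used in this article; they assume that the ''nat postrouting'' and ''filter forward'' chains from the SNAT section already exist. The function only prints the rules; nothing is applied until the output is piped into ''nft -f -''.

```shell
# Print nft commands which distinguish VPN from non-VPN traffic without
# referring to the tunnel subnets. (Illustrative sketch; chain names are
# those created in the SNAT section, everything else is an assumption.)
ipsec_aware_rules() {
    cat <<'EOF'
# Skip masquerading for packets whose route carries an IPsec transform.
insert rule ip nat postrouting oif eth1 rt ipsec exists accept
# Accept forwarded packets which have been decrypted by Xfrm.
add rule ip filter forward iif eth1 oif eth0 meta ipsec exists accept
EOF
}

# On a gateway (needs root):
#   ipsec_aware_rules | nft -f -
```

The idea is that the same two rules could then be used unchanged on both gateways, because they no longer mention the remote subnet.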
Line 119: | Line 569: | ||
===== Context ===== | ===== Context ===== | ||
The described behavior and implementation have been observed on a | The described behavior and implementation have been observed on a | ||
- | Debian 10 (buster) system using Debian // | + | Debian 10 (buster) system using Debian // | ||
- | * kernel: '' | + | * kernel: '' |
- | * nftables: '' | + | * nftables: '' |
- | * libnftnl: '' | + | * strongswan: '' |
- | * strongswan: '' | + | |
===== Feedback ===== | ===== Feedback ===== | ||
- | [[: | + | [[: |
+ | |||
+ | ===== References ===== | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | //published 2020-05-30//, | ||
- | {{tag> |
blog/linux/nftables_ipsec_packet_flow.1591013678.txt.gz · Last modified: 2020-06-01 by Andrej Stender