blog:linux:nftables_ipsec_packet_flow
Differences
This shows you the differences between two versions of the page.
blog:linux:nftables_ipsec_packet_flow [2020-06-24] – [IKE protocol] Andrej Stender | blog:linux:nftables_ipsec_packet_flow [2022-08-14] (current) – added details about xfrm bundle Andrej Stender | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | {{tag> | ||
====== Nftables - Netfilter and VPN/IPsec packet flow ====== | ====== Nftables - Netfilter and VPN/IPsec packet flow ====== | ||
~~META: | ~~META: | ||
Line 8: | Line 9: | ||
Obviously network packets which are to be sent through a VPN tunnel are encrypted+encapsulated on a VPN gateway and packets received through the tunnel are decapsulated and decrypted... but in which sequence does | Obviously network packets which are to be sent through a VPN tunnel are encrypted+encapsulated on a VPN gateway and packets received through the tunnel are decapsulated and decrypted... but in which sequence does | ||
this exactly happen and which packet traverses which Netfilter hook in which sequence and in which form (encrypted/ | this exactly happen and which packet traverses which Netfilter hook in which sequence and in which form (encrypted/ | ||
- | + | I'll do a short recap of IPsec in general, explain the IPsec implementation on Linux as it is commonly used today (Strongswan + Xfrm framework) and explain packet traversal through the VPN gateways in an example site-to-site VPN setup (IPsec in tunnel-mode, | |
- | I'll do a short recap of IPsec | + | |
- | in general, explain the IPsec implementation on Linux as it is commonly used today (Strongswan + Xfrm framework) and explain packet traversal through the VPN gateways in an example site-to-site VPN setup (IPsec in tunnel-mode, | + | |
===== IPsec short recap ===== | ===== IPsec short recap ===== | ||
A comprehensive recap on the topic IPsec would require an entire book. I'll merely | A comprehensive recap on the topic IPsec would require an entire book. I'll merely | ||
- | provide a very short recap here focused on protocols and ports, to put my actual topic into context. | + | provide a very short recap here focused on protocols and ports to put my actual topic into context. |
==== IKE protocol ==== | ==== IKE protocol ==== | ||
- | An IPsec based VPN possesses a " | + | An IPsec based VPN possesses a " |
(usually session-based keys). IKE is encapsulated in UDP and uses UDP port 500. | (usually session-based keys). IKE is encapsulated in UDP and uses UDP port 500. | ||
- | In case of // | + | In case of // |
| '' | | '' | ||
Line 30: | Line 29: | ||
so-called //Security Associations// | so-called //Security Associations// | ||
- | Additionally | + | In addition |
Be aware that both SAs and SPs are merely volatile, non-persistent data. Their lifetime is defined by the lifetime of the VPN tunnel connection. It might even be shorter because of key re-negotiations / " | Be aware that both SAs and SPs are merely volatile, non-persistent data. Their lifetime is defined by the lifetime of the VPN tunnel connection. It might even be shorter because of key re-negotiations / " | ||
==== ESP protocol, tunnel-mode ==== | ==== ESP protocol, tunnel-mode ==== | ||
After the initial IKE handshake has been successfully finished, the VPN tunnel between both endpoints thereby is " | After the initial IKE handshake has been successfully finished, the VPN tunnel between both endpoints thereby is " | ||
- | packets can travel through it. In case of // | + | packets can travel through it. In case of // |
| < | | < | ||
- | | < | + | | < |
If // | If // | ||
| < | | < | ||
- | | < | + | | < |
===== IPsec Linux implementation ===== | ===== IPsec Linux implementation ===== | ||
IPsec implementation in Linux consists of a userspace part and a kernel part. | IPsec implementation in Linux consists of a userspace part and a kernel part. | ||
- | Several implementations have been created over the years. Nowadays the most commonly used implementation of the userspace part seems to be [[wp> | + | Several implementations have been created over the years. Nowadays the most commonly used implementation of the userspace part seems to be [[wp> |
- | + | ||
- | With the following image I like to show the responsibilities of Strongswan and the Xfrm framework and how both interact with each other in a simple block diagram style: | + | |
+ | <figure linuxipsecimpl1> | ||
{{: | {{: | ||
+ | < | ||
+ | Block diagram showing userspace and kernel part of the IPsec implementation | ||
+ | on Linux (StrongSwan and Xfrm framework) and interfaces between both | ||
+ | </ | ||
+ | </ | ||
==== Strongswan ==== | ==== Strongswan ==== | ||
The essential part of Strongswan is the userspace daemon //charon// which implements | The essential part of Strongswan is the userspace daemon //charon// which implements | ||
Line 66: | Line 69: | ||
The other and newer one is the so-called //Vici// interface. This is an IPC mechanism, | The other and newer one is the so-called //Vici// interface. This is an IPC mechanism, | ||
- | which means the //charon// daemon | + | which means the //charon// daemon |
like Strongswan's own cmdline tool '' | like Strongswan's own cmdline tool '' | ||
//NHRP// daemon '' | //NHRP// daemon '' | ||
Line 79: | Line 82: | ||
which then feeds this config into the //charon// daemon via the //Vici// IPC interface. | which then feeds this config into the //charon// daemon via the //Vici// IPC interface. | ||
Further, the syntax of '' | Further, the syntax of '' | ||
- | of '' | + | of '' |
An additional config file ''/ | An additional config file ''/ | ||
Line 87: | Line 90: | ||
==== The Xfrm framework ==== | ==== The Xfrm framework ==== | ||
The so-called //Xfrm framework// is a component within the Linux kernel. | The so-called //Xfrm framework// is a component within the Linux kernel. | ||
- | As the man page '' | + | As one of the // |
- | (such as encrypting their payloads)"// | + | it is an //"IP framework for transforming packets |
+ | (such as encrypting their payloads)"// | ||
While the userspace part (Strongswan) handles the overall IPsec orchestration and | While the userspace part (Strongswan) handles the overall IPsec orchestration and | ||
runs the IKEv1/IKEv2 protocol to buildup/ | runs the IKEv1/IKEv2 protocol to buildup/ | ||
the kernel part is responsible for encrypting+encapsulating and decrypting+decapsulating | the kernel part is responsible for encrypting+encapsulating and decrypting+decapsulating | ||
network packets which travel through the VPN tunnel and to select/ | network packets which travel through the VPN tunnel and to select/ | ||
- | go through the VPN tunnel at all. To do that, it requires | + | go through the VPN tunnel at all. To do that, it requires all SA and SP |
- | all SA and SP instances which define the VPN tunnel/ | + | instances which define the VPN tunnel/ |
- | be present | + | Only then it can make decisions on which packet shall be encrypted/ |
- | make decisions on which packet shall be encrypted/ | + | and which not and which encryption algorithms and keys to use. |
- | and which encryption algorithms and keys to use. | + | |
The Xfrm framework implements the so-called //Security Association Database// (SAD) | The Xfrm framework implements the so-called //Security Association Database// (SAD) | ||
- | and the //Security Policy Database// (SPD) for holding SA and SP instances in the kernel. | + | and the //Security Policy Database// (SPD) for holding SA and SP instances. |
- | Userspace components (like // | + | An SA is represented by '' |
+ | Userspace components (like // | ||
* command '' | * command '' | ||
* command '' | * command '' | ||
- | You can even use '' | + | You can further |
SP instances can be created for three different "data directions": | SP instances can be created for three different "data directions": | ||
Line 112: | Line 116: | ||
| " | | " | ||
| "input policy" | | "input policy" | ||
- | | " | + | | " |
- | + | ||
- | Due to the fact that IPsec is a mandatory part of the IPv6 protocol (but is | + | |
- | also available for the IPv4 protocol), the implementation of the Xfrm | + | |
- | framework or the "IPsec stack" is very interwoven with the implementation of | + | |
- | the IPv4 and IPv6 protocols in the kernel which makes things very complex when | + | |
- | you look into details. I saw statements which claim that the Xfrm | + | |
- | framework is the most complex part of the entire network stack in the Linux | + | |
- | kernel. | + | |
- | + | ||
- | So, what I describe in the following is a " | + | |
- | sufficient model to understand how the network packet flow works and how the | + | |
- | Xfrm framework relates to the Netfilter framework (Netfilter | + | |
- | and Xfrm are implemented independently from each other in the kernel!). | + | |
If you are working with Nftables or Iptables, then you probably are | If you are working with Nftables or Iptables, then you probably are | ||
Line 131: | Line 122: | ||
[[: | [[: | ||
which illustrates the packet flow through the Netfilter hooks and | which illustrates the packet flow through the Netfilter hooks and | ||
- | Iptables //chains//. One great thing about this image is that it covers | + | Iptables //chains//. One great thing about this image is that it also covers |
- | the Xfrm framework, too (at least from the bird's eye view). It illustrates | + | the Xfrm framework. It illustrates four distinct |
- | four distinct Xfrm "decision | + | in the network packet flow path, named // |
- | points" | + | //xfrm lookup// |
- | because | + | kind-of a bird's eye view. These four "actions" |
- | the network | + | actual Xfrm implementation very closely. The actual framework |
- | to the Iptables //chains//. Here I created | + | works a little bit differently, which means that there actually are |
- | which only shows the Netfilter hooks (blue boxes) | + | more than four points within |
- | " | + | action and also the locations of those are a little bit different. |
- | differences between Iptables and Nftables. Because my focus is | + | Figure {{ref> |
- | on the Xfrm framework | + | with main focus on Netfilter and the Xfrm framework. |
- | not shown in the original | + | It shows the Netfilter hooks in blue color and the locations where |
- | kernel source code: | + | the Xfrm framework |
- | + | If you are not yet familiar with the Netfilter hooks and their relation | |
- | {{: | + | to Nftables/ |
+ | Netfilter | ||
- | I'll explain the Xfrm " | + | <figure nfhooksxfrm1> |
- | are already familiar with the Netfilter hooks which I covered in | + | {{: |
- | much detail in my other article | + | < |
- | [[nftables_packet_flow_netfilter_hooks_detail|Nftables | + | </ |
- | Netfilter hooks in detail]] | + | |
+ | | < | ||
+ | It is an instance of two combined structs, the outer '' | ||
+ | | < | ||
+ | </ | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | they must match the "input policy" | ||
+ | " | ||
+ | ESP packets anyway circumvent this action, as you can see in Figure {{ref> | ||
+ | | < | ||
+ | they must match the " | ||
- | | //Xfrm lookup// | This is where the SPD is used to check if the traversing packets are matching to any " | ||
- | | //Xfrm encode// | This is where packets which shall travel through the VPN tunnel are being encrypted and encapsulated based on SA instances within the SAD (which SA to use and how it relates to SP instances is regulated via the integer identifiers like '' | ||
- | | // | ||
- | | //Xfrm decode// | This is where packets which have been received through the VPN tunnel are being decrypted and decapsulated based on SA instances within the SAD. | | ||
- | | (//Xfrm fwd lookup//) | This step is not shown in the [[: | ||
- | It is a very important to mention that the Xfrm framework implementation does NOT use virtual network interfaces to distinguish between VPN and non-VPN traffic. This is a relevant difference compared to other | + | The Xfrm framework implementation does NOT use virtual network interfaces to distinguish between VPN and non-VPN traffic. This is a relevant difference compared to other implementations like the older //KLIPS// IPsec stack which was used in kernel v2.4 and earlier. Why is this relevant? It is true that virtual network interfaces are not required, because the concept of the SPs does all the distinction which is required for the VPN to operate. However, the absence of virtual network interfaces makes it harder for Netfilter-based packet filtering systems like Iptables and Nftables to distinguish between VPN and non-VPN packets within their rules. |
- | implementations like the older //KLIPS// IPsec stack which was used in kernel v2.4 and earlier. Why is this relevant? It is true that virtual network interfaces are not required, because the concept of the SPs | + | |
- | does all the distinction which is required for the VPN to operate. However, the absence of virtual network | + | |
- | interfaces makes it harder for Netfilter-based packet filtering systems like Iptables and | + | |
It is obvious that an Nftables rule would be easy to write if all VPN traffic goes through a virtual network interface e.g. called '' | It is obvious that an Nftables rule would be easy to write if all VPN traffic goes through a virtual network interface e.g. called '' | ||
by default. Additional features have been developed over the years to address this problem. Some of them | by default. Additional features have been developed over the years to address this problem. Some of them | ||
re-introduce the concept of virtual network interfaces "on top" of the Xfrm framework, but those | re-introduce the concept of virtual network interfaces "on top" of the Xfrm framework, but those | ||
- | are optional to use and never became the default. The Strongswan documentation calls VPN setups based on those virtual network interfaces [[https:// | + | are optional to use and never became the default. The Strongswan documentation calls VPN setups based on those virtual network interfaces [[https:// |
+ | <figure xfrm_dst> | ||
+ | {{: | ||
+ | < | ||
+ | (click to enlarge). In IPsec tunnel-mode, | ||
+ | references to IPsec SA and SP and function pointers to lead the packet | ||
+ | on the Xfrm encrypt+encapsulate path. Compare it to a normal | ||
+ | //routing decision// object, which I described in my | ||
+ | [[routing_decisions_in_the_linux_kernel_1_lookup_packet_flow# | ||
+ | </ | ||
+ | </ | ||
===== Example Site-to-site VPN ===== | ===== Example Site-to-site VPN ===== | ||
- | It is better | + | <figure ipsecsstopo1> |
- | The router '' | + | {{ : |
+ | < | ||
+ | </ | ||
+ | </ | ||
- | {{ :linux:site-to-site-topo1.png? | + | It is better to have a practical example as basis for further diving into the topic. Here I will use a site-to-site |
+ | It can be roughly compared to the [[https:// | ||
+ | The VPN tunnel will connect the local subnets behind '' | ||
- | This setup can be roughly compared to the [[https:// | ||
- | |||
- | * [[: | ||
* [[: | * [[: | ||
- | Execute command '' | + | <figure swanctl_conf_r1_r2> |
- | databases inside the kernel. In the example | + | <WRAP group>< |
+ | <code json r1: swanctl.conf> | ||
+ | connections { | ||
+ | gw-gw { | ||
+ | local_addrs | ||
+ | remote_addrs = 9.0.0.1 | ||
+ | local { | ||
+ | auth = psk | ||
+ | id = r1 | ||
+ | } | ||
+ | remote { | ||
+ | auth = psk | ||
+ | id = r2 | ||
+ | } | ||
+ | children { | ||
+ | net-net { | ||
+ | mode = tunnel | ||
+ | local_ts | ||
+ | remote_ts = 192.168.2.0/ | ||
+ | esp_proposals = aes128gcm128 | ||
+ | } | ||
+ | } | ||
+ | version = 2 | ||
+ | mobike = no | ||
+ | reauth_time = 10800 | ||
+ | proposals = aes128-sha256-modp3072 | ||
+ | } | ||
+ | } | ||
+ | secrets { | ||
+ | ike-1 { | ||
+ | id-1 = r1 | ||
+ | id-2 = r2 | ||
+ | secret = " | ||
+ | } | ||
+ | }</code>< | ||
+ | <code json r2: swanctl.conf> | ||
+ | connections { | ||
+ | gw-gw { | ||
+ | local_addrs | ||
+ | remote_addrs = 8.0.0.1 | ||
+ | local { | ||
+ | auth = psk | ||
+ | id = r2 | ||
+ | } | ||
+ | remote { | ||
+ | auth = psk | ||
+ | id = r1 | ||
+ | } | ||
+ | children { | ||
+ | net-net { | ||
+ | mode = tunnel | ||
+ | local_ts | ||
+ | remote_ts = 192.168.1.0/ | ||
+ | esp_proposals = aes128gcm128 | ||
+ | } | ||
+ | } | ||
+ | version = 2 | ||
+ | mobike = no | ||
+ | reauth_time = 10800 | ||
+ | proposals = aes128-sha256-modp3072 | ||
+ | } | ||
+ | } | ||
+ | secrets { | ||
+ | ike-1 { | ||
+ | id-1 = r1 | ||
+ | id-2 = r2 | ||
+ | secret = " | ||
+ | } | ||
+ | }</ | ||
+ | < | ||
+ | Strongswan configuration | ||
+ | </ | ||
+ | </ | ||
- | * command '' | + | Execute |
- | * command | + | databases inside |
- | By the way (just for completeness): To tear the VPN tunnel down again, you would need to execute command | + | <figure r1_ip_xfrm_state> |
+ | <code bash> | ||
+ | root@r1:~# ip xfrm state | ||
+ | src 8.0.0.1 dst 9.0.0.1 | ||
+ | proto esp spi 0xc5400599 reqid 1 mode tunnel | ||
+ | replay-window 0 flag af-unspec | ||
+ | aead rfc4106(gcm(aes)) 0x8849c107d9f6972da27a5faef554a68b10f3b938 128 | ||
+ | anti-replay context: seq 0x0, oseq 0x9, bitmap 0x00000000 | ||
+ | src 9.0.0.1 dst 8.0.0.1 | ||
+ | proto esp spi 0xcd7dff80 reqid 1 mode tunnel | ||
+ | replay-window 32 flag af-unspec | ||
+ | aead rfc4106(gcm(aes)) 0x3c0497d489904175bdb446f3e09ae4c3acaf5d45 128 | ||
+ | anti-replay context: seq 0x9, oseq 0x0, bitmap 0x000001ff | ||
+ | </ | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | <figure r1_ip_xfrm_policy> | ||
+ | <code bash> | ||
+ | root@r1:~# ip xfrm policy | ||
+ | src 192.168.1.0/ | ||
+ | dir out priority 375423 ptype main | ||
+ | tmpl src 8.0.0.1 dst 9.0.0.1 | ||
+ | proto esp spi 0xc5400599 reqid 1 mode tunnel | ||
+ | src 192.168.2.0/ | ||
+ | dir fwd priority 375423 ptype main | ||
+ | tmpl src 9.0.0.1 dst 8.0.0.1 | ||
+ | proto esp reqid 1 mode tunnel | ||
+ | src 192.168.2.0/ | ||
+ | dir in priority 375423 ptype main | ||
+ | tmpl src 9.0.0.1 dst 8.0.0.1 | ||
+ | proto esp reqid 1 mode tunnel | ||
+ | src 0.0.0.0/0 dst 0.0.0.0/0 | ||
+ | socket in priority 0 ptype main | ||
+ | src 0.0.0.0/0 dst 0.0.0.0/0 | ||
+ | socket out priority 0 ptype main | ||
+ | src 0.0.0.0/0 dst 0.0.0.0/0 | ||
+ | socket in priority 0 ptype main | ||
+ | src 0.0.0.0/0 dst 0.0.0.0/0 | ||
+ | socket out priority 0 ptype main | ||
+ | src ::/0 dst ::/0 | ||
+ | socket in priority 0 ptype main | ||
+ | src ::/0 dst ::/0 | ||
+ | socket out priority 0 ptype main | ||
+ | src ::/0 dst ::/0 | ||
+ | socket in priority 0 ptype main | ||
+ | src ::/0 dst ::/0 | ||
+ | socket out priority 0 ptype main | ||
+ | </ | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | Just for completeness: | ||
'' | '' | ||
===== Packet Flow ===== | ===== Packet Flow ===== | ||
- | Let's say the VPN tunnel in the example described above is now up and running. To start simple, let's also assume that we have not yet configured any Nftables //ruleset// on the VPN gateways '' | + | Let's say the VPN tunnel in the example described above is now up and running. To start simple, let's also assume that we have not yet configured any Nftables //ruleset// on the VPN gateways '' |
- | + | ||
- | This means, packets which are traversing one of the VPN gateways '' | + | |
- | {{ :linux:r1-traversal1.png? | + | This means, packets which are traversing one of the VPN gateways '' |
+ | <figure r1_icmp> | ||
+ | {{: | ||
+ | \\ | ||
<code bash> | <code bash> | ||
h1$ ping -c1 192.168.2.100 | h1$ ping -c1 192.168.2.100 | ||
Line 205: | Line 337: | ||
# 192.168.1.100 <- ICMP echo-reply | # 192.168.1.100 <- ICMP echo-reply | ||
</ | </ | ||
+ | < | ||
+ | </ | ||
- | The content of the following | + | The following |
+ | {{ref> | ||
+ | and the corresponding ICMP // | ||
+ | '' | ||
+ | result of me doing a lot of experimenting and reading in the kernel source code. | ||
+ | I used the Nftables | ||
+ | chain traversal, and thereby Netfilter | ||
+ | used '' | ||
+ | takes through the kernel network stack while being encrypted/decrypted. | ||
+ | On some occasions | ||
+ | breakpoints within | ||
+ | observe | ||
+ | While reading source code, the book [[https:// | ||
+ | help to me in finding my way around the kernel network stack. I hope | ||
+ | this gives you a head start in case you intend to dive deep into that topic | ||
+ | yourself, too. | ||
- | ==== ICMP echo-request h1 -> h2, r1 traversal ==== | + | <figure echo_request_r1_traversal> |
- | {{:linux:nf-hooks-xfrm-encode1.png? | + | {{:linux:packet-flow-ipsec-tunnel-encrypt.png? |
+ | < | ||
+ | </ | ||
+ | ^ Step ^^ Encapsulation | ||
+ | | 1 | **eth0** | ||
+ | | 2 | < | ||
+ | | 3 | < | ||
+ | | 4 | < | ||
+ | | 5 | < | ||
+ | | 6 | **Routing** | ||
+ | | 7 | < | ||
+ | | 8 | < | ||
+ | | 9 | < | ||
+ | | 10 | < | ||
+ | | 11 | < | ||
+ | | 12 | < | ||
+ | | 13 | < | ||
+ | | 14 | < | ||
+ | | 15 | < | ||
+ | | 16 | < | ||
+ | | 17 | **eth1** | < | ||
- | ^ ^ Netfilter / Xfrm ^ Encapsulation | ||
- | | 1 | < | ||
- | | 2 | **Routing** | ||
- | | 3 | < | ||
- | | 4 | < | ||
- | | 5 | < | ||
- | | 6 | < | ||
- | | 7 | < | ||
- | | 8 | < | ||
- | | 9 | < | ||
- | | 10 | < | ||
+ | __Steps (1) (2) (3) (4) (5):__ The //ICMP echo-request// | ||
- | * (1), (2), (3), (4), (5):\\ The //ICMP echo-request// | + | __Step |
- | * (6), (7):\\ The //ICMP echo-request// | + | needs to be forwarded |
- | * (8), (9), (10):\\ The resulting packet now traverses the (8) //Output// and (9) // | + | decision to the packet. |
- | + | ||
- | Important | + | |
- | ==== ICMP echo-reply h2 -> h1, r1 traversal ==== | + | __Step (7):__ The Xfrm framework performs a lookup into the IPsec SPD, |
- | {{: | + | searching for a matching //forward policy// ('' |
+ | found and the packet passes. | ||
- | ^ ^ Netfilter / Xfrm ^ Encapsulation | + | __Step (8):__ The Xfrm framework performs a lookup |
- | | 1 | < | + | searching for a matching |
- | | 2 | **Routing** | + | source and destination IP addresses |
- | | 3 | < | + | matches, see Figure {{ref>r1_ip_xfrm_policy}}. |
- | | 4 | < | + | An IPsec SA is resolved (see Figure {{ref>r1_ip_xfrm_state}}), |
- | | 5 | < | + | the matching SP. The Xfrm framework detects that tunnel-mode is configured in |
- | | 6 | < | + | this SA. Thus, it now performs yet another routing lookup, this time for the (future) outer |
- | | 7 | **Routing** | + | IPv4 packet, which will later encapsulate the current packet. A “bundle” of |
- | | 8 | < | + | transformation instructions for this packet is assembled, which contains the |
- | | 9 | < | + | original routing decision from step (6), the SP, the SA, the routing decision |
- | | 10 | < | + | for the future outer IP packet and more. It is attached to the packet, |
- | | 11 | < | + | replacing the attached routing decision from step (6). |
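The policy lookup from step (8) can be modelled in a few lines of Python. This is a simplified sketch, not the kernel's data structures: ''Policy'' and ''spd_lookup'' are names I made up, the ''/24'' prefix lengths are an assumption (the host addresses come from the ping example), and preferring the lowest priority value is how I understand the kernel to order policies.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Policy:
    """Hypothetical, reduced model of an IPsec security policy (SP)."""
    direction: str                  # "out", "in" or "fwd"
    src: ipaddress.IPv4Network      # selector: source subnet
    dst: ipaddress.IPv4Network      # selector: destination subnet
    priority: int
    reqid: int                      # links the SP to its SA


def spd_lookup(spd: Sequence[Policy], direction: str,
               src_ip: str, dst_ip: str) -> Optional[Policy]:
    """Return the matching policy for a packet, or None if nothing matches."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    candidates = [p for p in spd
                  if p.direction == direction and src in p.src and dst in p.dst]
    # Assumption: a lower priority value wins when several policies match.
    return min(candidates, key=lambda p: p.priority, default=None)


NET1 = ipaddress.ip_network("192.168.1.0/24")   # assumed prefix length
NET2 = ipaddress.ip_network("192.168.2.0/24")   # assumed prefix length

SPD_R1 = [
    Policy("out", NET1, NET2, 375423, 1),
    Policy("fwd", NET2, NET1, 375423, 1),
    Policy("in", NET2, NET1, 375423, 1),
]
```

With this model, ''spd_lookup(SPD_R1, "out", "192.168.1.100", "192.168.2.100")'' returns the //output policy// with ''reqid 1'', while a packet between addresses outside both subnets yields ''None'' and would simply pass unencrypted.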
+ | __Steps (9) (10):__ The packet traverses the Netfilter //Forward// and | ||
+ | // | ||
+ | __Step (11):__ The Xfrm framework transforms the packet | ||
+ | according to the instructions in the attached “bundle”. In this case this | ||
+ | means encapsulating the IP packet into a new outer IP packet | ||
+ | with source IP address '' | ||
+ | and then encapsulating the inner IP packet into ESP protocol and encrypting it and its | ||
+ | payload. After that the transformation instructions are removed from the | ||
+ | “bundle”, | ||
+ | attached to the packet. | ||
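To make the nesting from step (11) more tangible, here is a small Python sketch of tunnel-mode encapsulation. It only models the packet structure, nothing is actually encrypted, and the class and function names are hypothetical. The outer addresses and the SPI are the ones from the outbound SA shown by ''ip xfrm state'' on ''r1''.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IPv4Packet:
    """Hypothetical, minimal IPv4 packet: addresses plus an opaque payload."""
    src: str
    dst: str
    payload: object


@dataclass(frozen=True)
class EspPayload:
    """ESP in tunnel-mode: the SPI plus the complete inner IP packet."""
    spi: int
    inner: IPv4Packet   # on the wire this part would be encrypted


def tunnel_encapsulate(inner: IPv4Packet, tmpl_src: str, tmpl_dst: str,
                       spi: int) -> IPv4Packet:
    """Wrap the inner packet into ESP and a new outer IP packet."""
    return IPv4Packet(src=tmpl_src, dst=tmpl_dst,
                      payload=EspPayload(spi=spi, inner=inner))


# The ICMP echo-request from h1 to h2 while it traverses r1.
echo_request = IPv4Packet("192.168.1.100", "192.168.2.100", "ICMP echo-request")
outer = tunnel_encapsulate(echo_request, "8.0.0.1", "9.0.0.1", 0xC5400599)
```

After this step, hooks further down the path only see the outer packet: a rule matching on ''192.168.1.100'' would no longer match, because the visible addresses are now those of the two gateways.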
- | * (1), (2), (3):\\ The //ICMP echo-reply// sent from '' | + | __Steps |
- | * (4), (5):\\ The packet traverses the // | + | output path. It traverses the Netfilter |
- | * (6), (7), (8):\\ The packet is now re-inserted in the input path of network interface '' | + | Netfilter |
- | * (9), (10), (11):\\ The packet now traverses the (9) //Forward// and the (10) // | + | subsystem// which in this case resolves |
+ | '' | ||
+ | (by doing ARP lookup, if address not yet in cache). | ||
+ | Finally, it traverses the //egress// queueing discipline | ||
+ | scheduler, | ||
+ | then is sent out on '' | ||
+ | The output interface '' | ||
- | Important | + | <figure echo_reply_r1_traversal> |
+ | {{: | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | ^ Step ^^ Encapsulation | ||
+ | | 1 | **eth1** | < | ||
+ | | 2 | < | ||
+ | | 3 | < | ||
+ | | 4 | < | ||
+ | | 5 | < | ||
+ | | 6 | **Routing** | ||
+ | | 7 | < | ||
+ | | 8 | < | ||
+ | | 9 | **eth1** | < | ||
+ | | 10 | < | ||
+ | | 11 | < | ||
+ | | 12 | < | ||
+ | | 13 | < | ||
+ | | 14 | **Routing** | ||
+ | | 15 | < | ||
+ | | 16 | < | ||
+ | | 17 | < | ||
+ | | 18 | < | ||
+ | | 19 | < | ||
+ | | 20 | < | ||
+ | | 21 | < | ||
+ | | 22 | **eth0** | < | ||
+ | |||
+ | |||
+ | __Steps (1) (2) (3) (4) (5):__ The //ICMP echo-reply// | ||
+ | |||
+ | __Steps (6) (7):__ The routing lookup is performed. In this case, the routing subsystem determines that this packet's destination IP '' | ||
+ | |||
+ | __Steps (8) (9):__ The Xfrm framework has a layer 4 receive handler waiting for incoming ESP | ||
+ | packets at this point. It parses the SPI value from the ESP header of the | ||
+ | packet and performs a lookup into the SAD for a matching IPsec SA (lookup | ||
+ | based on SPI and destination IP address). A matching SA is found, see Figure | ||
+ | {{ref> | ||
+ | specifies the further steps to take here for this packet. It is decrypted and | ||
+ | the ESP header is decapsulated. Now the internal IP packet becomes visible. | ||
+ | The SA specifies tunnel-mode, | ||
+ | of packet meta data is changed here, e.g. the attached routing decision (of | ||
+ | the outer IP packet, which is now removed) is stripped away, the reference to | ||
+ | connection tracking is removed, and a pointer to the SA which has been used | ||
+ | here to transform the packet is attached (via skb extension | ||
+ | '' | ||
+ | recognize | ||
+ | now re-inserted into the OSI layer 2 receive path of '' | ||
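The SAD lookup described above can be sketched as a dictionary lookup in Python. This is a reduced model under my own assumptions: the names ''SecurityAssociation'' and ''sad_lookup'' are hypothetical, and the key is just SPI plus destination address as described in the text, while the kernel's real lookup considers more fields. The two entries are the SAs from the ''ip xfrm state'' output on ''r1''.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SecurityAssociation:
    """Hypothetical, reduced model of an IPsec SA."""
    src: str
    dst: str
    spi: int
    reqid: int
    mode: str   # "tunnel" or "transport"


def build_sad(sas) -> Dict[Tuple[int, str], SecurityAssociation]:
    """Index the SAs by (SPI, destination address)."""
    return {(sa.spi, sa.dst): sa for sa in sas}


def sad_lookup(sad: Dict[Tuple[int, str], SecurityAssociation],
               spi: int, dst: str) -> Optional[SecurityAssociation]:
    """Find the SA for a received ESP packet, or None if it is unknown."""
    return sad.get((spi, dst))


SAD_R1 = build_sad([
    SecurityAssociation("8.0.0.1", "9.0.0.1", 0xC5400599, 1, "tunnel"),  # outbound
    SecurityAssociation("9.0.0.1", "8.0.0.1", 0xCD7DFF80, 1, "tunnel"),  # inbound
])
```

For the incoming ESP packet of the example, ''sad_lookup(SAD_R1, 0xCD7DFF80, "8.0.0.1")'' returns the inbound SA in tunnel-mode. A packet with an unknown SPI yields ''None'', which in this model stands for "cannot be decrypted".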
+ | |||
+ | __Steps (10) (11) (12) (13):__ Now history repeats... the packet once again traverses | ||
+ | //taps//, the //ingress// queueing discipline and the Netfilter //Ingress// hook of | ||
+ | '' | ||
+ | |||
+ | __Step (14):__ The routing lookup is performed. It determines that this | ||
+ | packet needs to be forwarded and sent out on '' | ||
+ | to the packet. | ||
+ | |||
+ | __Step (15):__ The Xfrm framework recognizes that this packet has been transformed | ||
+ | according to the SA, whose pointer is still attached to the packet | ||
+ | It checks if a '' | ||
+ | a match is found here, see Figure {{ref> | ||
+ | |||
+ | __Step (16):__ The Xfrm framework performs a lookup into the IPsec SPD, searching for a matching | ||
+ | //output policy// ('' | ||
+ | The idea of the //output policy// is to detect packets which shall be encrypted with IPsec. | ||
+ | Packets which do not match, just pass. | ||
+ | |||
+ | __Steps (17) (18) (19) (20) (21) (22):__ The packet traverses the Netfilter | ||
+ | //Forward// and // | ||
+ | subsystem// which does resolve the destination IP address, which is now | ||
+ | '' | ||
+ | in cache). Finally, it traverses the //egress// queueing discipline, //taps// | ||
+ | and then is sent out on '' | ||
+ | |||
+ | The input interface | ||
+ | This is what it means in practice that the Xfrm framework does not use virtual network interfaces. If virtual network interfaces were used here instead (e.g. a '' | ||
===== SNAT, Nftables ===== | ===== SNAT, Nftables ===== | ||
Now to add the SNAT behavior to '' | Now to add the SNAT behavior to '' | ||
Line 274: | Line 524: | ||
and '' | and '' | ||
- | However, let's take another look at the [[#ICMP echo-request h1 -> h2, r1 traversal]] example above | + | However, let's take another look at the example from Figure {{ref> |
- | and examine how the behavior differs now: In step (5) in the example the still unencrypted ICMP echo-request packet traverses the Netfilter // | + | Resulting from that, this packet now does not match the IPsec //output policy// anymore. Thus, it won't get encrypted+encapsulated! Obviously that is not our intended behavior, but let's first dig deeper to understand what actually happens here: In step (8) this packet still had its original source |
- | not know the route to the target subnet and even if it did, '' | + | |
- | because it also is now configured as SNAT router and thereby drops incoming '' | + | OK, now we understand it: the ping is NATed, but then sent out plain and unencrypted. That is not what we want. Further, this ping is now doomed to fail anyway, because '' |
- | in the '' | + | |
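The effect of SNAT on the policy match can be reproduced with a tiny Python model. It is a sketch under stated assumptions: ''Packet'', ''snat'' and ''matches_out_policy'' are hypothetical names, the ''/24'' prefix lengths are assumed, and ''8.0.0.1'' is used as the SNAT address because it is the public address of ''r1'' in this setup.

```python
import ipaddress
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    """Hypothetical, minimal packet: only the addresses matter here."""
    src: str
    dst: str


# Selector of the output policy on r1 (prefix lengths are an assumption).
OUT_SRC = ipaddress.ip_network("192.168.1.0/24")
OUT_DST = ipaddress.ip_network("192.168.2.0/24")


def matches_out_policy(pkt: Packet) -> bool:
    """True if the packet's addresses fall into the output policy selector."""
    return (ipaddress.ip_address(pkt.src) in OUT_SRC
            and ipaddress.ip_address(pkt.dst) in OUT_DST)


def snat(pkt: Packet, to_addr: str) -> Packet:
    """Rewrite the source address, like an SNAT rule would do."""
    return replace(pkt, src=to_addr)


ping = Packet("192.168.1.100", "192.168.2.100")
natted = snat(ping, "8.0.0.1")
```

Here ''matches_out_policy(ping)'' is true, but ''matches_out_policy(natted)'' is false: once the source address has been rewritten to ''8.0.0.1'', the packet falls outside the selector and is treated like any other non-VPN packet.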
How to fix that? It is our intended behavior, that network packets from subnet | How to fix that? It is our intended behavior, that network packets from subnet | ||
Line 300: | Line 549: | ||
The complete rulesets then look like this: [[: | The complete rulesets then look like this: [[: | ||
+ | Let's look at the example from Figure {{ref> | ||
- | Let's look at the [[#ICMP echo-request h1 -> h2, r1 traversal]] example again: In step (5) when traversing the // | + | When the ICMP echo-request packet is received by and traverses '' |
- | + | ||
- | When the ICMP echo-request packet is received by and traverses '' | + | |
===== Distinguish VPN/non-VPN traffic ===== | ===== Distinguish VPN/non-VPN traffic ===== | ||
No matter if your idea is to do NAT or your intentions are other kinds of packet manipulation or packet filtering, it all boils down to distinguishing between VPN and non-VPN traffic. | No matter if your idea is to do NAT or your intentions are other kinds of packet manipulation or packet filtering, it all boils down to distinguishing between VPN and non-VPN traffic. | ||
- | The Nftables rulesets I applied to '' | + | The Nftables rulesets I applied to '' |
- | The way described here does not address | + | As I mentioned above, things would be easier if the Xfrm framework used virtual network interfaces, because those could then serve as a basis for making this distinction. |
Several means have been implemented to address these kinds of issues: | ||
- | * trongswan | + | * Strongswan |
- | * Nftables offers IPSEC EXPRESSIONS (syntax '' | + | * Nftables offers IPSEC EXPRESSIONS (syntax '' |
* Nftables offers " | * Nftables offers " | ||
* So-called '' | * So-called '' | ||
- | |||
- | I am planning to describe some of those means in more detail in another article, however I still need to write that one. ;-) I'll place a link here once I find the time to write it. | ||
Line 325: | Line 571: | ||
Debian 10 (buster) system using Debian // | ||
- | * kernel: '' | + | * kernel: '' |
- | * nftables: '' | + | * nftables: '' |
- | * libnftnl: '' | + | * strongswan: '' |
- | * strongswan: '' | + | |
===== Feedback ===== | ===== Feedback ===== | ||
- | [[: | + | [[: |
+ | |||
+ | ===== References ===== | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | //published 2020-05-30//, | ||
- | {{tag> |
blog/linux/nftables_ipsec_packet_flow.1592986044.txt.gz · Last modified: 2020-06-24 by Andrej Stender