Revisions compared: 2021-06-29 (added publishing date, by Andrej Stender) and 2022-08-14, current (added details about xfrm bundle, by Andrej Stender)
====== Nftables - Netfilter and VPN/IPsec packet flow ======
~~META:
date created = 2020-05-30
~~
In this article I'd like to explain how packets flow through

so-called //Security Associations//
In addition to the SAs, IPsec also introduces the concept of the so-called //Security Policies// (SPs), which are also created during the IKE handshake. They are either defined by the IPsec tunnel configuration provided by the admin/user and/or (depending on the case) can at least partly result from dynamic IKE negotiation. The purpose of the SPs is to act as "
Be aware that both SAs and SPs are merely volatile, not persistent data. Their lifetime is bound to the lifetime of the VPN tunnel connection. It might even be shorter because of key re-negotiations / "
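
The division of labour between SAs and SPs can be shown with a small Python toy model of my own. It is not kernel code and is heavily simplified. An SP selects packets by source/destination subnet and direction and points to an SA via a ''reqid''. The SA carries the SPI and the tunnel endpoints. The class and function names, the ''reqid''-only SA lookup and all addresses (documentation/private ranges) are assumptions for illustration only.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional

@dataclass(frozen=True)
class SecurityAssociation:
    """Toy SA: HOW to transform packets (one direction only)."""
    spi: int       # Security Parameter Index, carried in every ESP header
    src: str       # tunnel (outer) source address
    dst: str       # tunnel (outer) destination address
    reqid: int     # links this SA to the policies which use it
    mode: str = "tunnel"

@dataclass(frozen=True)
class SecurityPolicy:
    """Toy SP: WHICH packets must be protected, and by which SA (via reqid)."""
    src_sel: str     # source subnet selector
    dst_sel: str     # destination subnet selector
    direction: str   # "out", "in" or "fwd"
    reqid: int       # simplified template: which SA to use

    def matches(self, src: str, dst: str, direction: str) -> bool:
        return (direction == self.direction
                and ip_address(src) in ip_network(self.src_sel)
                and ip_address(dst) in ip_network(self.dst_sel))

def resolve_sa(sps: List[SecurityPolicy], sas: List[SecurityAssociation],
               src: str, dst: str, direction: str) -> Optional[SecurityAssociation]:
    """Return the SA for a packet, or None if no policy selects it (plain traffic)."""
    for sp in sps:
        if sp.matches(src, dst, direction):
            # Simplification: the kernel matches more fields than just the reqid.
            return next((sa for sa in sas if sa.reqid == sp.reqid), None)
    return None

# Hypothetical example values, not the addresses of the setup in this article.
sas = [SecurityAssociation(spi=0xC1A0, src="198.51.100.1", dst="203.0.113.1", reqid=1)]
sps = [SecurityPolicy("192.168.1.0/24", "192.168.2.0/24", "out", reqid=1)]

sa = resolve_sa(sps, sas, "192.168.1.10", "192.168.2.20", "out")
print(hex(sa.spi))                                                   # 0xc1a0
print(resolve_sa(sps, sas, "192.168.1.10", "203.0.113.50", "out"))   # None
```

A packet that no policy selects simply gets no SA. This is the "plain traffic passes" case.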

| <
| <
If //
| <
| <

{{:
<
Block diagram showing the userspace and kernel parts of the IPsec implementation
on Linux (Strongswan and the Xfrm framework) and the interfaces between them
</
</

which then feeds this config into the //charon// daemon via the //Vici// IPC interface.
Further, the syntax of ''
of ''
An additional config file ''/

==== The Xfrm framework ====
The so-called //Xfrm framework// is a component within the Linux kernel.
As one of the //
it is an //"IP framework for transforming packets
(such as encrypting their payloads)"//
While the userspace part (Strongswan) handles the overall IPsec orchestration and
runs the IKEv1/IKEv2 protocol to build up/
the kernel part is responsible for encrypting+encapsulating and decrypting+decapsulating
network packets which travel through the VPN tunnel and to select/
go through the VPN tunnel at all. To do that, it requires all SA and SP
instances which define the VPN tunnel/
Only then can it decide which packets shall be encrypted/
and which not, and which encryption algorithms and keys to use.

The Xfrm framework implements the so-called //Security Association Database// (SAD)
and the //Security Policy Database// (SPD) for holding SA and SP instances.
An SA is represented by ''
Userspace components (like //
  * command ''
  * command ''
You can further
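
The SAD is consulted differently on the receive side. An incoming ESP packet is matched to its SA by the SPI from its ESP header plus the outer destination address (and the protocol); its inner, still encrypted addresses play no role. The following Python sketch models the SAD as a plain dictionary with such a key. ''ToySAD'', ''SAKey'', ''SAEntry'' and all values are hypothetical. The real kernel uses hash tables and keeps much more state per SA.

```python
from typing import Dict, NamedTuple, Optional

IPPROTO_ESP = 50

class SAKey(NamedTuple):
    daddr: str   # outer destination address of the received packet
    spi: int     # SPI parsed from the ESP header
    proto: int   # IPsec protocol, 50 = ESP

class SAEntry(NamedTuple):
    reqid: int
    mode: str

class ToySAD:
    """Toy Security Association Database, keyed like an inbound SA lookup."""

    def __init__(self) -> None:
        self._entries: Dict[SAKey, SAEntry] = {}

    def add(self, daddr: str, spi: int, entry: SAEntry, proto: int = IPPROTO_ESP) -> None:
        self._entries[SAKey(daddr, spi, proto)] = entry

    def lookup(self, daddr: str, spi: int, proto: int = IPPROTO_ESP) -> Optional[SAEntry]:
        # The inner (encrypted) addresses are not part of this lookup.
        return self._entries.get(SAKey(daddr, spi, proto))

sad = ToySAD()
sad.add("198.51.100.1", 0xC2B0, SAEntry(reqid=1, mode="tunnel"))  # hypothetical SA
print(sad.lookup("198.51.100.1", 0xC2B0))   # SAEntry(reqid=1, mode='tunnel')
print(sad.lookup("198.51.100.1", 0xDEAD))   # None
```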
SP instances can be created for three different "data directions":
| "
| "input policy"
| "

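Which of these directions gets checked depends on where a packet comes from and where it goes. As I understand the kernel's behavior, a received packet is checked against the ''in'' policies if it is delivered locally, and against the ''fwd'' policies if it is routed onward. Every packet that leaves the host, whether locally generated or forwarded, is looked up against the ''out'' policies. The tiny Python function below captures this simplified model. ''policy_checks()'' is a hypothetical helper, not a kernel API.

```python
from typing import List

def policy_checks(locally_generated: bool, locally_destined: bool) -> List[str]:
    """SPD directions a packet is checked against on its way through a host.

    Simplified model: "in"/"fwd" are checked on the receive side (after any
    decryption); "out" is looked up for every packet that leaves the host.
    """
    checks = []
    if not locally_generated:
        # received packet: delivered locally -> "in", routed onward -> "fwd"
        checks.append("in" if locally_destined else "fwd")
    if not locally_destined:
        # packet leaves the host (locally generated or forwarded) -> "out"
        checks.append("out")
    return checks

print(policy_checks(locally_generated=False, locally_destined=False))  # ['fwd', 'out']
print(policy_checks(locally_generated=True, locally_destined=False))   # ['out']
print(policy_checks(locally_generated=False, locally_destined=True))   # ['in']
```

On a VPN gateway which routes between two subnets, the relevant case is the first one: a forwarded packet meets both a ''fwd'' check and an ''out'' lookup.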
If you are working with Nftables or Iptables, then you are probably
[[:
which illustrates the packet flow through the Netfilter hooks and
Iptables //chains//. One great thing about this image is that it also covers
the Xfrm framework. It illustrates four distinct
in the network packet flow path, named //
//xfrm lookup//
kind of a bird's eye view. These four "actions"
actual Xfrm implementation very closely. The actual framework
works a little bit differently, which means that there actually are
more than four points within
action and also the locations of those are a little bit different.
Figure {{ref>
with a focus on Netfilter and the Xfrm framework.
It shows the Netfilter hooks in blue and the locations where
the Xfrm framework
If you are not yet familiar with the Netfilter hooks and their relation
to Nftables/
Netfilter hooks in detail]] before proceeding here.
<figure nfhooksxfrm1>
{{:linux:packet-flow-ipsec-tunnel.png?
<
</
| <
It is an instance of two combined structs, the outer ''
| <
</
| <
| <
| <
they must match the "input policy"
"
ESP packets circumvent this action anyway, as you can see in Figure {{ref>
| <
they must match the "

The Xfrm framework implementation does NOT use virtual network interfaces to distinguish between VPN and non-VPN traffic. This is a relevant difference compared to other implementations like the older //KLIPS// IPsec stack, which was used in kernel v2.4 and earlier. Why is this relevant? Virtual network interfaces are indeed not required, because the concept of the SPs already provides all the distinction the VPN needs to operate. However, the absence of virtual network interfaces makes it harder for Netfilter-based packet filtering systems like Iptables and Nftables to distinguish between VPN and non-VPN packets within their rules.

Obviously, an Nftables rule would be easy to write if all VPN traffic went through a virtual network interface, e.g. one called ''
are optional to use and never became the default. The Strongswan documentation calls VPN setups based on those virtual network interfaces [[https://
<figure xfrm_dst>
{{:
<
(click to enlarge). In IPsec tunnel-mode,
references to IPsec SA and SP and function pointers to lead the packet
on the Xfrm encrypt+encapsulate path. Compare it to a normal
//routing decision// object, which I described in my
[[routing_decisions_in_the_linux_kernel_1_lookup_packet_flow#
</
</
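
The life cycle of such a bundle can be sketched in a few lines of Python. This is a deliberately simplified model under my own assumptions: ''Route'', ''Bundle'' and ''Skb'' are hypothetical stand-ins for the kernel's routing decision, the bundle and the packet buffer, and the next hop address is made up. Only the order of events matters here. The bundle replaces the plain routing decision, and once the transformation has been applied, the packet continues with the routing decision of the outer IP packet.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class Route:
    """Toy routing decision: output interface and next hop."""
    oif: str
    nexthop: Optional[str] = None

@dataclass(frozen=True)
class Bundle:
    """Toy xfrm bundle: transformation instructions on top of two routing decisions."""
    policy_id: int        # the SP which selected the packet
    spi: int              # the SA resolved via that SP
    inner_route: Route    # original routing decision of the packet
    outer_route: Route    # routing decision for the future outer IP packet

@dataclass
class Skb:
    """Toy packet buffer with exactly one attached 'destination' object."""
    dst: Union[Route, Bundle]

def attach_bundle(skb: Skb, bundle: Bundle) -> None:
    # The bundle replaces the plain routing decision attached before.
    skb.dst = bundle

def transform(skb: Skb) -> int:
    """Consume the bundle: apply the SA, then keep only the outer route."""
    if not isinstance(skb.dst, Bundle):
        raise ValueError("no xfrm bundle attached")
    spi = skb.dst.spi                  # encrypt + encapsulate would happen here
    skb.dst = skb.dst.outer_route      # the instructions are used up
    return spi

uplink = Route(oif="eth1", nexthop="198.51.100.254")   # hypothetical next hop
skb = Skb(dst=uplink)
attach_bundle(skb, Bundle(policy_id=1, spi=0xC1A0, inner_route=skb.dst, outer_route=uplink))
print(type(skb.dst).__name__)   # Bundle
transform(skb)
print(type(skb.dst).__name__)   # Route
```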

===== Example Site-to-site VPN =====
<figure ipsecsstopo1>
{{ :
<
</
</
It is better to have a practical example as a basis for diving further into the topic. Here I will use a site-to-site VPN setup, which is created between two VPN gateways ''
It can be roughly compared to the [[https://
The VPN tunnel will connect the local subnets behind ''
  * [[:
}</
<
Strongswan configuration on ''
</
</
</

The following Figures {{ref>
{{ref>
and the corresponding ICMP //
''
result of a lot of experimenting and reading of the kernel source code on my part.
I used the Nftables
chain traversal, and thereby Netfilter
used ''
takes through the kernel network stack while being encrypted/decrypted.
On some occasions
breakpoints within
observe
While reading source code, the book [[https://
help to me in finding my way around the kernel network stack. I hope
this gives you a head start in case you intend to dive deep into that topic
yourself, too.

<figure echo_request_r1_traversal>
{{:linux:packet-flow-ipsec-tunnel-encrypt.png?
<
</
^ Step ^^ Encapsulation
| 1 | **eth0**
| 2 | <
| 3 | <
| 4 | <
| 5 | <
| 6 | **Routing**
| 7 | <
| 8 | <
| 9 | <
| 10 | <
| 11 | <
| 12 | <
| 13 | <
| 14 | <
| 15 | <
| 16 | <
| 17 | **eth1** | <

__Steps (1) (2) (3) (4) (5):__ The //ICMP echo-request// packet from ''

__Step (6):__ The routing
needs to be forwarded and sent out on ''
decision to the packet.

__Step
searching for a matching
found and the packet passes.

__Step
searching for a matching
source and destination IP addresses
matches, see Figure {{ref>
An IPsec SA is resolved (see Figure {{ref>
the matching SP. The Xfrm framework detects that tunnel-mode is configured in
this SA. Thus, it now performs yet another
IPv4 packet, which will later encapsulate the current packet. A “bundle” of
transformation instructions
original routing decision from step (6), the SP, the SA, the routing decision
for the future outer IP packet and more. It is attached
replacing the attached routing decision from step (6).

__Steps
//

__Step (11):__ The Xfrm framework transforms the packet
according to the instructions in the attached “bundle”. In this case, this
means encapsulating the IP packet
with source IP address ''
and then encapsulating the inner IP packet into ESP protocol
payload. After that, the transformation instructions are removed from the
“bundle”,
attached to the packet.

__Steps
output
Netfilter //
subsystem// which in this case resolves the next hop gateway IP address
''
(by doing an ARP lookup, if the address is not yet in the cache).
Finally, it traverses the //egress// queueing discipline
scheduler,
then is sent out on ''

The output interface ''
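
What tunnel-mode encapsulation does to the packet headers can be illustrated with a few Python dataclasses. This sketch models only the header nesting, not real encryption. The whole inner IP packet becomes the ESP payload, and a new outer IPv4 header with the tunnel endpoints and protocol number 50 (ESP) is put in front of it. All addresses and the SPI are hypothetical documentation values, not the ones of my example setup.

```python
from dataclasses import dataclass
from typing import Union

IPPROTO_ICMP = 1
IPPROTO_ESP = 50

@dataclass(frozen=True)
class ESP:
    spi: int
    seq: int
    payload: "IPv4"     # encrypted in reality; kept readable in this toy model

@dataclass(frozen=True)
class IPv4:
    src: str
    dst: str
    proto: int
    payload: Union[str, ESP]

def tunnel_encapsulate(inner: IPv4, spi: int, seq: int, tun_src: str, tun_dst: str) -> IPv4:
    """Tunnel mode: the complete inner IP packet becomes the ESP payload."""
    return IPv4(src=tun_src, dst=tun_dst, proto=IPPROTO_ESP,
                payload=ESP(spi=spi, seq=seq, payload=inner))

def tunnel_decapsulate(outer: IPv4) -> IPv4:
    """Reverse operation, as done on the receiving VPN gateway."""
    if outer.proto != IPPROTO_ESP or not isinstance(outer.payload, ESP):
        raise ValueError("not an ESP packet")
    return outer.payload.payload

# Hypothetical addresses: two hosts behind the gateways, and the gateways' public IPs.
ping = IPv4("192.168.1.10", "192.168.2.20", IPPROTO_ICMP, "echo-request")
outer = tunnel_encapsulate(ping, spi=0xC1A0, seq=1,
                           tun_src="198.51.100.1", tun_dst="203.0.113.1")
print(outer.src, outer.dst, outer.proto)   # 198.51.100.1 203.0.113.1 50
print(tunnel_decapsulate(outer) == ping)   # True
```

Note that only the outer header is visible to anything between the two gateways. The original source and destination addresses travel inside the (in reality encrypted) ESP payload.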

<figure echo_reply_r1_traversal>
{{:
<
</
^ Step ^^ Encapsulation
| 1 | **eth1** | <
| 2 | <
| 3 | <
| 4 | <
| 5 | <
| 6 | **Routing**
| 7 | <
| 8 | <
| 9 | **eth1** | <
| 10 | <
| 11 | <
| 12 | <
| 13 | <
| 14 | **Routing**
| 15 | <
| 16 | <
| 17 | <
| 18 | <
| 19 | <
| 20 | <
| 21 | <
| 22 | **eth0** | <

__Steps (1) (2) (3) (4) (5):__ The //ICMP echo-reply//

__Steps (6) (7):__ The routing lookup is performed. In this case, the routing subsystem determines that this packet's destination IP ''

__Steps (8) (9):__ The Xfrm framework has a layer 4 receive handler waiting for incoming ESP
packets at this point. It parses the SPI value from the ESP header of the
packet and performs a lookup into the SAD for a matching IPsec SA (lookup
based on SPI and destination IP address). A matching SA is found, see Figure
{{ref>r1_ip_xfrm_state}}, which
specifies the further steps to take here for this packet. It is decrypted and
the ESP header is decapsulated. Now the inner IP packet becomes visible.
The SA specifies tunnel-mode,
of packet metadata is changed here, e.g. the attached routing decision (of
the outer IP packet, which is now removed) is stripped away, the reference to
connection tracking is removed, and a pointer to the SA which has been used
here to transform the packet is attached (via skb extension
''
recognize
now re-inserted into the OSI layer 2 receive path of ''

__Steps (10) (11) (12) (13):__ Now history repeats... the packet once again traverses
//taps//, the //ingress// queueing discipline and the Netfilter //Ingress// hook of
''

__Step (14):__ The routing lookup is performed. It determines that this
packet needs to be forwarded and sent out on ''
to the packet.

__Step (15):__ The Xfrm framework recognizes that this packet has been transformed
according to the SA, whose pointer is still attached to the packet
It checks if a ''
a match is found here, see Figure {{ref>

__Step (16):__ The Xfrm framework performs a lookup into the IPsec SPD, searching for a matching
//output policy// (''
The idea of the //output policy// is to detect packets which shall be encrypted with IPsec.
Packets which do not match simply pass.

__Steps (17) (18) (19) (20) (21) (22):__ The packet traverses the Netfilter
//Forward// and //
subsystem// which resolves the destination IP address, which is now
''
in cache). Finally, it traverses the //egress// queueing discipline, //taps//
and then is sent out on ''

The input interface
This is what it means that the Xfrm framework does not use virtual network interfaces. If virtual network interfaces were used here instead (e.g. a ''
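
The receive side can be modelled the same way. In this Python sketch (my own simplification, with hypothetical names and addresses), several things happen in turn. The SAD lookup uses the outer destination address and the SPI. The decapsulated inner packet keeps the input interface of the ESP packet and gets the ''reqid'' of the used SA recorded in a ''secpath'' list. A toy //fwd policy// then lets packets into the protected subnet only if they arrived through the tunnel. Real kernel policy checks compare the secpath against the policy templates in much more detail, and selectors cover more than the destination address.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from typing import Dict, List, Optional, Tuple

@dataclass
class RxPacket:
    iif: str                                   # input interface
    src: str
    dst: str
    esp_spi: Optional[int] = None              # set while still ESP-encapsulated
    inner: Optional["RxPacket"] = None         # encrypted payload (readable here)
    secpath: List[int] = field(default_factory=list)  # reqids of the SAs used

# Toy SAD: (outer destination address, SPI) -> reqid. Hypothetical values.
SAD: Dict[Tuple[str, int], int] = {("198.51.100.1", 0xC2B0): 1}
# Toy fwd policy: packets into this subnet must have been decrypted by reqid 1.
FWD_POLICY = (ip_network("192.168.1.0/24"), 1)

def esp_receive(pkt: RxPacket) -> Optional[RxPacket]:
    """Decrypt + decapsulate; None means no SA was found (packet dropped)."""
    if pkt.esp_spi is None or pkt.inner is None:
        return None
    reqid = SAD.get((pkt.dst, pkt.esp_spi))
    if reqid is None:
        return None
    inner = pkt.inner
    inner.iif = pkt.iif                        # re-inserted on the SAME interface
    inner.secpath = pkt.secpath + [reqid]      # remember which SA was used
    return inner

def fwd_policy_ok(pkt: RxPacket) -> bool:
    """Simplified fwd policy check (selector on the destination only)."""
    selector, reqid = FWD_POLICY
    if ip_address(pkt.dst) in selector:
        return reqid in pkt.secpath            # must have come through the tunnel
    return True                                # policy does not apply

esp = RxPacket(iif="eth1", src="203.0.113.1", dst="198.51.100.1", esp_spi=0xC2B0,
               inner=RxPacket(iif="", src="192.168.2.20", dst="192.168.1.10"))
reply = esp_receive(esp)
print(reply.iif, reply.secpath, fwd_policy_ok(reply))   # eth1 [1] True
plain = RxPacket(iif="eth1", src="192.168.2.20", dst="192.168.1.10")
print(fwd_policy_ok(plain))                              # False
```

Both the decrypted reply and the unencrypted packet carry ''eth1'' as input interface. Only the attached SA reference tells them apart.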

===== SNAT, Nftables =====
Now to add the SNAT behavior to ''
and ''
However, let's take another look at the example from Figure {{ref>
As a result, this packet no longer matches the IPsec //output policy//. Thus, it won't get encrypted+encapsulated! Obviously, that is not our intended behavior, but let's first dig deeper to understand what actually happens here: In step (8) this packet still had its original source

OK, now we understand it... the ping is NATed, but then sent out as plain, unencrypted traffic. That is not what we want. Further, this ping is doomed to fail anyway, because ''
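
The core of this problem can be reduced to a selector check before and after the source address rewrite. In this Python sketch, the output policy selector and the SNAT address are hypothetical example values from documentation/private ranges. The point is this: once the source address no longer lies within the policy's source subnet, the packet is no longer selected for encryption.

```python
from ipaddress import ip_address, ip_network

# Hypothetical out-policy selector of the local VPN gateway:
# traffic from the local subnet to the remote subnet shall be encrypted.
OUT_SRC = ip_network("192.168.1.0/24")
OUT_DST = ip_network("192.168.2.0/24")

def selected_for_ipsec(src: str, dst: str) -> bool:
    """Does a packet with these addresses match the (toy) output policy?"""
    return ip_address(src) in OUT_SRC and ip_address(dst) in OUT_DST

src, dst = "192.168.1.10", "192.168.2.20"
snat_src = "198.51.100.1"   # hypothetical: SNAT to the gateway's public address

print(selected_for_ipsec(src, dst))        # True
print(selected_for_ipsec(snat_src, dst))   # False
```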
How to fix that? Our intended behavior is that network packets from subnet

The complete rulesets then look like this: [[:
Let's look at the example from Figure {{ref>
When the ICMP echo-request packet is received by and traverses ''

===== Distinguish VPN/non-VPN traffic =====
Whether you want to do NAT, other kinds of packet manipulation, or packet filtering, it all boils down to distinguishing between VPN and non-VPN traffic.
The Nftables rulesets I applied to ''
As I mentioned above, things would be easier if the Xfrm framework used virtual network interfaces, because those could then serve as a basis for making this distinction.
Several means have been implemented to address these kinds of issues:
  * Strongswan provides an optional ''
  * Nftables offers IPSEC EXPRESSIONS (syntax ''
  * Nftables offers "
  * So-called ''
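
The idea behind such means can be shown with a toy comparison in Python. Under my own simplifying assumptions, ''Meta'' holds a little of the per-packet state the kernel keeps: whether a secpath exists after decryption, and whether an xfrm bundle is attached for output. Matching on the input interface cannot tell decrypted packets from plain ones, because both arrive on ''eth1''. Matching on that metadata can. This is loosely comparable to what nftables expressions like ''meta ipsec exists'' (inbound) and ''rt ipsec exists'' (outbound) check. The names here and the ''xfrm'' interface name prefix are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Meta:
    """Toy per-packet metadata a ruleset could inspect."""
    iif: str
    secpath: List[int] = field(default_factory=list)  # SAs used to decrypt (inbound)
    route_via_xfrm: bool = False                       # xfrm bundle attached (outbound)

def classify_by_interface(m: Meta) -> str:
    # Only works with route-based VPNs, i.e. a dedicated virtual interface.
    return "vpn" if m.iif.startswith("xfrm") else "non-vpn"

def classify_by_metadata(m: Meta) -> str:
    # Policy-based Xfrm: look at what the kernel attached to the packet instead.
    return "vpn" if m.secpath or m.route_via_xfrm else "non-vpn"

decrypted = Meta(iif="eth1", secpath=[1])   # hypothetical: decrypted ESP payload
plain = Meta(iif="eth1")                    # hypothetical: ordinary traffic
print(classify_by_interface(decrypted), classify_by_interface(plain))  # non-vpn non-vpn
print(classify_by_metadata(decrypted), classify_by_metadata(plain))    # vpn non-vpn
```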

Debian 10 (buster) system using Debian //
  * kernel: ''
  * nftables: ''
  * strongswan: ''

===== Feedback =====
[[:
===== References =====
  * [[https://
  * [[https://
  * [[https://
  * [[https://
//published 2020-05-30//, //last modified 2022-08-14//
blog/linux/nftables_ipsec_packet_flow.1624917862.txt.gz · Last modified: 2021-06-29 by Andrej Stender