blog:linux:nftables_packet_flow_netfilter_hooks_detail
Differences
This shows you the differences between two versions of the page.
blog:linux:nftables_packet_flow_netfilter_hooks_detail [2020-06-05] – [Example: NAT edge router] Andrej Stender | blog:linux:nftables_packet_flow_netfilter_hooks_detail [2022-08-07] (current) – activated TOC Andrej Stender | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | {{tag> | ||
====== Nftables - Packet flow and Netfilter hooks in detail ====== | ====== Nftables - Packet flow and Netfilter hooks in detail ====== | ||
~~META: | ~~META: | ||
Line 4: | Line 5: | ||
~~ | ~~ | ||
- | If you are using // | + | If you are using // |
- | one to be the default case nowadays) | + | |
packet filtering with IPv4, then you'll probably get enough info out of the | packet filtering with IPv4, then you'll probably get enough info out of the | ||
official documentation and by a quick look through websites which | official documentation and by a quick look through websites which | ||
provide example configurations. | provide example configurations. | ||
- | + | However, if you are working on a little bit more complex stuff like writing | |
- | However if you are working on a little bit more complex stuff like writing | + | |
// | // | ||
and doing NAT, or other of the "more interesting" | and doing NAT, or other of the "more interesting" | ||
Line 20: | Line 19: | ||
in a little more detail. | in a little more detail. | ||
+ | ===== Rationale ===== | ||
I personally always like to know how things work and to dig a little deeper than | I personally always like to know how things work and to dig a little deeper than | ||
just gaining the very minimum knowledge required to solve the issue at hand. | just gaining the very minimum knowledge required to solve the issue at hand. | ||
Available documentation on this topic isn't bad, but like most other documentation | Available documentation on this topic isn't bad, but like most other documentation | ||
- | it tends to leave some gaps and questions | + | it tends to leave some gaps and questions in your head unanswered |
- | is outdated. | + | the available documentation |
- | are often only covered by older articles focused on the predecessor // | + | are often only covered by older articles focused on the // |
After digging through a lot of websites, some kernel source code and doing some practical | After digging through a lot of websites, some kernel source code and doing some practical | ||
experimenting involving the //trace// and //log// features of // | experimenting involving the //trace// and //log// features of // | ||
Line 33: | Line 32: | ||
//base chains//, // | //base chains//, // | ||
to the actual network packet flow through the // | to the actual network packet flow through the // | ||
- | |||
Line 41: | Line 39: | ||
and thereby the packet flow through the //tables//, //chains// and //rules// of | and thereby the packet flow through the //tables//, //chains// and //rules// of | ||
// | // | ||
- | maintained image is the following one. | + | maintained image is shown in Figure {{ref> |
- | The original author is Jan Engelhardt and it has been published on [[https:// | + | |
- | {{:wiki: | + | <figure nfpackflowofficial> |
- | + | {{: | |
- | However what this image shows you is the packet flow though the //Netfilter hooks// and thereby the packet flow through the //tables// and //chains// like they existed in old //Iptables//. In // | + | < |
+ | </caption> | ||
+ | </figure> | ||
+ | However, what this image shows you is the packet flow through the //Netfilter hooks// and thereby the packet flow through the //tables// and //chains// as they existed in old // | ||
===== Netfilter ===== | ===== Netfilter ===== | ||
The [[wp> | The [[wp> | ||
- | It provides a bunch of //hooks// inside the Linux kernel, which network packets traverse as they flow through the kernel. Other kernel components can register callback functions with those hooks, which enables them to examine the packets and to decide whether packets shall be //dropped// (=deleted) or be // | + | It provides a bunch of //hooks// inside the Linux kernel, which network packets traverse as they flow through the kernel. Other kernel components can register callback functions with those hooks, which enables them to examine the packets and to decide whether packets shall be //dropped// (=deleted) or be //
- | The following | + | |
- | {{ :wiki: | + | <figure nfhookssimple> |
+ | {{ : | ||
+ | < | ||
+ | </ | ||
A network packet received on a network device first traverses the // | A network packet received on a network device first traverses the // | ||
- | Those five hooks have been present in the Linux kernel for a very long time. You can e.g. already find an equivalent of the image above in the [[https:// | + | Those five hooks have been present in the Linux kernel for a very long time. You can e.g. already find an equivalent of Figure {{ref> |
- | {{:wiki: | + | <figure nfhooksdetail> |
+ | {{: | ||
+ | < | ||
+ | </ | ||
- | As you can see, those five hooks exist independently for the IPv4 and for the IPv6 protocol (meaning IPv4 and IPv6 packets each traverse their own hooks). Further hooks exist to be traversed by ARP packages | + | As you can see, those five hooks exist independently for the IPv4 and for the IPv6 protocol (meaning IPv4 and IPv6 packets each traverse their own hooks). Further hooks exist to be traversed by ARP packets |
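The three typical paths through these five hooks can be written down as a minimal C sketch: packets received for this host, forwarded packets, and locally generated packets. The order of the hooks follows the kernel's ''enum nf_inet_hooks'' (''NF_INET_PRE_ROUTING'' to ''NF_INET_POST_ROUTING''). The shorter enum names, ''enum route'' and ''packet_path()'' are hypothetical. In reality, the routing lookup after the prerouting hook decides between local delivery and forwarding.

```c
#include <stddef.h>

/* The five Netfilter hooks (IPv4 and IPv6 each have their own set),
 * in the same order as the kernel's enum nf_inet_hooks. */
enum hook { PREROUTING, INPUT, FORWARD, OUTPUT, POSTROUTING };

/* Hypothetical classification of a packet by where it comes from and
 * what the routing decision decided for it. */
enum route {
    ROUTE_LOCAL_IN,  /* received on a device, addressed to this host    */
    ROUTE_FORWARD,   /* received on a device, routed on to another host */
    ROUTE_LOCAL_OUT  /* generated by a local process and sent out       */
};

/* Stores the hooks a packet traverses, in order, in 'path' and
 * returns their number (at most 3). */
static size_t packet_path(enum route r, enum hook path[3])
{
    switch (r) {
    case ROUTE_LOCAL_IN:
        path[0] = PREROUTING;
        path[1] = INPUT;
        return 2;
    case ROUTE_FORWARD:
        path[0] = PREROUTING;
        path[1] = FORWARD;
        path[2] = POSTROUTING;
        return 3;
    case ROUTE_LOCAL_OUT:
        path[0] = OUTPUT;
        path[1] = POSTROUTING;
        return 2;
    }
    return 0;
}
```

For example, ''packet_path(ROUTE_FORWARD, p)'' yields prerouting, forward and postrouting. A forwarded packet therefore never traverses the input or output hook of the router itself.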
==== Network Namespaces ==== | ==== Network Namespaces ==== | ||
Line 69: | Line 74: | ||
explicitly make use of network namespaces (e.g. by creating additional ones), | explicitly make use of network namespaces (e.g. by creating additional ones), | ||
still one instance, the default network namespace //" | still one instance, the default network namespace //" | ||
- | and then all the networking happens inside | + | and then all the networking happens inside |
All the mentioned hooks exist independently (=are being re-created) within each | All the mentioned hooks exist independently (=are being re-created) within each | ||
network namespace((The only exception here is the //ingress// hook which is bound to | network namespace((The only exception here is the //ingress// hook which is bound to | ||
- | an individual //network device// and thereby (at least not directly) to a //network namespace// | + | an individual //network device// and thereby (at least not directly) to a //network namespace// |
functions which are registered with the hooks, are re-created (initially empty) | functions which are registered with the hooks, are re-created (initially empty) | ||
- | for each new network namespace. Thus who is registered with those hooks is | + | for each new network namespace. Thus, who is registered with those hooks is |
different and individual to each network namespace. | different and individual to each network namespace. | ||
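The effect can be sketched in C with a hypothetical, heavily simplified ''struct netns''. Each instance carries its own, initially empty, list of registered hook functions per hook. This is not the kernel's ''struct net''. It only illustrates that a registration in one network namespace is invisible in every other one.

```c
#include <stddef.h>

#define NUM_HOOKS 5   /* prerouting, input, forward, output, postrouting */
#define MAX_FNS   8   /* arbitrary capacity for this sketch */

typedef int (*hook_fn)(const void *pkt);   /* simplified callback type */

/* Hypothetical stand-in for a network namespace: each instance owns its
 * own, initially empty, list of hook functions for every hook. */
struct netns {
    hook_fn fns[NUM_HOOKS][MAX_FNS];
    size_t  count[NUM_HOOKS];
};

/* Registers 'fn' with 'hook' in namespace 'ns' and nowhere else.
 * Returns 0 on success, -1 for an invalid hook or a full list. */
static int netns_register(struct netns *ns, int hook, hook_fn fn)
{
    if (hook < 0 || hook >= NUM_HOOKS || ns->count[hook] == MAX_FNS)
        return -1;
    ns->fns[hook][ns->count[hook]++] = fn;
    return 0;
}

/* An example hook function which accepts every packet. */
static int accept_all(const void *pkt)
{
    (void)pkt;
    return 1;
}
```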
Of course, the actual concept of network namespaces and its impact go | Of course, the actual concept of network namespaces and its impact go | ||
Line 81: | Line 86: | ||
- | ==== Register | + | ==== Register |
- | As already mentioned, the idea of the hooks is to give other kernel components the opportunity to register // | + | As already mentioned, the idea of the hooks is to give other kernel components the opportunity to register // |
- | {{ :wiki: | + | <figure nfhookregister> |
+ | {{ : | ||
+ | < | ||
+ | </ | ||
- | Several callback functions can be registered with the same hook. // | + | Several callback functions can be registered with the same hook. // |
- | implemented as an instance of '' | + | In most other documentation on the Internet as well as in discussions among the Netfilter developer community, those registered callback functions are usually referred to as "hook functions" |
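Here is a minimal C sketch of such a runtime registration. It is loosely modelled on the kernel's ''struct nf_hook_ops'', which likewise carries the hook function, the hook number and a priority, and is registered via ''nf_register_net_hook()''. The names ''struct hook_ops'', ''struct hook_entries'', ''hook_register()'' and ''hook_unregister()'' are hypothetical. The list is kept sorted by ascending priority, because that is the order in which the hook functions get called, as described in the following section.

```c
#include <stddef.h>

#define MAX_OPS 16   /* arbitrary capacity for this sketch */

struct pkt;                                   /* opaque packet stand-in */
typedef unsigned int (*hookfn)(struct pkt *p);

/* Hypothetical registration record, loosely modelled on the
 * kernel's struct nf_hook_ops. */
struct hook_ops {
    hookfn fn;        /* the hook function (callback)    */
    int    hooknum;   /* which hook, e.g. 0 = prerouting */
    int    priority;  /* lower value = called earlier    */
};

/* All hook functions registered with ONE hook, sorted by ascending
 * priority, i.e. in the order in which they will be called. */
struct hook_entries {
    const struct hook_ops *ops[MAX_OPS];
    size_t n;
};

/* Registers 'o' at runtime. Returns 0 on success, -1 if full. */
static int hook_register(struct hook_entries *e, const struct hook_ops *o)
{
    size_t i, pos = e->n;

    if (e->n == MAX_OPS)
        return -1;
    /* find the first entry with a priority >= the new one */
    for (i = 0; i < e->n; i++) {
        if (o->priority <= e->ops[i]->priority) {
            pos = i;
            break;
        }
    }
    /* shift the tail one slot to the right and insert */
    for (i = e->n; i > pos; i--)
        e->ops[i] = e->ops[i - 1];
    e->ops[pos] = o;
    e->n++;
    return 0;
}

/* Unregisters 'o' at runtime. Returns 0 on success, -1 if unknown. */
static int hook_unregister(struct hook_entries *e, const struct hook_ops *o)
{
    size_t i;

    for (i = 0; i < e->n; i++) {
        if (e->ops[i] == o) {
            for (; i + 1 < e->n; i++)    /* close the gap */
                e->ops[i] = e->ops[i + 1];
            e->n--;
            return 0;
        }
    }
    return -1;
}
```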
==== Priority ==== | ==== Priority ==== | ||
- | The sequence of callbacks | + | The sequence of hook functions |
- | /* from include/ | + | |
+ | <figure nfipv4hookpriorities> | ||
+ | <code c> | ||
enum nf_ip_hook_priorities { | enum nf_ip_hook_priorities { | ||
NF_IP_PRI_FIRST = INT_MIN, | NF_IP_PRI_FIRST = INT_MIN, | ||
Line 110: | Line 120: | ||
}; | }; | ||
</ | </ | ||
+ | < | ||
+ | Source code extract from '' | ||
+ | </ | ||
- | I go into such detail here, because this enum shows you the discrete // | + | I go into such detail here, because this enum shows you the discrete // |
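Because the priorities are plain ''int'' values, a hook function can be placed anywhere between the named slots. The C sketch below uses a subset of the values of ''enum nf_ip_hook_priorities''. The hypothetical helper ''next_named_after()'' tells you which named slot would be called next after a hook function with a given priority. Equal values are a special case and are deliberately not modelled here.

```c
#include <stddef.h>

/* A subset of the values of enum nf_ip_hook_priorities. */
static const struct {
    const char *name;
    int value;
} ip_prio[] = {
    { "NF_IP_PRI_RAW",       -300 },
    { "NF_IP_PRI_CONNTRACK", -200 },
    { "NF_IP_PRI_MANGLE",    -150 },
    { "NF_IP_PRI_NAT_DST",   -100 },
    { "NF_IP_PRI_FILTER",       0 },
    { "NF_IP_PRI_SECURITY",    50 },
    { "NF_IP_PRI_NAT_SRC",    100 },
};

/* Returns the name of the first listed slot whose hook functions are
 * called AFTER a hook function registered with priority 'prio', i.e.
 * the first value strictly greater than 'prio'; NULL if there is none. */
static const char *next_named_after(int prio)
{
    size_t i;

    for (i = 0; i < sizeof ip_prio / sizeof ip_prio[0]; i++)
        if (ip_prio[i].value > prio)
            return ip_prio[i].name;
    return NULL;
}
```

So a hook function registered with, e.g., ''NF_IP_PRI_MANGLE + 1'' is called after mangle but still before the destination NAT slot.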
==== Hard-coded vs. Flexibility ==== | ==== Hard-coded vs. Flexibility ==== | ||
- | The //Netfilter// hooks themselves are hard-coded into the Linux kernel network stack. You'll find them in the source code if you search for function calls named '' | + | The Netfilter hooks themselves are hard-coded into the Linux kernel network stack. You'll find them in the source code if you search for function calls named '' |
- | runtime and why those callbacks | + | |
- | good as yours. There are many potential reasons which might have led to these design decisions, but common sense (and comments on some websites) made at least these two reasons obvious to me: | + | |
- | + | ||
- | - For once this kind of flexibility during runtime is an essential basic requirement in a kernel where many components (also // | + | |
- | - Performance is a crucial issue. Every network packet needs to traverse all callbacks registered with a hook. Thus those callbacks should be registered in an economical way. This is probably one of the driving reasons why //base chains// in // | + | |
+ | - For one, this kind of flexibility at runtime is an essential requirement in a kernel where many components (also // | ||
+ | - Performance is a crucial issue. Every network packet needs to traverse all hook functions registered with a Netfilter hook. Thus, those hook functions should be registered in an economical way. This is probably one of the driving reasons why //base chains// in // | ||
==== Hook traversal and verdict ==== | ==== Hook traversal and verdict ==== | ||
- | {{ : | + | Now let's take a more detailed look at how the hook functions
- | Now let's take a more detailed look on how the callbacks | + | For each network packet which traverses this hook, the hook functions are called one by one
- | For each network packet which traverses this hook the callback | + | |
in the sequence/ | in the sequence/ | ||
- | the // | + | the // |
- | of '' | + | |
+ | <figure nfhookentriesflow> | ||
+ | {{ : | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | Network packets are represented within the Linux kernel as instances of '' | ||
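The traversal loop itself can be sketched in C as follows. The verdict constants ''NF_DROP'' (0) and ''NF_ACCEPT'' (1) have the same values as in the kernel. Everything else is hypothetical: ''struct pkt'' (a tiny stand-in for ''struct sk_buff''), ''traverse()'' and the two example hook functions. Verdicts other than accept (e.g. queueing a packet) are simply treated as ending the traversal in this sketch.

```c
#include <stddef.h>

/* Verdict values with the same numbers as in the kernel. */
#define NF_DROP   0
#define NF_ACCEPT 1

/* Hypothetical, minimal stand-in for struct sk_buff. */
struct pkt {
    int proto;   /* IP protocol number, e.g. 6 = TCP */
    int dport;   /* destination port */
};

typedef int (*hookfn)(const struct pkt *p);

/* Calls the hook functions one by one in the given (priority) order.
 * NF_ACCEPT hands the packet on to the next hook function; any other
 * verdict (here: NF_DROP) ends the traversal, so the remaining hook
 * functions never see the packet. '*ran' receives how many ran. */
static int traverse(hookfn const *fns, size_t n, const struct pkt *p, size_t *ran)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int verdict = fns[i](p);
        if (verdict != NF_ACCEPT) {
            *ran = i + 1;
            return verdict;
        }
    }
    *ran = n;
    return NF_ACCEPT;   /* all hook functions accepted the packet */
}

/* Two example hook functions. */
static int accept_all(const struct pkt *p)
{
    (void)p;
    return NF_ACCEPT;
}

static int drop_telnet(const struct pkt *p)
{
    return (p->proto == 6 && p->dport == 23) ? NF_DROP : NF_ACCEPT;
}
```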
===== Iptables ===== | ===== Iptables ===== | ||
- | To put things into context, let's take a short look at // | + | To put things into context, let's take a short look at // |
- | // | + | |
- | In case of // | + | In case of // |
^ table ^ contains chains ^ command to show that ^ | ^ table ^ contains chains ^ command to show that ^ | ||
Line 142: | Line 156: | ||
| '' | | '' | ||
- | The sequence in which the //chains// are being traversed when a packet traverses the hook (their // | + | The sequence in which the //chains// are being traversed when a packet traverses the hook (their // |
- | {{ :wiki: | + | <figure nfhookentrylegend> |
+ | {{ : | ||
+ | < | ||
+ | </ | ||
- | I additionally show the // | + | I additionally show the // |
- | The '' | + | The '' |
* '' | * '' | ||
Line 154: | Line 171: | ||
* '' | * '' | ||
- | Let's take a look at '' | + | Let's take a look at '' |
+ | |||
+ | <figure nfthooksiptables> | ||
+ | {{ : | ||
+ | < | ||
+ | </ | ||
- | {{ : | ||
- | \\ | ||
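Putting the priorities together, you can derive the order in which the //Iptables// tables (and connection tracking) are traversed within each hook mechanically. The C sketch below makes a few simplifying assumptions:
  * All tables are in use. On a real system, a table's hook functions only get registered once the corresponding table or module is loaded.
  * Further registrations such as defragmentation or conntrack helpers are omitted.
  * The priority values are those of ''enum nf_ip_hook_priorities'', with ''NF_IP_PRI_CONNTRACK_CONFIRM'' being ''INT_MAX''.
  * The flag names and ''traversal_order()'' are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Bit flags for the five IPv4 hooks (hypothetical names). */
enum {
    H_PRE = 1 << 0, H_IN = 1 << 1, H_FWD = 1 << 2,
    H_OUT = 1 << 3, H_POST = 1 << 4
};

/* Which hooks each iptables table (and connection tracking) registers
 * with, and at which priority. The nat table registers destination NAT
 * at NF_IP_PRI_NAT_DST and source NAT at NF_IP_PRI_NAT_SRC.
 * Listed in ascending priority, which equals the traversal order. */
static const struct {
    const char *name;
    unsigned hooks;
    int prio;
} reg[] = {
    { "raw",               H_PRE | H_OUT,                         -300 },
    { "conntrack",         H_PRE | H_OUT,                         -200 },
    { "mangle",            H_PRE | H_IN | H_FWD | H_OUT | H_POST, -150 },
    { "nat (dnat)",        H_PRE | H_OUT,                         -100 },
    { "filter",            H_IN | H_FWD | H_OUT,                     0 },
    { "security",          H_IN | H_FWD | H_OUT,                    50 },
    { "nat (snat)",        H_IN | H_POST,                          100 },
    { "conntrack confirm", H_IN | H_POST,                      INT_MAX },
};

/* Writes the names registered with 'hook' into 'out' in traversal
 * order (at most 'max' entries) and returns how many were written. */
static size_t traversal_order(unsigned hook, const char *out[], size_t max)
{
    size_t i, n = 0;

    for (i = 0; i < sizeof reg / sizeof reg[0] && n < max; i++)
        if (reg[i].hooks & hook)
            out[n++] = reg[i].name;
    return n;
}
```

For the prerouting hook this yields raw, conntrack, mangle, nat (dnat). For the postrouting hook it yields mangle, nat (snat), conntrack confirm.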
===== Connection tracking ===== | ===== Connection tracking ===== | ||
- | As you can see in the image above, the // | + | As you can see in Figure {{ref> |
===== Nftables ===== | ===== Nftables ===== | ||
In general // | In general // | ||
However, in contrast to // | However, in contrast to // | ||
- | // | + | // |
A //regular chain// is not registered with any hook (//regular chains// are not covered in this article)((The //regular chains// represent the same feature as I already mentioned for // | A //regular chain// is not registered with any hook (//regular chains// are not covered in this article)((The //regular chains// represent the same feature as I already mentioned for // | ||
- | Thus the user is not forced to name the //base chains// like the hooks they will be registered with. This obviously offers more freedom and flexibility, | + | Thus, the user is not forced to name the //base chains// like the Netfilter |
==== Address Families ==== | ==== Address Families ==== | ||
Line 181: | Line 200: | ||
As a result, all //base chains// which you create within a //table// will be registered with the specified // | As a result, all //base chains// which you create within a //table// will be registered with the specified // | ||
- | In the following example | + | |
+ | The following example | ||
<code bash> | <code bash> | ||
- | #create a new table named ' | ||
nft create table ip foo | nft create table ip foo | ||
- | + | nft create chain ip foo bar {type filter hook input priority 0\;} | |
- | #create new base chain named ' | + | |
- | #netfilter hook ' | + | |
- | #and specify priority ' | + | |
- | nft create chain ip foo bar { type filter hook input priority 0\; } | + | |
</ | </ | ||
- | === The inet family === | + | The '' |
- | The '' | + | The following example creates a table '' |
<code bash> | <code bash> | ||
nft create table inet foo | nft create table inet foo | ||
- | + | nft create chain inet foo bar {type filter hook input priority 0\;} | |
- | #this base chain will get registered with the Netfilter ' | + | |
- | #hook of IPv4 and also to the Netfilter ' | + | |
- | nft create chain inet foo bar { type filter hook input priority 0\; } | + | |
</ | </ | ||
+ | |||
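As a sketch, the mapping from the address families discussed here to the per-protocol hook sets can be written down in C. The ''HOOKS_*'' flags and ''hook_sets_for_family()'' are hypothetical. The ''inet'' case models what is described above: one base chain, registered with the IPv4 and the IPv6 variant of its hook. Families other than ''ip'', ''ip6'', ''inet'' and ''arp'' are deliberately left out.

```c
#include <string.h>

/* Hypothetical flags for the per-protocol sets of Netfilter hooks. */
enum {
    HOOKS_IPV4 = 1 << 0,
    HOOKS_IPV6 = 1 << 1,
    HOOKS_ARP  = 1 << 2
};

/* Returns the hook sets a base chain of the given nftables address
 * family gets registered with. Only ip, ip6, inet and arp are
 * modelled; anything else yields 0. */
static unsigned hook_sets_for_family(const char *family)
{
    if (strcmp(family, "ip") == 0)
        return HOOKS_IPV4;
    if (strcmp(family, "ip6") == 0)
        return HOOKS_IPV6;
    if (strcmp(family, "inet") == 0)   /* one chain, two registrations */
        return HOOKS_IPV4 | HOOKS_IPV6;
    if (strcmp(family, "arp") == 0)
        return HOOKS_ARP;
    return 0;
}
```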
==== Priority ==== | ==== Priority ==== | ||
In the examples above you already saw that // | In the examples above you already saw that // | ||
value when creating a //base chain//. This is the very same // | value when creating a //base chain//. This is the very same // | ||
in detail when covering // | in detail when covering // | ||
- | versions of // | + | versions of // |
- | analog to the mentioned //enums// in // | + | |
- | When creating a //base chain//, you can e.g. specify '' | + | |
^ Name ^ Priority Value ^ | ^ Name ^ Priority Value ^ | ||
| '' | | '' | ||
| '' | | '' | ||
- | | conntrack((As you can guess, this is NOT one of the placeholder names you can use. I added it here as a reminder which // | + | | conntrack((As you can guess, this is NOT one of the placeholder names you can use. I added it here as a reminder which // |
| '' | | '' | ||
| '' | | '' | ||
Line 219: | Line 231: | ||
| '' | | '' | ||
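The name-to-value mapping from the table can be expressed as a small lookup. This minimal C sketch assumes the values listed above for the ''ip'', ''ip6'' and ''inet'' families; the ''bridge'' family uses different values. ''resolve_priority()'' is a hypothetical helper and not part of nft or libnftnl. It merely shows that a named priority is just an alias for a plain integer, and that ''conntrack'' is not among the accepted names.

```c
#include <stdlib.h>
#include <string.h>

/* Named base chain priorities of the ip, ip6 and inet families
 * (see the table above; the bridge family uses other values). */
static const struct {
    const char *name;
    int value;
} nft_prio[] = {
    { "raw", -300 }, { "mangle", -150 },  { "dstnat", -100 },
    { "filter", 0 }, { "security", 50 }, { "srcnat", 100 },
};

/* Resolves a priority given as a name ("srcnat") or as a signed integer
 * ("-100"). Returns 0 and stores the value in *out on success, -1 for
 * anything else (e.g. "conntrack", which is not a valid name). */
static int resolve_priority(const char *s, int *out)
{
    size_t i;
    char *end;
    long v;

    for (i = 0; i < sizeof nft_prio / sizeof nft_prio[0]; i++) {
        if (strcmp(s, nft_prio[i].name) == 0) {
            *out = nft_prio[i].value;
            return 0;
        }
    }
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return -1;
    *out = (int)v;
    return 0;
}
```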
- | The following example creates a //table// named '' | + | When creating a //base chain//, you can e.g. specify '' |
<code bash> | <code bash> | ||
nft create table ip myfilter | nft create table ip myfilter | ||
+ | nft create chain ip myfilter foo {type filter hook input priority 0\;} | ||
+ | nft create chain ip myfilter bar {type filter hook input priority 50\;} | ||
- | nft create chain ip myfilter foo { type filter hook input priority 0\; } | + | # alternatively you could create the same chains using named priority values: |
- | nft create chain ip myfilter bar { type filter hook input priority 50\; } | + | nft create chain ip myfilter foo {type filter hook input priority filter\;} |
- | + | nft create chain ip myfilter bar {type filter hook input priority security\;} | |
- | # | + | |
- | nft create chain ip myfilter foo \ | + | |
- | { type filter hook input priority filter\; } | + | |
- | nft create chain ip myfilter bar \ | + | |
- | { type filter hook input priority security\; } | + | |
</ | </ | ||
- | As a result, IPv4 network packets traversing the // | + | <figure nftex3> |
- | will first traverse the '' | + | {{: |
+ | < | ||
+ | </ | ||
- | {{: | + | // |
- | === Negative Values === | ||
- | // | ||
<code bash> | <code bash> | ||
- | #adding ' | + | nft -- add chain foo bar {type nat hook input priority -100\;} |
- | nft -- add chain foo bar { type nat hook input priority -100\; } | + | |
</ | </ | ||
- | === What if priority is equal? === | + | But what actually happens when you register two //base chains// with the same hook which both have the same // |
- | What actually happens when you register two //base chains// with the same hook | + | |
- | which both have the same // | + | <code bash> |
- | chains// like this:<code bash> | + | nft create chain ip table1 chain1 {type filter hook input priority 0\;} |
- | nft create chain ip table1 chain1 { type filter hook input priority 0\; } | + | nft create chain ip table1 chain2 {type filter hook input priority 0\;} |
- | nft create chain ip table1 chain2 { type filter hook input priority 0\; } | + | |
</ | </ | ||
- | The source code of // | + | |
- | to register callbacks with the same hook which have the same // | + | I checked the kernel source code((see function '' |
- | In case of the example above, function '' | + | |
- | first called for //chain1// and then for // | + | |
- | source code((see function '' | + | |
'' | '' | ||
// | // | ||
- | (in front of) //chain1// in the array of callbacks | + | (in front of) //chain1// in the array of hook functions |
network packets then traverse //chain2// BEFORE //chain1//. This means here | network packets then traverse //chain2// BEFORE //chain1//. This means here | ||
- | the sequence/ | + | the sequence/ |
However, I guess it is best practice to consider the sequence in which two | However, I guess it is best practice to consider the sequence in which two | ||
chains with equal // | chains with equal // | ||
- | " | + | " |
to those //chains// in a way in which they do not depend on the sequence of | to those //chains// in a way in which they do not depend on the sequence of | ||
//chain// traversal. After all, the behavior I describe here is an internal | //chain// traversal. After all, the behavior I describe here is an internal | ||
kernel behavior which is undocumented and implementation could change with any | kernel behavior which is undocumented and implementation could change with any | ||
- | newer kernel version. Thus you should not rely on it! | + | newer kernel version. Thus, you should not rely on it! |
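The observed insertion rule (again: kernel-internal, not a documented interface) boils down to one step. A new hook function goes in front of the first already registered one whose priority is greater than or equal to its own. Here is a minimal C sketch of that rule. The helper ''insert_pos()'' is hypothetical and operates on an array of priorities sorted in ascending order.

```c
#include <stddef.h>

/* Index at which a new hook function with priority 'prio' is inserted
 * into 'prios', which holds the priorities of the n already registered
 * hook functions in ascending order. The new entry goes in front of the
 * first existing entry whose priority is >= prio, so on equal priority
 * the NEWER registration is called first (observed kernel behavior). */
static size_t insert_pos(const int *prios, size_t n, int prio)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (prio <= prios[i])
            return i;
    return n;   /* higher than everything: append */
}
```

With //chain1// already registered at priority 0, ''insert_pos()'' returns index 0 for //chain2//. //chain2// therefore ends up in front of //chain1//.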
Line 277: | Line 281: | ||
==== Example: NAT edge router ==== | ==== Example: NAT edge router ====
- | {{: | + | The example in Figure |
- | If you e.g. like to do some simple IPv4 packet filtering and //snat// (masquerading) | + | |
- | ^ table ^ base chains ^ | + | <figure nftedgerouter> |
- | | '' | + | {{ : |
- | | '' | + | |
- | + | ||
- | You create these //tables// and //chains// in //address family// '' | + | |
<code bash> | <code bash> | ||
nft create table ip nat | nft create table ip nat | ||
- | nft create chain ip nat postrouting | + | nft create chain ip nat postrouting {type nat hook postrouting priority srcnat\;} |
- | { type nat hook postrouting priority srcnat\; } | + | nft add rule ip nat postrouting oif eth1 masquerade |
nft create table ip filter | nft create table ip filter | ||
- | nft create chain ip filter input \ | + | nft create chain ip filter input {type filter hook input priority filter\;} |
- | { type filter hook input priority filter\; } | + | nft create chain ip filter forward {type filter hook forward priority filter\;} |
- | nft create chain ip filter forward | + | nft create chain ip filter output {type filter hook output priority filter\;} |
- | { type filter hook forward priority filter\; } | + | nft add rule ip filter forward iif eth1 oif eth0 ct state new,invalid drop |
- | nft create chain ip filter output | + | nft add rule ip filter input iif eth1 ip protocol != icmp ct state new,invalid drop |
- | { type filter hook output priority filter\; } | + | |
</ | </ | ||
+ | {{ : | ||
+ | < | ||
+ | //base chains// getting registered with the // | ||
+ | </ | ||
- | As a result, the //chains// registered with the IPv4 // | ||
- | |||
- | {{ : | ||
- | |||
- | Then you add some simple masquerading and packet filtering rules: | ||
- | <code bash> | ||
- | nft add rule ip nat postrouting oif eth1 masquerade | ||
- | |||
- | nft add rule ip filter forward iif eth1 \ | ||
- | ct state new, | ||
- | nft add rule ip filter input iif eth1 ip protocol != icmp \ | ||
- | ct state new, | ||
- | </ | ||
- | |||
- | (I merely gave a minimalist example here. One could even remove the //output// //chain// again, because I did not add any rules to it. In reality you for sure will add a more complex set of rules.) | ||
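To see how the example rules decide, here is a small C model of the three base chains. It makes several assumptions:
  * As the rules suggest, ''eth1'' faces the Internet and ''eth0'' the local network.
  * ''struct pkt'' and the functions are hypothetical.
  * It ignores that in reality the ''nat'' chain is only consulted for the first packet of a connection; connection tracking then applies the masquerading to the rest of that connection.
  * It reflects that the default policy of a base chain is accept, so packets not matched by a drop rule pass.

```c
#include <string.h>

enum ct_state { CT_NEW, CT_ESTABLISHED, CT_RELATED, CT_INVALID };
enum verdict  { ACCEPT, DROP };

/* Hypothetical packet summary: only what the example rules look at. */
struct pkt {
    const char   *iif;      /* input interface, "" if locally generated  */
    const char   *oif;      /* output interface, "" if delivered locally */
    int           is_icmp;  /* ip protocol == icmp */
    enum ct_state ct;       /* connection tracking state */
};

static int new_or_invalid(enum ct_state s)
{
    return s == CT_NEW || s == CT_INVALID;
}

/* table ip filter, chain forward:
 *   iif eth1 oif eth0 ct state new,invalid drop */
static enum verdict filter_forward(const struct pkt *p)
{
    if (strcmp(p->iif, "eth1") == 0 && strcmp(p->oif, "eth0") == 0 &&
        new_or_invalid(p->ct))
        return DROP;
    return ACCEPT;   /* base chain default policy: accept */
}

/* table ip filter, chain input:
 *   iif eth1 ip protocol != icmp ct state new,invalid drop */
static enum verdict filter_input(const struct pkt *p)
{
    if (strcmp(p->iif, "eth1") == 0 && !p->is_icmp && new_or_invalid(p->ct))
        return DROP;
    return ACCEPT;
}

/* table ip nat, chain postrouting: oif eth1 masquerade */
static int masquerade(const struct pkt *p)
{
    return strcmp(p->oif, "eth1") == 0;
}
```

New connections from the Internet into the local network are dropped, while replies to connections opened from inside pass. Connections from inside to the Internet pass and get masqueraded. ICMP (e.g. ping) to the router itself is still allowed.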
+ | ==== List hook functions (coming soon) ==== | ||
+ | In July 2021, Nftables developers announced a new feature which will | ||
+ | likely be included in the next version of Nftables to be released; | ||
+ | see [[http:// | ||
+ | registered with a specified Netfilter hook together with their assigned | ||
+ | priorities. For example, to list all hook functions currently registered with the Netfilter | ||
+ | IPv4 Prerouting hook, the syntax will probably be something like | ||
+ | '' | ||
===== Context ===== | ===== Context ===== | ||
The described behavior and implementation have been observed on a | The described behavior and implementation have been observed on a | ||
- | Debian 10 (buster) system with using Debian // | + | Debian 10 (buster) system with using Debian // |
* kernel: '' | * kernel: '' | ||
* nftables: '' | * nftables: '' | ||
* libnftnl: '' | * libnftnl: '' | ||
+ | |||
+ | |||
===== Feedback ===== | ===== Feedback ===== | ||
[[: | [[: | ||
- | {{tag> | + | |
+ | //published 2020-05-17//, | ||
blog/linux/nftables_packet_flow_netfilter_hooks_detail.1591312186.txt.gz · Last modified: 2020-06-05 by Andrej Stender