blog:linux:nftables_packet_flow_netfilter_hooks_detail [2020-11-03] – [Example: NAT edge router] Andrej Stender | blog:linux:nftables_packet_flow_netfilter_hooks_detail [2022-08-07] (current) – activated TOC Andrej Stender | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | {{tag> | ||
====== Nftables - Packet flow and Netfilter hooks in detail ====== | ====== Nftables - Packet flow and Netfilter hooks in detail ====== | ||
~~META: | ~~META: | ||
date created = 2020-05-17 | date created = 2020-05-17 | ||
~~ | ~~ | ||
- | |||
- | ~~NOTOC~~ | ||
If you are using // | If you are using // | ||
Line 10: | Line 9: | ||
official documentation and by a quick look through websites which | official documentation and by a quick look through websites which | ||
provide example configurations. | provide example configurations. | ||
- | |||
However, if you are working on a little bit more complex stuff like writing | However, if you are working on a little bit more complex stuff like writing | ||
- | // | + | // |
and doing NAT, or other of the "more interesting" | and doing NAT, or other of the "more interesting" | ||
to get a little more tricky. | to get a little more tricky. | ||
Line 21: | Line 19: | ||
in a little more detail. | in a little more detail. | ||
+ | ===== Rationale ===== | ||
Personally, I always like to know how things work and to dig a little deeper than | Personally, I always like to know how things work and to dig a little deeper than | ||
just gaining the very minimum knowledge required to solve the issue at hand. | just gaining the very minimum knowledge required to solve the issue at hand. | ||
Line 27: | Line 26: | ||
the available documentation is outdated. Many of the more interesting details | the available documentation is outdated. Many of the more interesting details | ||
are often only covered by older articles focused on the // | are often only covered by older articles focused on the // | ||
- | |||
After digging through a lot of websites, some kernel source code and doing some practical | After digging through a lot of websites, some kernel source code and doing some practical | ||
experimenting involving the //trace// and //log// features of // | experimenting involving the //trace// and //log// features of // | ||
Line 45: | Line 43: | ||
<figure nfpackflowofficial> | <figure nfpackflowofficial> | ||
{{: | {{: | ||
- | < | + | < |
- | The original author is Jan Engelhardt and it has been published on [[https:// | + | |
</ | </ | ||
</ | </ | ||
- | However, what this image shows you is the packet flow through the //Netfilter hooks// and thereby the packet flow through the //tables// and //chains// like they existed in old // | + | However, what this image shows you is the packet flow through the //Netfilter hooks// and thereby the packet flow through the //tables// and //chains// like they existed in old // | ||
===== Netfilter ===== | ===== Netfilter ===== | ||
Line 63: | Line 60: | ||
A network packet received on a network device first traverses the // | A network packet received on a network device first traverses the // | ||
- | Those five hooks have been present in the Linux kernel for a very long time. You can e.g. already find an equivalent of Figure {{ref> | + | Those five hooks have been present in the Linux kernel for a very long time. You can e.g. already find an equivalent of Figure {{ref> |
<figure nfhooksdetail> | <figure nfhooksdetail> | ||
{{: | {{: | ||
- | < | + | < |
</ | </ | ||
Line 77: | Line 74: | ||
explicitly make use of network namespaces (e.g. by creating additional ones), | explicitly make use of network namespaces (e.g. by creating additional ones), | ||
still one instance, the default network namespace //" | still one instance, the default network namespace //" | ||
- | and then all the networking happens inside | + | and then all the networking happens inside |
All the mentioned hooks exist independently (=are being re-created) within each | All the mentioned hooks exist independently (=are being re-created) within each | ||
network namespace((The only exception here is the //ingress// hook which is bound to | network namespace((The only exception here is the //ingress// hook which is bound to | ||
- | an individual //network device// and thereby (at least not directly) to a //network namespace// | + | an individual //network device// and thereby (at least not directly) to a //network namespace// |
functions which are registered with the hooks, are re-created (initially empty) | functions which are registered with the hooks, are re-created (initially empty) | ||
- | for each new network namespace. Thus who is registered with those hooks is | + | for each new network namespace. Thus, who is registered with those hooks is |
different and individual to each network namespace. | different and individual to each network namespace. | ||
Of course the actual concept of network namespaces and its impact goes | Of course the actual concept of network namespaces and its impact goes | ||
Line 89: | Line 86: | ||
- | ==== Register | + | ==== Register |
- | As already mentioned, the idea of the hooks is to give other kernel components the opportunity to register // | + | As already mentioned, the idea of the hooks is to give other kernel components the opportunity to register // |
<figure nfhookregister> | <figure nfhookregister> | ||
{{ : | {{ : | ||
- | < | + | < |
</ | </ | ||
- | Several callback functions can be registered with the same hook. // | + | Several callback functions can be registered with the same hook. // |
- | implemented as an instance of '' | + | In most other documentation on the Internet as well as in discussions among the Netfilter developer community, those registered callback functions are usually referred to as "hook functions" |
==== Priority ==== | ==== Priority ==== | ||
- | The sequence of callbacks | + | The sequence of hook functions |
<figure nfipv4hookpriorities> | <figure nfipv4hookpriorities> | ||
Line 124: | Line 121: | ||
</ | </ | ||
< | < | ||
- | Source code extract from '' | + | Source code extract from '' |
</ | </ | ||
- | I go into such detail here, because this enum shows you the discrete // | + | I go into such detail here, because this enum shows you the discrete // |
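To make the ordering by priority concrete, here is a minimal Python sketch. It is a toy model, not kernel code. The numeric values are the ones I recall from ''enum nf_ip_hook_priorities'' in the kernel headers and should be treated as an assumption to verify against your own kernel tree; the helper ''sorted_hook_functions'' and the hook function names are hypothetical.

```python
# Toy model: hook functions at one Netfilter hook run in ascending priority order.
# Values assumed from enum nf_ip_hook_priorities; verify against your kernel source.
NF_IP_PRI = {
    "CONNTRACK_DEFRAG": -400,
    "RAW": -300,
    "CONNTRACK": -200,
    "MANGLE": -150,
    "NAT_DST": -100,
    "FILTER": 0,
    "SECURITY": 50,
    "NAT_SRC": 100,
}


def sorted_hook_functions(registrations):
    """Return hook function names ordered by priority (lowest value first)."""
    return [name for name, prio in sorted(registrations, key=lambda r: r[1])]


# Hypothetical registrations with the prerouting hook, given in arbitrary order.
prerouting = [
    ("filter_chain", NF_IP_PRI["FILTER"]),
    ("conntrack", NF_IP_PRI["CONNTRACK"]),
    ("dnat", NF_IP_PRI["NAT_DST"]),
    ("raw_chain", NF_IP_PRI["RAW"]),
]

order = sorted_hook_functions(prerouting)
print(order)
```

The lower the value, the earlier the hook function sees the packet, which is why a chain with priority ''-300'' runs before connection tracking at ''-200''.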
==== Hard-coded vs. Flexibility ==== | ==== Hard-coded vs. Flexibility ==== | ||
- | The //Netfilter// hooks themselves are hard-coded into the Linux kernel network stack. You'll find them in the source code if you search for function calls named '' | + | The Netfilter hooks themselves are hard-coded into the Linux kernel network stack. You'll find them in the source code if you search for function calls named '' |
- | runtime and why those callbacks | + | |
- For one thing, this kind of flexibility during runtime is an essential basic requirement in a kernel where many components (also // | - For one thing, this kind of flexibility during runtime is an essential basic requirement in a kernel where many components (also // | ||
- | - Performance is a crucial issue. Every network packet needs to traverse all callbacks | + | - Performance is a crucial issue. Every network packet needs to traverse all hook functions |
==== Hook traversal and verdict ==== | ==== Hook traversal and verdict ==== | ||
- | Now let's take a more detailed look at how the callbacks | + | Now let's take a more detailed look at how the hook functions | ||
+ | For each network packet which traverses | ||
+ | in the sequence/ | ||
+ | the // | ||
<figure nfhookentriesflow> | <figure nfhookentriesflow> | ||
{{ : | {{ : | ||
- | < | + | < |
</ | </ | ||
- | For each network packet which traverses this hook, the callback functions are being called one by one | + | Network packets are represented within the Linux kernel as instances of '' |
- | in the sequence/ | + | |
- | the // | + | |
- | of '' | + | |
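The traversal of a single hook can be sketched in a few lines of Python. This is a simplified model under stated assumptions: it only knows the verdicts ''NF_DROP'' and ''NF_ACCEPT'' (other verdicts such as ''NF_STOLEN'' or ''NF_QUEUE'' are left out), a packet is just a dictionary instead of a real ''sk_buff'', and the names ''traverse_hook'', ''drop_telnet'' and ''log_packet'' are hypothetical.

```python
# Toy model of hook traversal: each hook function returns a verdict.
NF_DROP, NF_ACCEPT = 0, 1  # same numeric values as in the kernel's netfilter.h


def traverse_hook(hook_functions, packet):
    """Call hook functions in order; stop at the first one that drops the packet."""
    for fn in hook_functions:
        if fn(packet) == NF_DROP:
            return NF_DROP  # packet is discarded, later functions never see it
    return NF_ACCEPT  # every function accepted: packet continues on its path


seen = []  # records which packets reached the second hook function


def drop_telnet(packet):
    """Hypothetical hook function: drop anything addressed to port 23."""
    return NF_DROP if packet["dport"] == 23 else NF_ACCEPT


def log_packet(packet):
    """Hypothetical hook function: remember the destination port, then accept."""
    seen.append(packet["dport"])
    return NF_ACCEPT


hooks = [drop_telnet, log_packet]  # already sorted by priority
verdict_ssh = traverse_hook(hooks, {"dport": 22})
verdict_telnet = traverse_hook(hooks, {"dport": 23})
print(verdict_ssh, verdict_telnet, seen)
```

The point of the sketch is the early exit: once one hook function drops the packet, the hook functions registered behind it are not called for that packet at all.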
===== Iptables ===== | ===== Iptables ===== | ||
- | To put things into context, let's take a short look at // | + | To put things into context, let's take a short look at // |
- | // | + | |
- | In case of // | + | In case of // |
^ table ^ contains chains ^ command to show that ^ | ^ table ^ contains chains ^ command to show that ^ | ||
Line 163: | Line 156: | ||
| '' | | '' | ||
- | The sequence in which the //chains// are being traversed when a packet traverses the hook (their // | + | The sequence in which the //chains// are being traversed when a packet traverses the hook (their // |
<figure nfhookentrylegend> | <figure nfhookentrylegend> | ||
Line 170: | Line 163: | ||
</ | </ | ||
- | I additionally show the // | + | I additionally show the // |
- | The '' | + | The '' |
* '' | * '' | ||
Line 182: | Line 175: | ||
<figure nfthooksiptables> | <figure nfthooksiptables> | ||
{{ : | {{ : | ||
- | < | + | < |
</ | </ | ||
===== Connection tracking ===== | ===== Connection tracking ===== | ||
- | As you can see in Figure {{ref> | + | As you can see in Figure {{ref> |
- | There is much more to tell about // | ||
- | A very good article exists on this topic, written by Pablo Neira Ayuso, the Linux kernel maintainer of the // | ||
===== Nftables ===== | ===== Nftables ===== | ||
In general // | In general // | ||
However, in contrast to // | However, in contrast to // | ||
- | // | + | // |
A //regular chain// is not registered with any hook (//regular chains// are not covered in this article)((The //regular chains// represent the same feature as I already mentioned for // | A //regular chain// is not registered with any hook (//regular chains// are not covered in this article)((The //regular chains// represent the same feature as I already mentioned for // | ||
- | Thus the user is not forced to name the //base chains// like the hooks they will be registered with. This obviously offers more freedom and flexibility, | + | Thus, the user is not forced to name the //base chains// like the Netfilter |
==== Address Families ==== | ==== Address Families ==== | ||
Line 211: | Line 201: | ||
As a result, all //base chains// which you create within a //table// will be registered with the specified // | As a result, all //base chains// which you create within a //table// will be registered with the specified // | ||
- | <figure nftex1> | + | The following example creates a new table named '' |
<code bash> | <code bash> | ||
nft create table ip foo | nft create table ip foo | ||
nft create chain ip foo bar {type filter hook input priority 0\;} | nft create chain ip foo bar {type filter hook input priority 0\;} | ||
</ | </ | ||
- | < | ||
- | Creating a new base chain named '' | ||
- | // | ||
- | (I explicitly specify '' | ||
- | </ | ||
- | </ | ||
- | === The inet family === | ||
The '' | The '' | ||
+ | The following example creates a table '' | ||
- | <figure nftex2> | ||
<code bash> | <code bash> | ||
nft create table inet foo | nft create table inet foo | ||
nft create chain inet foo bar {type filter hook input priority 0\;} | nft create chain inet foo bar {type filter hook input priority 0\;} | ||
</ | </ | ||
- | < | + | |
- | </ | + | |
==== Priority ==== | ==== Priority ==== | ||
In the examples above you already saw that // | In the examples above you already saw that // | ||
value when creating a //base chain//. This is the very same // | value when creating a //base chain//. This is the very same // | ||
in detail when covering // | in detail when covering // | ||
- | versions of // | + | versions of // |
- | <figure nftpriotable> | ||
^ Name ^ Priority Value ^ | ^ Name ^ Priority Value ^ | ||
| '' | | '' | ||
| '' | | '' | ||
- | | conntrack((As you can guess, this is NOT one of the placeholder names you can use. I added it here as a reminder which // | + | | conntrack((As you can guess, this is NOT one of the placeholder names you can use. I added it here as a reminder which // |
| '' | | '' | ||
| '' | | '' | ||
| '' | | '' | ||
| '' | | '' | ||
- | < | ||
- | </ | ||
- | When creating a //base chain//, you can e.g. specify '' | + | When creating a //base chain//, you can e.g. specify '' |
- | <figure nftex3> | ||
<code bash> | <code bash> | ||
nft create table ip myfilter | nft create table ip myfilter | ||
Line 263: | Line 242: | ||
nft create chain ip myfilter bar {type filter hook input priority security\;} | nft create chain ip myfilter bar {type filter hook input priority security\;} | ||
</ | </ | ||
+ | |||
+ | <figure nftex3> | ||
{{: | {{: | ||
- | < | + | < |
- | </ | + | |
</ | </ | ||
- | === Negative Values === | + | // |
- | // | + | |
- | <figure nftnegval> | ||
<code bash> | <code bash> | ||
nft -- add chain foo bar {type nat hook input priority -100\;} | nft -- add chain foo bar {type nat hook input priority -100\;} | ||
</ | </ | ||
- | < | ||
- | </ | ||
- | === What if priority is equal? === | + | But what actually happens when you register two //base chains// with the same hook which both have the same // |
- | What actually happens when you register two //base chains// with the same hook | + | |
- | which both have the same // | + | |
- | In case of the example | + | |
- | first called for //chain1// and then for //chain2//. | + | |
- | <figure nftequalprio> | ||
<code bash> | <code bash> | ||
nft create chain ip table1 chain1 {type filter hook input priority 0\;} | nft create chain ip table1 chain1 {type filter hook input priority 0\;} | ||
nft create chain ip table1 chain2 {type filter hook input priority 0\;} | nft create chain ip table1 chain2 {type filter hook input priority 0\;} | ||
</ | </ | ||
- | < | ||
- | </ | ||
I checked the kernel source code((see function '' | I checked the kernel source code((see function '' | ||
'' | '' | ||
// | // | ||
- | (in front of) //chain1// in the array of callbacks | + | (in front of) //chain1// in the array of hook functions |
network packets then traverse //chain2// BEFORE //chain1//. This means here | network packets then traverse //chain2// BEFORE //chain1//. This means here | ||
- | the sequence/ | + | the sequence/ |
However, I guess it is best practice to consider the sequence in which two | However, I guess it is best practice to consider the sequence in which two | ||
chains with equal // | chains with equal // | ||
- | " | + | " |
to those //chains// in a way in which they do not depend on the sequence of | to those //chains// in a way in which they do not depend on the sequence of | ||
//chain// traversal. After all, the behavior I describe here is an internal | //chain// traversal. After all, the behavior I describe here is an internal | ||
kernel behavior which is undocumented and implementation could change with any | kernel behavior which is undocumented and implementation could change with any | ||
- | newer kernel version. Thus you should not rely on it! | + | newer kernel version. Thus, you should not rely on it! |
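The tie-breaking behavior described above can be modeled with a sorted insert. This is a minimal sketch in Python that merely mimics what I observed; it is not the kernel implementation, and the function name ''register'' and the chain named ''early'' are hypothetical.

```python
import bisect


def register(entries, name, priority):
    """Insert (priority, name) into a priority-sorted list.

    A new entry lands in front of existing entries that have the same
    priority, which mimics the behavior described in the text above.
    """
    priorities = [p for p, _ in entries]
    pos = bisect.bisect_left(priorities, priority)
    entries.insert(pos, (priority, name))


input_hook = []
register(input_hook, "chain1", 0)
register(input_hook, "chain2", 0)    # same priority: ends up before chain1
register(input_hook, "early", -100)  # lower priority value: ends up first
names = [n for _, n in input_hook]
print(names)
```

Since this ordering is an implementation detail, the sketch is meant as an aid to understanding only, not as something to build rulesets on.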
Line 313: | Line 281: | ||
==== Example: NAT edge router ==== | ==== Example: NAT edge router ==== | ||
- | This example demonstrates an edge router, doing some simple IPv4 packet filtering and //SNAT// (masquerading). | + | The example |
- | I merely gave a minimalist example here. One could even remove the //output// //chain// again, because I did not add any rules to it. In reality you for sure will add a more complex set of rules. | + | |
<figure nftedgerouter> | <figure nftedgerouter> | ||
Line 336: | Line 303: | ||
+ | ==== List hook functions (coming soon) ==== | ||
+ | Nftables developers in July 2021 announced a new feature, which will | ||
+ | likely be included in the next version of Nftables to be released; | ||
+ | see [[http:// | ||
+ | registered with a specified Netfilter hook together with their assigned | ||
+ | priorities. If you would like, for example, to list all hook functions currently registered with the Netfilter | ||
+ | IPv4 Prerouting hook, the syntax to do that will probably be something like | ||
+ | '' | ||
===== Context ===== | ===== Context ===== | ||
The described behavior and implementation has been observed on a | The described behavior and implementation has been observed on a | ||
Line 348: | Line 323: | ||
[[: | [[: | ||
- | {{tag> | + | |
+ | //published 2020-05-17//, | ||
blog/linux/nftables_packet_flow_netfilter_hooks_detail.1604360947.txt.gz · Last modified: 2020-11-03 by Andrej Stender