Thermalcircle.de

climbing the thermals

blog:linux:nftables_packet_flow_netfilter_hooks_detail

Differences

This shows you the differences between two versions of the page.

blog:linux:nftables_packet_flow_netfilter_hooks_detail [2020-11-03] – [Example: NAT edge router] Andrej Stenderblog:linux:nftables_packet_flow_netfilter_hooks_detail [2022-08-07] (current) – activated TOC Andrej Stender
Line 1: Line 1:
 +{{tag>linux kernel netfilter nftables iptables}}
 ====== Nftables - Packet flow and Netfilter hooks in detail ====== ====== Nftables - Packet flow and Netfilter hooks in detail ======
 ~~META: ~~META:
 date created = 2020-05-17  date created = 2020-05-17 
 ~~ ~~
- 
-~~NOTOC~~ 
  
 If you are using //Iptables// or the newer //Nftables// and you are merely doing some simple If you are using //Iptables// or the newer //Nftables// and you are merely doing some simple
Line 10: Line 9:
 official documentation and by a quick look through websites which official documentation and by a quick look through websites which
 provide example configurations.  provide example configurations. 
- 
 However, if you are working on a little bit more complex stuff like writing However, if you are working on a little bit more complex stuff like writing
-//Nftables// rules while caring for both IPv4 and IPv6, while using IPsec((Check out my other article [[:blog:linux:nftables_ipsec_packet_flow|Nftables - Netfilter and VPN/IPsec packet flow]], where I cover that topic.))+//Nftables// rules while caring for both IPv4 and IPv6, while using IPsec
 and doing NAT, or other of the "more interesting" stuff... then things tend and doing NAT, or other of the "more interesting" stuff... then things tend
 to get a little more tricky. to get a little more tricky.
Line 21: Line 19:
 in a little more detail. in a little more detail.
  
 +===== Rationale =====
 I for myself always like to know how things work and to dig a little deeper than I for myself always like to know how things work and to dig a little deeper than
 just gaining the very minimum knowledge required to solve the issue at hand. just gaining the very minimum knowledge required to solve the issue at hand.
Line 27: Line 26:
 the available documentation is outdated. Many of the more interesting details the available documentation is outdated. Many of the more interesting details
 are often only covered by older articles focused on the //Nftables// predecessor //Iptables//. are often only covered by older articles focused on the //Nftables// predecessor //Iptables//.
- 
 After digging through a lot of websites, some kernel source code and doing some practical After digging through a lot of websites, some kernel source code and doing some practical
 experimenting involving the //trace// and //log// features of //Nftables//, experimenting involving the //trace// and //log// features of //Nftables//,
Line 45: Line 43:
 <figure nfpackflowofficial> <figure nfpackflowofficial>
 {{:linux:netfilter-packet-flow.png?direct&700|}} {{:linux:netfilter-packet-flow.png?direct&700|}}
-<caption>Netfilter Packet Flow image\\  +<caption>Netfilter Packet Flow image, published on [[https://commons.wikimedia.org/wiki/File:Netfilter-packet-flow.svg|Wikipedia]], [[https://creativecommons.org/licenses/by-sa/3.0/deed.en|CC BY-SA 3.0]]((Author: Jan Engelhardt\\ This kindly allows me to use this image as I publish my content under a compatible license. Thank you. See my licensing statement on page bottom.))
-The original author is Jan Engelhardt and it has been published on [[https://commons.wikimedia.org/wiki/File:Netfilter-packet-flow.svg|Wikipedia]] under the [[https://en.wikipedia.org/wiki/en:Creative_Commons|Creative Commons]] [[https://creativecommons.org/licenses/by-sa/3.0/deed.en|Attribution-Share Alike 3.0 Unported]] license((This kindly allows me to use it as I publish my content under a compatible license. Thank you. See my licensing statement on page bottom.)).+
 </caption> </caption>
 </figure> </figure>
  
-However, what this image shows you is the packet flow through the //Netfilter hooks// and thereby the packet flow through the //tables// and //chains// like they existed in old //Iptables//. In //Nftables// however you are free to create and name //tables// and //chains// to your liking, so things will probably look a little different then. The image still remains very useful, especially because it contains a lot of further details like //bridging//, //ingress// hook and //IPsec//%%/%%//xfrm//, however when interpreting it you are required to "read a little between the lines".+However, what this image shows you is the packet flow through the //Netfilter hooks// and thereby the packet flow through the //tables// and //chains// like they existed in old //Iptables//. In //Nftables// however you are free to create and name //tables// and //chains// to your liking, so things will probably look a little different then. The image still remains very useful, especially because it contains a lot of further details like //bridging//, //ingress// hook and //IPsec//%%/%%//Xfrm//((Check out my other article [[:blog:linux:nftables_ipsec_packet_flow|Nftables - Netfilter and VPN/IPsec packet flow]], where I cover that topic.)), however when interpreting it you are required to "read a little bit between the lines".
  
 ===== Netfilter ===== ===== Netfilter =====
Line 63: Line 60:
 A network packet received on a network device first traverses the //Prerouting// hook. Then the routing decision happens and thereby the kernel determines whether this packet is destined at a local process (e.g. socket of a server listening on the system) or whether the packet shall be forwarded (in that case the system works as a router). In the first case the packet then traverses the //Input// hook and is then given to the local process. In the second case the packet traverses the //Forward// hook and finally the //Postrouting// hook, before being sent out on a network device. A packet which has been generated by a local process (e.g. a client or server software which likes to send something out on the network), first traverses the //Output// hook and then also the //Postrouting// hook, before it is sent out on a network device. A network packet received on a network device first traverses the //Prerouting// hook. Then the routing decision happens and thereby the kernel determines whether this packet is destined at a local process (e.g. socket of a server listening on the system) or whether the packet shall be forwarded (in that case the system works as a router). In the first case the packet then traverses the //Input// hook and is then given to the local process. In the second case the packet traverses the //Forward// hook and finally the //Postrouting// hook, before being sent out on a network device. A packet which has been generated by a local process (e.g. a client or server software which likes to send something out on the network), first traverses the //Output// hook and then also the //Postrouting// hook, before it is sent out on a network device.
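The three paths described above can be written down as a tiny lookup table. This is only a sketch of my own to make the sequence explicit, not kernel code. The path names ''incoming'', ''forwarded'' and ''outgoing'' and the function ''hooks_traversed()'' are made up for this illustration.

```python
# Toy model of the three packet paths described above (not kernel code).
# The path names are labels invented for this sketch.
HOOK_PATHS = {
    # received on a network device, destined for a local process
    "incoming": ["prerouting", "input"],
    # received on a network device, forwarded to another device (router case)
    "forwarded": ["prerouting", "forward", "postrouting"],
    # generated by a local process, sent out on a network device
    "outgoing": ["output", "postrouting"],
}


def hooks_traversed(path):
    """Return the Netfilter hooks traversed on the given path, in order."""
    return list(HOOK_PATHS[path])


if __name__ == "__main__":
    for path in HOOK_PATHS:
        print(path, "->", " -> ".join(hooks_traversed(path)))
```

Note that //Prerouting// is only seen by received packets and //Output// only by locally generated ones, while //Postrouting// is shared by forwarded and locally generated packets.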
  
-Those five hooks have been present in the Linux kernel for a very long time. You can e.g. already find an equivalent of Figure {{ref>nfhookssimple}} in the [[https://netfilter.org/documentation/HOWTO//netfilter-hacking-HOWTO-3.html|Linux netfilter Hacking HOWTO]] from 2002. The good news is that at least from a bird's eye view all this is still accurate today. Of course, if you look into details, things are more complex now. I try to show that in Figure {{ref>nfhooksdetail}} (click to enlarge). The //courier// font within the image indicates how things are named within the Linux kernel source code.+Those five hooks have been present in the Linux kernel for a very long time. You can e.g. already find an equivalent of Figure {{ref>nfhookssimple}} in the [[https://netfilter.org/documentation/HOWTO//netfilter-hacking-HOWTO-3.html|Linux netfilter Hacking HOWTO]] from 2002. The good news is that at least from a bird's eye view all this is still accurate today. Of course, if you look into details, things are more complex now. I try to show that in Figure {{ref>nfhooksdetail}}. The //courier// font within the image indicates how things are named within the Linux kernel source code.
  
 <figure nfhooksdetail> <figure nfhooksdetail>
 {{:linux:nf-hooks-detail1.jpg?direct&700|}} {{:linux:nf-hooks-detail1.jpg?direct&700|}}
-<caption>Netfilter hooks in more detail: IPv4, IPv6, ARP, Bridging, network namespaces and Ingress</caption>+<caption>Netfilter hooks in more detail: IPv4, IPv6, ARP, Bridging, network namespaces and Ingress\\ (click to enlarge)</caption>
 </figure> </figure>
  
Line 77: Line 74:
 explicitly make use of network namespaces (e.g. by creating additional ones), explicitly make use of network namespaces (e.g. by creating additional ones),
 still one instance, the default network namespace //"init_net"//, always exists still one instance, the default network namespace //"init_net"//, always exists
-and then all the networking happens inside it.+and then all the networking happens inside this namespace.
  
 All the mentioned hooks exist independently (=are being re-created) within each All the mentioned hooks exist independently (=are being re-created) within each
 network namespace((The only exception here is the //ingress// hook which is bound to network namespace((The only exception here is the //ingress// hook which is bound to
-an individual //network device// and thereby (at least not directly) to a //network namespace//.)). That means the data structures in the Linux kernel which hold the list of callback+an individual //network device// and thereby (at least not directly) to a //network namespace//.)), as shown in Figure {{ref>nfhooksdetail}}. That means the data structures in the Linux kernel which hold the list of callback
 functions which are registered with the hooks, are re-created (initially empty) functions which are registered with the hooks, are re-created (initially empty)
-for each new network namespace. Thus who is registered with those hooks is+for each new network namespace. Thus, who is registered with those hooks is
 different and individual to each network namespace. different and individual to each network namespace.
 Of course the actual concept of network namespaces and its impact goes Of course the actual concept of network namespaces and its impact goes
Line 89: Line 86:
  
  
-==== Register callbacks ==== +==== Register hook functions ==== 
-As already mentioned, the idea of the hooks is to give other kernel components the opportunity to register //callback// functions with a hook which are then being called for each network packet which traverses this hook. //Netfilter// provides an API to do that and both //Iptables// and //Nftables// and further systems like //Connection Tracking// make use of it. This API provides these two functions  to register/unregister a callback function with a specific hook: ''nf_register_net_hook()'' and ''nf_unregister_net_hook()''. Figure {{ref>nfhookregister}} visualizes this.+As already mentioned, the idea of the hooks is to give other kernel components the opportunity to register //callback// functions with a Netfilter hook which are then being called for each network packet which traverses this hook. //Netfilter// provides an API to do that and both //Iptables// and //Nftables// and further systems like //Connection Tracking// make use of it. This API provides the functions ''[[https://elixir.bootlin.com/linux/v5.4.19/source/net/netfilter/core.c#L449|nf_register_net_hook()]]'' and ''[[https://elixir.bootlin.com/linux/v5.4.19/source/net/netfilter/core.c#L425|nf_unregister_net_hook()]]''((and further variations of those functions)) to register/unregister a callback function with a specific hook. Figure {{ref>nfhookregister}} visualizes this.
  
 <figure nfhookregister> <figure nfhookregister>
 {{ :linux:nf-hook-entries-register1.png?direct&600 }} {{ :linux:nf-hook-entries-register1.png?direct&600 }}
-<caption>Netfilter API to register/unregister callbacks with a hook</caption>+<caption>Netfilter API to register/unregister callbacks ("hook functions") with a hook</caption>
 </figure> </figure>
  
-Several callback functions can be registered with the same hook. //Netfilter// holds the function pointers of those callback functions (together with some meta data) in an array, which is dynamically being grown or shrunk each time when some component registers/unregisters a callback. Each hook has its own array, +Several callback functions can be registered with the same hook. //Netfilter// holds the function pointers of those functions (together with some meta data) in an array, which is dynamically being grown or shrunk each time when some component registers/unregisters a function. Each Netfilter hook has its own array, implemented as an instance of ''struct nf_hook_entries'' in the kernel.
-implemented as an instance of ''struct nf_hook_entries'' in the kernel.+In most other documentation on the Internet as well as in discussions among the Netfilter developer community, those registered callback functions are usually referred to as "hook functions"((Sometimes they are simply referred to as "hooks", which creates some ambiguity. Be careful when you read something about a "hook" somewhere in the Internet... the meaning might be a "Netfilter hook", but it might also be a "callback function" registered with one of the Netfilter hooks.)). Thus, I will also refer to them as "hook functions" from now on.
  
 ==== Priority ==== ==== Priority ====
-The sequence of callbacks in this array is important, because network packets which traverse the hook, will traverse the callbacks in the sequence in which those are present within the array. When registering a callback, the caller needs to specify a //priority// value (shown in red color in Figure {{ref>nfhookregister}}), which is then used by //Netfilter// to determine WHERE to insert the new callback into the array. The //priority// is a signed integer value (''int'') and the whole value range of that data type can be used. As you see in Figure {{ref>nfhookregister}}, //Netfilter// sorts the callbacks in ascending order from lower to higher //priority// values, thus callback with lower value like ''-200'' comes BEFORE a callback with a higher value like ''100''. However in practice not the full range of values of the //priority// integer seems to be used. The kernel contains several //enums// which define some common discrete //priority// values. Things seem a little messy here, because those enums are (a little) different for each protocol (= for each //Address Family// how //Nftables// would call it). Figure {{ref>nfipv4hookpriorities}} shows as an example the enum for the IPv4 protocol.+The sequence of hook functions in this array is important, because network packets which traverse the hook, will traverse the hook functions in the sequence in which those are present within the array. When registering a hook function, the caller needs to specify a //priority// value (shown in red color in Figure {{ref>nfhookregister}}), which is then used by //Netfilter// to determine WHERE to insert the new hook function into the array. The //priority// is a signed integer value (''int'') and the whole value range of that data type can be used. As you see in Figure {{ref>nfhookregister}}, //Netfilter// sorts the hook functions in ascending order from lower to higher //priority// values. 
Thus, a hook function with lower value like ''-200'' comes BEFORE a hook function with a higher value like ''100''. However in practice not the full range of values of the //priority// integer seems to be used. The kernel contains several //enums// which define some common discrete //priority// values. Things seem a little messy here, because those enums are (a little) different for each protocol (= for each //Address Family// how //Nftables// would call it). Figure {{ref>nfipv4hookpriorities}} shows as an example the enum for the IPv4 protocol.
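To make the ordering rule tangible, here is a minimal user-space sketch of such a per-hook array. It is a toy model and not the kernel implementation. The class ''HookEntries'' and its methods are names I made up for this illustration, and the way it orders two entries with the same //priority// (registration order) is simply a choice of this sketch.

```python
# Toy model of the array of hook functions kept for one Netfilter hook.
# The class and method names are invented for this sketch.
class HookEntries:
    def __init__(self):
        # list of (priority, name) tuples, kept in ascending priority order
        self.entries = []

    def register(self, name, priority):
        """Add a hook function; the array stays sorted by priority."""
        self.entries.append((priority, name))
        # list.sort() is stable: entries with equal priority keep their
        # registration order (an arbitrary choice made for this sketch)
        self.entries.sort(key=lambda entry: entry[0])

    def unregister(self, name):
        """Remove a hook function; the array shrinks accordingly."""
        self.entries = [entry for entry in self.entries if entry[1] != name]

    def names(self):
        """Names of the hook functions in the order packets traverse them."""
        return [name for _, name in self.entries]


if __name__ == "__main__":
    hook = HookEntries()
    hook.register("hook_function_b", 100)
    hook.register("hook_function_a", -200)
    print(hook.names())
```

Although ''hook_function_b'' is registered first, ''hook_function_a'' ends up in front of it, because ''-200'' is lower than ''100''.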
  
 <figure nfipv4hookpriorities> <figure nfipv4hookpriorities>
Line 124: Line 121:
 </code> </code>
 <caption>IPv4 hook priorities //enum//\\  <caption>IPv4 hook priorities //enum//\\ 
-Source code extract from ''include/uapi/linux/netfilter_ipv4.h'', kernel v5.4.0</caption>+Source code extract from ''[[https://elixir.bootlin.com/linux/v5.4.19/source/include/uapi/linux/netfilter_ipv4.h#L30|include/uapi/linux/netfilter_ipv4.h]]''</caption>
 </figure> </figure>
  
-I go into such detail here, because this enum shows you the discrete //priority// values which are being used by kernel components like //connection tracking// when registering their own callbacks with a //Netfilter// hook. This is relevant for //Iptables// and //Nftables// as you will see below.+I go into such detail here, because this enum shows you the discrete //priority// values which are being used by kernel components like //connection tracking// when registering their own hook functions with a Netfilter hook. This is relevant for //Iptables// and //Nftables// as you will see below.
  
 ==== Hard-coded vs. Flexibility ==== ==== Hard-coded vs. Flexibility ====
-The //Netfilter// hooks themselves are hard-coded into the Linux kernel network stack. You'll find them in the source code if you search for function calls named ''NF_HOOK()''((or similar... a few variations exist)). In case you are wondering, why other kernel components are required to register callbacks with these hooks at +The Netfilter hooks themselves are hard-coded into the Linux kernel network stack. You'll find them in the source code if you search for function calls named ''NF_HOOK()''((or similar... a few variations exist)). In case you are wondering, why other kernel components are required to register hook functions with these Netfilter hooks at runtime and why those hook functions are not also hard coded... well I did not write this code, so your guess is as good as mine. There are many potential reasons which might have led to these design decisions, but common sense (and comments on some websites) made at least these two reasons obvious to me:
-runtime and why those callbacks are not also hard coded... well I did not write this code, so your guess is as good as mine. There are many potential reasons which might have led to these design decisions, but common sense (and comments on some websites) made at least these two reasons obvious to me:+
  
   - For once this kind of flexibility during runtime is an essential basic requirement in a kernel where many components (also //Nftables//, //Iptables// and //connection tracking//) can potentially be loaded or unloaded during runtime as //kernel modules// and which employs powerful concepts of further abstraction like //network namespaces//.   - For once this kind of flexibility during runtime is an essential basic requirement in a kernel where many components (also //Nftables//, //Iptables// and //connection tracking//) can potentially be loaded or unloaded during runtime as //kernel modules// and which employs powerful concepts of further abstraction like //network namespaces//.
-  - Performance is a crucial issue. Every network packet needs to traverse all callbacks registered with a hook. Thus those callbacks should be registered in an economical way. This is probably one of the driving reasons why //base chains// in //Nftables// need to be explicitly created by the user in contrast to the pre-defined chains of //Iptables// (more details below). +  - Performance is a crucial issue. Every network packet needs to traverse all hook functions registered with a Netfilter hook. Thus, those hook functions should be registered in an economical way. This is probably one of the driving reasons why //base chains// in //Nftables// need to be explicitly created by the user in contrast to the pre-defined chains of //Iptables// (more details below).
  
 ==== Hook traversal and verdict ==== ==== Hook traversal and verdict ====
-Now let's take a more detailed look on how the callbacks which are registered with the same hook are being traversed by network packets. Figure {{ref>nfhookentriesflow}} shows this (click to enlarge).+Now let's take a more detailed look on how the hook functions which are registered with the same Netfilter hook are being traversed by network packets.  
 +For each network packet which traverses this hook, the hook functions are being called one by one 
 +in the sequence/order in which they are present within the array of the hook (the sequence defined by 
 +the //priority// value).
  
 <figure nfhookentriesflow> <figure nfhookentriesflow>
 {{ :linux:nf-hook-entries-flow1.png?direct&700 }} {{ :linux:nf-hook-entries-flow1.png?direct&700 }}
-<caption>Packet flow through callbacks registered with a hook</caption>+<caption>Packet flow through hook functions registered with a Netfilter hook (click to enlarge)</caption>
 </figure> </figure>
  
-For each network packet which traverses this hook, the callback functions are being called one by one +Network packets are represented within the Linux kernel as instances of ''struct sk_buff'' (often referred to as "socket buffer" and abbreviated as //"skb"//). A pointer to such an //skb// instance is given as function argument to all these hook functions, so each one can examine the packet. Each hook function is required to give a "verdict" back to //Netfilter// as //return-value//. There are several possible values for the "verdict", but for understanding these concepts only these two are relevant: ''NF_ACCEPT'' and ''NF_DROP''. ''NF_ACCEPT'' tells //Netfilter// that the hook function "accepts" the network packet. This means the packet now traverses the next hook function registered with this hook (if existing). If all hook functions of this hook return ''NF_ACCEPT'', then the packet finally continues its traversal of the kernel network stack. However, if a hook function returns ''NF_DROP'', then the packet is being "dropped" (=deleted) and no further hook functions or parts of the network stack are being traversed.
-in the sequence/order in which they are present within the array of the hook (the sequence defined by +
-the //priority// value). Network packets are represented within the Linux kernel as instances +
-of ''struct sk_buff'' (often abbreviated as //"skb"//). A pointer to such an //skb// instance is given as function argument to all these callback functions, so each one can examine the packet. Each callback is required to give a "verdict" back to //Netfilter// as //return-value//. There are several possible values for the "verdict", but for understanding these concepts only these two are relevant: ''NF_ACCEPT'' and ''NF_DROP''. ''NF_ACCEPT'' tells //Netfilter// that the overall "verdict" of the callback is that it "accepts" the network packet. This means the packet now traverses the next callback registered with this hook (if existing). If all callbacks of this hook return ''NF_ACCEPT'', then the packet finally continues its traversal of the kernel network stack. However if a callback returns ''NF_DROP'' then the packet is being "dropped" (=deleted) and no further callbacks or parts of the network stack are being traversed. +
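The traversal logic described here boils down to a short loop. The following is a simplified user-space model, not the kernel code. The //skb// is represented by a plain dictionary, and the hook functions ''accept_all()'' and ''drop_telnet()'' are hypothetical examples invented for this sketch. Only the two verdicts discussed above are modelled.

```python
# Verdict values as defined in the kernel header include/uapi/linux/netfilter.h
NF_DROP = 0
NF_ACCEPT = 1


def traverse_hook(hook_functions, skb):
    """Call the hook functions in array order and return the overall verdict."""
    for hook_function in hook_functions:
        if hook_function(skb) == NF_DROP:
            # packet is dropped, no further hook functions are called
            return NF_DROP
    # all hook functions accepted, packet continues through the network stack
    return NF_ACCEPT


def accept_all(skb):
    """Hypothetical hook function which accepts every packet."""
    skb["seen_by"].append("accept_all")
    return NF_ACCEPT


def drop_telnet(skb):
    """Hypothetical hook function which drops packets to port 23."""
    skb["seen_by"].append("drop_telnet")
    return NF_DROP if skb["dport"] == 23 else NF_ACCEPT


if __name__ == "__main__":
    skb = {"dport": 23, "seen_by": []}
    verdict = traverse_hook([accept_all, drop_telnet, accept_all], skb)
    print(verdict, skb["seen_by"])
```

In this example the third hook function is never called for a packet to port 23, because the second one already returned ''NF_DROP''.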
  
 ===== Iptables ===== ===== Iptables =====
-To put things into context, let's take a short look at //Iptables// as the predecessor of //Nftables//. //Iptables// organizes its //rules// into //tables// and //chains//, whereas //tables// merely are a means (a container) to group //chains// together, which have something in common (e.g. //chains// which are used for //nat// belong to the ''nat'' //table//). The actual //rules// reside inside the //chains//. +To put things into context, let's take a short look at //Iptables// as the predecessor of //Nftables//. //Iptables// organizes its //rules// into //tables// and //chains//, whereas //tables// for the most part merely are a means (a container) to group //chains// together, which have something in common. E.g. //chains// which are used for //nat// belong to the ''nat''((Well, ''nat'' is already a special case and there is more magic behind it. E.g. only the very first packet of each connection will traverse the //chains// of the ''nat'' table, but that topic is beyond this article.)) //table//. The actual //rules// reside inside the //chains//. //Iptables// registers its //chains// with the Netfilter hooks by registering its own hook functions as described above. This means when a network packet traverses a hook (e.g. //Prerouting//), then this packet traverses the //chains// which are registered with this hook and thereby traverses their //rules//.
-//Iptables// registers its //chains// with the //Netfilter// hooks by registering its own callback functions as described above. This means when a network packet traverses a hook (e.g. //Prerouting//), then this packet traverses the //chains// which are registered with this hook and thereby traverses their //rules//.+
  
-In case of //Iptables// all that is already pre-defined. A fixed set of //tables// exists, each //table// containing a fixed set of //chains//((Ok, as a user you can also create additional //chains// if you want, but those are not registered with //Netfilter// hooks and anyway that is a different topic.)). The //chains// are named like the hooks with which they are registered. +In case of //Iptables// all that is already pre-defined. A fixed set of //tables// exists, each //table// containing a fixed set of //chains//((Ok, as a user you can also create additional //chains// if you want, but those are not registered with Netfilter hooks and anyway that is a different topic.)). The //chains// are named like the Netfilter hooks with which they are registered. 
  
 ^ table ^ contains chains ^ command to show that ^ ^ table ^ contains chains ^ command to show that ^
Line 163: Line 156:
 | ''raw''    | ''PREROUTING'',  ''OUTPUT'' | ''iptables -t raw -L'' | | ''raw''    | ''PREROUTING'',  ''OUTPUT'' | ''iptables -t raw -L'' |
  
-The sequence in which the //chains// are being traversed when a packet traverses the hook (their //priority//) is also already fixed. The Netfilter packet flow image (Figure {{ref>nfpackflowofficial}}) shows this sequence in detail. In the image, each //chain// registered with a hook is represented by a box like the following in Figure {{ref>nfhookentrylegend}}, containing the name of the //chain// and the //table// it belongs to.+The sequence in which the //chains// are being traversed when a packet traverses the hook (their //priority//) is also already fixed. The Netfilter packet flow image (Figure {{ref>nfpackflowofficial}}) shows this sequence in detail. In the image, each //chain// registered with a hook is represented by a block like the following in Figure {{ref>nfhookentrylegend}}, containing the name of the //chain// and the //table// it belongs to.
  
 <figure nfhookentrylegend> <figure nfhookentrylegend>
Line 170: Line 163:
 </figure> </figure>
  
-I additionally show the //priority// here (in red color) because I like to further elaborate on it, however the //priority// value is not shown in the original Netfilter packet flow image. +I additionally show the //priority// here (in red color) because I like to further elaborate on it. However, the //priority// value is not shown in the original Netfilter packet flow image. 
-The ''iptables'' cmdline tool itself is only responsible for configuring //tables//, //chains// and //rules// for handling IPv4 packets, thus its corresponding kernel component only registers its //chains// with the five //Netfilter// hooks of the IPv4 protocol. To cover all the protocol families, the complete //Iptables// suite is split up into several distinct cmdline tools and corresponding kernel components:+The ''iptables'' cmdline tool itself is only responsible for configuring //tables//, //chains// and //rules// for handling IPv4 packets. Thus, its corresponding kernel component only registers its //chains// with the five //Netfilter// hooks of the IPv4 protocol. To cover all the protocol families, the complete //Iptables// suite is split up into several distinct cmdline tools and corresponding kernel components:
  
   * ''iptables'' for IPv4 / ''NFPROTO_IPV4''   * ''iptables'' for IPv4 / ''NFPROTO_IPV4''
Line 182: Line 175:
 <figure nfthooksiptables> <figure nfthooksiptables>
 {{ :linux:nf-hooks-iptables1.png?direct&700 |}} {{ :linux:nf-hooks-iptables1.png?direct&700 |}}
-<caption>//Iptables// chains registered in the IPv4 Netfilter hooks (+conntrack) (click to enlarge)</caption>+<caption>//Iptables// chains registered with IPv4 Netfilter hooks (+conntrack) (click to enlarge) (compare to {{ref>nfpackflowofficial}})</caption>
 </figure> </figure>
  
 ===== Connection tracking ===== ===== Connection tracking =====
-As you can see in Figure {{ref>nfthooksiptables}}, the //connection tracking// system also registers itself with the //Netfilter// hooks and based on the //priority// value (''-200'') you can clearly see which //Iptables// //chain// is called BEFORE and which AFTER the //connection tracking// callback.+As you can see in Figure {{ref>nfthooksiptables}}, the //connection tracking// system also registers itself with the Netfilter hooks and based on the //priority// value (''-200'') you can clearly see which //Iptables// //chain// is called BEFORE and which AFTER the //connection tracking// hook function. There is much more to tell about //connection tracking//. If you further look into details, then you'll see that the //connection tracking// system actually even registers more hook functions with the Netfilter hooks, than shown here. However, the two hook functions shown represent a sufficient model to understand the behavior of //connection tracking// when creating //Iptables// or //Nftables// rules. I elaborate on the topic //connection tracking// in detail in a separate series of blog articles, starting with [[connection_tracking_1_modules_and_hooks|Connection tracking - Part 1: Modules and Hooks]].
  
-There is much more to tell about //connection tracking//. If you further look into details, then you'll see that the //connection tracking// system actually even registers more callback functions with the //Netfilter// hooks, than shown here. However, the two callbacks shown here represent a sufficient model to understand the behavior of //connection tracking// when creating //Iptables// or //Nftables// rules.  
-A very good article exists on this topic, written by Pablo Neira Ayuso, the Linux kernel maintainer of the //Netfilter// subsystem: [[http://people.netfilter.org/pablo/docs/login.pdf|Netfilter's connection tracking system]]. 
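The BEFORE/AFTER relation can be derived mechanically from the //priority// values. The following sketch does that for the IPv4 //Prerouting// hook. The numeric values are the ones from the IPv4 priorities enum in ''include/uapi/linux/netfilter_ipv4.h''. The list ''PREROUTING'' and the function ''split_around_conntrack()'' are only my own illustration and do not exist in the kernel.

```python
# Priority values from the IPv4 enum in include/uapi/linux/netfilter_ipv4.h
NF_IP_PRI_RAW = -300
NF_IP_PRI_CONNTRACK = -200
NF_IP_PRI_MANGLE = -150
NF_IP_PRI_NAT_DST = -100

# Hook functions registered with the IPv4 Prerouting hook in this model,
# deliberately listed in no particular order.
PREROUTING = [
    ("nat PREROUTING", NF_IP_PRI_NAT_DST),
    ("conntrack", NF_IP_PRI_CONNTRACK),
    ("mangle PREROUTING", NF_IP_PRI_MANGLE),
    ("raw PREROUTING", NF_IP_PRI_RAW),
]


def split_around_conntrack(registrations):
    """Return (chains called before conntrack, chains called after it)."""
    ordered = sorted(registrations, key=lambda reg: reg[1])
    before = [name for name, prio in ordered if prio < NF_IP_PRI_CONNTRACK]
    after = [name for name, prio in ordered if prio > NF_IP_PRI_CONNTRACK]
    return before, after


if __name__ == "__main__":
    before, after = split_around_conntrack(PREROUTING)
    print("before conntrack:", before)
    print("after conntrack: ", after)
```

So in this model only the ''raw'' chain sees packets before //connection tracking// does, while the ''mangle'' and ''nat'' chains are called afterwards.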
 ===== Nftables ===== ===== Nftables =====
 In general //Nftables// organizes its //rules// into //tables// and //chains// in the same way //Iptables// does. //Tables// are again containers for //chains// and //chains// are carrying the //rules// In general //Nftables// organizes its //rules// into //tables// and //chains// in the same way //Iptables// does. //Tables// are again containers for //chains// and //chains// are carrying the //rules//
 However, in contrast to //Iptables//, no pre-defined //tables// or //chains// exist. All //tables// and //chains// have to be explicitly created by the user. The user can give arbitrary names to the //tables// and //chains// when creating them. However, in contrast to //Iptables//, no pre-defined //tables// or //chains// exist. All //tables// and //chains// have to be explicitly created by the user. The user can give arbitrary names to the //tables// and //chains// when creating them.
-//Nftables// distinguishes between so-called //base chains// and //regular chains//. A //base chain// is a //chain// which is being registered with a //Netfilter// hook (by means of callback functions as described above) and you must specify that hook when you create the //chain//+//Nftables// distinguishes between so-called //base chains// and //regular chains//. A //base chain// is a //chain// which is being registered with a Netfilter hook (by means of hook functions as described above) and you must specify that hook when you create the //chain//
 A //regular chain// is not registered with any hook (//regular chains// are not covered in this article)((The //regular chains// represent the same feature as I already mentioned for //Iptables//. The user can create an arbitrary number of //chains// which are not registered with any hook and use them much as you would use //functions// in a programming language. But that is an entirely different topic.)).  A //regular chain// is not registered with any hook (//regular chains// are not covered in this article)((The //regular chains// represent the same feature as I already mentioned for //Iptables//. The user can create an arbitrary number of //chains// which are not registered with any hook and use them much as you would use //functions// in a programming language. But that is an entirely different topic.)). 
-Thus the user is not forced to name the //base chains// like the hooks they will be registered with. This obviously offers more freedom and flexibility, but thereby also has more potential to create confusion. +Thus, the user is not forced to name the //base chains// like the Netfilter hooks they will be registered with. This obviously offers more freedom and flexibility, but thereby also has more potential to create confusion.
  
 ==== Address Families ==== ==== Address Families ====
Line 211: Line 201:
 As a result, all //base chains// which you create within a //table// will be registered with the specified //Netfilter// hook of that //Address Family// which you selected for the //table//. The ''ip'' //Address Family// (IPv4) is the default one. So, if you do not specify any //Address Family// when creating a //table//, then this //table// will belong to ''ip''. As a result, all //base chains// which you create within a //table// will be registered with the specified //Netfilter// hook of that //Address Family// which you selected for the //table//. The ''ip'' //Address Family// (IPv4) is the default one. So, if you do not specify any //Address Family// when creating a //table//, then this //table// will belong to ''ip''.
  
-<figure nftex1>+The following example creates a new table named ''foo'', belonging to address family ''ip'', then creates a new base chain named ''bar'' in table ''foo'', registering it with //Netfilter// hook ''input'' of the ''ip'' address family (=IPv4 protocol) and specifying priority ''0''. (I explicitly specify the ''ip'' //Address Family// here just to emphasize what is happening; it can be omitted.) 
 <code bash> <code bash>
 nft create table ip foo nft create table ip foo
 nft create chain ip foo bar {type filter hook input priority 0\;} nft create chain ip foo bar {type filter hook input priority 0\;}
 </code> </code>
-<caption>Example, creating a new table named ''foo'', belonging to address family ''ip'' 
-Creating a new base chain named ''bar'' in table ''foo'', registering it with 
-//Netfilter// hook ''input'' of the ''ip'' address family (=IPv4 protocol) and specifying priority ''0''. 
-(I explicitly specify ''ip'' //Address Family// here just to emphasize what is happening; it can be omitted.) 
-</caption> 
-</figure> 
  
-=== The inet family === 
 The ''inet'' //Address Family// is special. When you create a //table// belonging to that family and then create a //base chain// within that //table//, then this //base chain// will get registered with two //Netfilter// hooks: The equivalent //hooks// of IPv4 and IPv6. This means both IPv4 and IPv6 packets will traverse the //rules// of this //chain//. The ''inet'' //Address Family// is special. When you create a //table// belonging to that family and then create a //base chain// within that //table//, then this //base chain// will get registered with two //Netfilter// hooks: The equivalent //hooks// of IPv4 and IPv6. This means both IPv4 and IPv6 packets will traverse the //rules// of this //chain//.
 +The following example creates a table ''foo'' and a base chain ''bar'' in address family ''inet''. Base chain ''bar'' will get registered with Netfilter ''input'' hook of IPv4 and also with Netfilter ''input'' hook of IPv6.
  
-<figure nftex2> 
 <code bash> <code bash>
 nft create table inet foo nft create table inet foo
 nft create chain inet foo bar {type filter hook input priority 0\;} nft create chain inet foo bar {type filter hook input priority 0\;}
 </code> </code>
-<caption>Example, creating table ''foo'' and base chain ''bar'' in address family ''inet''. Base chain ''bar'' will get registered with Netfilter ''input'' hook of IPv4 and also Netfilter ''input'' hook of IPv6.</caption> +
-</figure>+
 ==== Priority ==== ==== Priority ====
 In the examples above you already saw that //Nftables// requires you to specify a //priority// In the examples above you already saw that //Nftables// requires you to specify a //priority//
 value when creating a //base chain//. This is the very same //priority// as I described already value when creating a //base chain//. This is the very same //priority// as I described already
 in detail when covering //Netfilter// above. You can specify integer values, but the newer in detail when covering //Netfilter// above. You can specify integer values, but the newer
-versions of //Nftables// also define placeholder names (Figure {{ref>nftpriotable}}) for several discrete //priority// values analogous to the mentioned //enums// in //Netfilter//.+versions of //Nftables// also define placeholder names for several discrete //priority// values analogous to the mentioned //enums// in //Netfilter//. The following table lists those placeholder names((No guarantee of completeness. At the time of writing, this still seems to be under heavy development. See the man page (''man 8 nft'') for details.)).
  
-<figure nftpriotable> 
 ^ Name         ^ Priority Value ^ ^ Name         ^ Priority Value ^
 | ''raw''      | ''-300'' | ''raw''      | ''-300''
 | ''mangle''   | ''-150'' | ''mangle''   | ''-150''
-| conntrack((As you can guess, this is NOT one of the placeholder names you can use. I added it here as a reminder which //priority// value is reserved for the //connection tracking// callback.))    | ''-200'' |+| conntrack((As you can guess, this is NOT one of the placeholder names you can use. I added it here as a reminder which //priority// value is reserved for the //connection tracking// hook function.))    | ''-200'' |
 | ''dstnat''   | ''-100'' | ''dstnat''   | ''-100''
 | ''filter''   | ''0''    |  | ''filter''   | ''0''    | 
 | ''security'' | ''50''    | ''security'' | ''50''   
 | ''srcnat''   | ''100''  | ''srcnat''   | ''100'' 
-<caption>Nftables placeholder names for priority values, no guarantee for completeness. By the time of writing this still seems to be under heavy development. See man page ''man 8 nft'' for details.</caption> 
-</figure> 
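To make the mapping concrete, here is a minimal shell sketch that translates a placeholder name into its integer value. It uses only the values listed in the table above; the function name ''prio_value'' is made up for this illustration and is not part of ''nft''. Since ''conntrack'' is not a usable placeholder name, the sketch rejects it like any other unknown name.

```bash
#!/usr/bin/env bash
# prio_value NAME
# Hypothetical helper: prints the integer priority for an Nftables
# placeholder name, based solely on the table in this article.
prio_value() {
    case "$1" in
        raw)      printf '%s\n' '-300' ;;
        mangle)   printf '%s\n' '-150' ;;
        dstnat)   printf '%s\n' '-100' ;;
        filter)   printf '%s\n' '0' ;;
        security) printf '%s\n' '50' ;;
        srcnat)   printf '%s\n' '100' ;;
        *)
            # Anything else (including "conntrack") is not a valid name here.
            echo "unknown priority name: $1" >&2
            return 1
            ;;
    esac
}

prio_value filter
prio_value dstnat
```

The two calls at the end print the values for ''filter'' and ''dstnat'' as listed in the table.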
  
-When creating a //base chain//, you can e.g. specify ''priority filter'' which translates into ''priority 0''. Let's have another example:+When creating a //base chain//, you can e.g. specify ''priority filter'' which translates into ''priority 0''. The following example creates a //table// named ''myfilter'' in the ''ip'' //address family// (IPv4). It then creates two //base chains// named ''foo'' and ''bar'', registering them with the //Netfilter// IPv4 hook //input//, but each with a different //priority//. Figure {{ref>nftex3}} shows the result. IPv4 network packets traversing the //Netfilter// hook //input// will first traverse the ''foo'' //chain// and then the ''bar'' //chain//.
  
-<figure nftex3> 
 <code bash> <code bash>
 nft create table ip myfilter nft create table ip myfilter
Line 263: Line 242:
 nft create chain ip myfilter bar {type filter hook input priority security\;} nft create chain ip myfilter bar {type filter hook input priority security\;}
 </code> </code>
 +
 +<figure nftex3>
 {{:linux:netfilter-input-hook-nft-example1.png?nolink&200|}} {{:linux:netfilter-input-hook-nft-example1.png?nolink&200|}}
-<caption>Example, creating a //table// named ''myfilter'' in the ''ip'' //address family// (IPv4) and then creating two //base chains// named ''foo'' and ''bar'', registering them with the //Netfilter// IPv4 hook //input//, but each with different //priority//. As a result, IPv4 network packets traversing the //Netfilter// hook //input// will first traverse the ''foo'' //chain// and then the ''bar'' //chain//+<caption>Base chains ''foo'' and ''bar'' registered with the //Netfilter// IPv4 //input// hook</caption>
-</caption>+
 </figure> </figure>
  
-=== Negative Values === +//Nftables// currently has a limitation (see [[https://bugzilla.netfilter.org/show_bug.cgi?id=1083|bug ticket]]) which makes it difficult (or at least uncomfortable) to enter negative integer values for the //priority// on the ''nft'' command line. Using the placeholder names is probably the most comfortable workaround. Adding ''%%--%%'' after ''nft'' is another way to do it:
-//Nftables// currently has a limitation (see [[https://bugzilla.netfilter.org/show_bug.cgi?id=1083|bug ticket]]) which makes it difficult (or at least uncomfortable) to enter negative integer values for the //priority// on the ''nft'' command line. Using the placeholder names is probably the most comfortable workaround. Figure {{ref>nftnegval}} shows another way to do this.+
  
-<figure nftnegval> 
 <code bash> <code bash>
 nft -- add chain foo bar {type nat hook input priority -100\;} nft -- add chain foo bar {type nat hook input priority -100\;}
 </code> </code>
-<caption>Adding ''%%--%%'' makes it possible to specify negative //priority//.</caption> 
-</figure> 
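A further option, sketched here as an assumption rather than something tested against the versions affected by the bug ticket, is to put the commands into a file and load it with ''nft -f''. The negative value then never appears as a separate command line argument, and the semicolon needs no shell escaping. The ''add table'' line is included only so that the sketch is self-contained, and the variable ''APPLY_RULESET'' is made up for this example so that nothing is loaded by accident.

```bash
#!/usr/bin/env bash
# Temporary file holding the commands in nft's own file syntax.
ruleset_file="$(mktemp)"

cat > "$ruleset_file" <<'EOF'
add table ip foo
add chain ip foo bar { type nat hook input priority -100; }
EOF

# Show what would be loaded.
cat "$ruleset_file"

# Loading needs root and changes the live ruleset, so it is opt-in.
# APPLY_RULESET is a hypothetical switch used only in this sketch.
if [ "${APPLY_RULESET:-0}" = "1" ]; then
    nft -f "$ruleset_file"
fi
```

Run it with ''APPLY_RULESET=1'' as root to actually load the commands; without it, the script only prints the generated file.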
  
-=== What if priority is equal? === +But what actually happens when you register two //base chains// with the same hook which both have the same //priority//? The source code of //Netfilter// answers this question. It actually allows hook functions with the same //priority// value to be registered with the same hook. In the following example, function ''nf_register_net_hook()'' is first called for //chain1// and then for //chain2//.
-What actually happens when you register two //base chains// with the same hook +
-which both have the same //priority//? The source code of //Netfilter// answers this question. It actually allows to register callbacks with the same hook which have the same //priority// value. +
-In case of the example in Figure {{ref>nftequalprio}}, function ''nf_register_net_hook()'' is +
-first called for //chain1// and then for //chain2//.+
  
-<figure nftequalprio> 
 <code bash> <code bash>
 nft create chain ip table1 chain1 {type filter hook input priority 0\;} nft create chain ip table1 chain1 {type filter hook input priority 0\;}
 nft create chain ip table1 chain2 {type filter hook input priority 0\;} nft create chain ip table1 chain2 {type filter hook input priority 0\;}
 </code> </code>
-<caption>Example, creating two base chains with the same //priority//.</caption> 
-</figure> 
  
 I checked the kernel source code((see function ''nf_hook_entries_grow()'' in I checked the kernel source code((see function ''nf_hook_entries_grow()'' in
 ''net/netfilter/core.c'' in kernel v5.4.0)) and was able to confirm the behavior with the ''net/netfilter/core.c'' in kernel v5.4.0)) and was able to confirm the behavior with the
 //Nftables// ''nftrace'' feature: The kernel code places //chain2// BEFORE //Nftables// ''nftrace'' feature: The kernel code places //chain2// BEFORE
-(in front of) //chain1// in the array of callbacks for this hook. As a result, +(in front of) //chain1// in the array of hook functions for this hook. As a result, 
 network packets then traverse //chain2// BEFORE //chain1//. This means here network packets then traverse //chain2// BEFORE //chain1//. This means here
-the sequence/order in which you register both chains becomes relevant! +the sequence/order in which you issue the commands to register both chains becomes relevant!
 However, I guess it is best practice to consider the sequence in which two However, I guess it is best practice to consider the sequence in which two
 chains with equal //priority// on the same hook are traversed to be chains with equal //priority// on the same hook are traversed to be
-"undefined" and thus to either avoid this case or to design the //rules// added+"undefined", and thus to either avoid this case or to design the //rules// added
 to those //chains// in a way in which they do not depend on the sequence of to those //chains// in a way in which they do not depend on the sequence of
 //chain// traversal. After all, the behavior I describe here is an internal //chain// traversal. After all, the behavior I describe here is an internal
 kernel behavior which is undocumented and implementation could change with any kernel behavior which is undocumented and implementation could change with any
-newer kernel version. Thus you should not rely on it!+newer kernel version. Thus, you should not rely on it!
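
The observed ordering can be reproduced with a small toy model in shell. This is only a sketch of the behavior described above, not kernel code: it assumes that a newly registered entry is placed in front of the first existing entry whose priority is greater than or equal to its own. The names ''register_chain'' and ''traversal_order'' are invented for this illustration.

```bash
#!/usr/bin/env bash
# Toy model of a single Netfilter hook: entries are "priority:name" strings,
# kept in the order in which packets would traverse them.
hook_entries=()

# register_chain NAME PRIORITY
# Assumption (taken from the behavior described in the text): the new entry
# goes in front of the first existing entry with priority >= its own.
register_chain() {
    local name="$1" prio="$2" inserted=0 entry
    local new_entries=()
    for entry in "${hook_entries[@]}"; do
        if [ "$inserted" -eq 0 ] && [ "$prio" -le "${entry%%:*}" ]; then
            new_entries+=("$prio:$name")
            inserted=1
        fi
        new_entries+=("$entry")
    done
    # No existing entry had an equal or higher priority: append at the end.
    if [ "$inserted" -eq 0 ]; then
        new_entries+=("$prio:$name")
    fi
    hook_entries=("${new_entries[@]}")
}

# Print the chain names in traversal order, one per line.
traversal_order() {
    local entry
    for entry in "${hook_entries[@]}"; do
        printf '%s\n' "${entry#*:}"
    done
}

register_chain chain1 0
register_chain chain2 0
traversal_order
```

In this model the later registered //chain2// ends up in front of //chain1//, which matches what I observed. As said, treat this as a description of one kernel version's behavior, not as a guarantee.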
  
  
Line 313: Line 281:
  
 ==== Example: NAT edge router ==== ==== Example: NAT edge router ====
-This example demonstrates an edge router, doing some simple IPv4 packet filtering and //SNAT// (masquerading).  +The example in Figure {{ref>nftedgerouter}} demonstrates an edge router, doing some simple IPv4 packet filtering and //SNAT// (masquerading). I merely gave a minimalist example here. One could even remove the //output// //chain// again, because I did not add any rules to it. In practice, you will certainly add a more complex set of rules.
-I merely gave a minimalist example here. One could even remove the //output// //chain// again, because I did not add any rules to it. In reality you for sure will add a more complex set of rules.+
  
 <figure nftedgerouter> <figure nftedgerouter>
Line 336: Line 303:
  
  
 +==== List hook functions (coming soon) ====
 +Nftables developers in July 2021 announced a new feature, which will
 +likely be included in the next version of Nftables to be released;
 +see [[http://git.netfilter.org/nftables/commit/?id=4694f7230195bfcff179ed418ddcdd5ff7d5a8e1|this recent git commit]]. This feature lets Nftables list all the hook functions which are currently
 +registered with a specified Netfilter hook together with their assigned
 +priorities. If you, for example, want to list all hook functions currently registered with the Netfilter
 +IPv4 Prerouting hook, the syntax to do that will probably be something like
 +''nft list hook ip prerouting''.
 ===== Context ===== ===== Context =====
 The described behavior and implementation has been observed on a The described behavior and implementation has been observed on a
Line 348: Line 323:
 [[:feedback|Feedback]] to this article is very welcome! [[:feedback|Feedback]] to this article is very welcome!
  
-{{tag>linux netfilter nftables iptables}}+ 
 +//published 2020-05-17//, //last modified 2022-08-07// 
  
blog/linux/nftables_packet_flow_netfilter_hooks_detail.1604360947.txt.gz · Last modified: 2020-11-03 by Andrej Stender