Thermalcircle.de

{{tag>linux kernel netfilter nftables iptables}}
====== Nftables - Packet flow and Netfilter hooks in detail ======
~~META:
date created = 2020-05-17
~~
  
If you are using //Iptables// or the newer //Nftables// and you are merely doing some simple
[...]
provide example configurations.
However, if you are working on slightly more complex things, like writing
//Nftables// rules while caring for both IPv4 and IPv6, while using IPsec
and doing NAT, or other "more interesting" stuff... then things tend
to get a little more tricky.
[...]
the available documentation is outdated. Many of the more interesting details
are often only covered by older articles focused on the //Nftables// predecessor //Iptables//.
After digging through a lot of websites, some kernel source code and doing some practical
experimenting involving the //trace// and //log// features of //Nftables//,
[...]
</figure>
  
However, what this image shows you is the packet flow through the //Netfilter hooks// and thereby through the //tables// and //chains// as they existed in old //Iptables//. In //Nftables//, however, you are free to create and name //tables// and //chains// to your liking, so things will probably look a little different. The image still remains very useful, especially because it contains many further details like //bridging//, the //ingress// hook and //IPsec//%%/%%//Xfrm//((Check out my other article [[:blog:linux:nftables_ipsec_packet_flow|Nftables - Netfilter and VPN/IPsec packet flow]], where I cover that topic.)). When interpreting it, however, you need to "read a little bit between the lines".
  
===== Netfilter =====
[...]
  
  
==== Register hook functions ====
As already mentioned, the idea of the hooks is to give other kernel components the opportunity to register //callback// functions with a Netfilter hook, which are then called for each network packet traversing this hook. //Netfilter// provides an API for this, and //Iptables//, //Nftables// and further systems like //Connection Tracking// all make use of it. This API provides the functions ''[[https://elixir.bootlin.com/linux/v5.4.19/source/net/netfilter/core.c#L449|nf_register_net_hook()]]'' and ''[[https://elixir.bootlin.com/linux/v5.4.19/source/net/netfilter/core.c#L425|nf_unregister_net_hook()]]''((and further variations of those functions)) to register/unregister a callback function with a specific hook. Figure {{ref>nfhookregister}} visualizes this.
  
<figure nfhookregister>
{{ :linux:nf-hook-entries-register1.png?direct&600 }}
<caption>Netfilter API to register/unregister callbacks ("hook functions") with a hook</caption>
</figure>
  
Several callback functions can be registered with the same hook. //Netfilter// holds the function pointers of those functions (together with some meta data) in an array, which is dynamically grown or shrunk each time a component registers/unregisters a function. Each Netfilter hook has its own array, implemented as an instance of ''struct nf_hook_entries'' in the kernel.
In most other documentation on the Internet, as well as in discussions among the Netfilter developer community, those registered callback functions are usually referred to as "hook functions"((Sometimes they are simply referred to as "hooks", which creates some ambiguity. Be careful when you read something about a "hook" somewhere on the Internet... the meaning might be a "Netfilter hook", but it might also be a "callback function" registered with one of the Netfilter hooks.)). Thus, I will also refer to them as "hook functions" from now on.
  
==== Priority ====
The sequence of hook functions in this array is important, because network packets traversing the hook will traverse the hook functions in the sequence in which they are present within the array. When registering a hook function, the caller needs to specify a //priority// value (shown in red in Figure {{ref>nfhookregister}}), which //Netfilter// then uses to determine WHERE to insert the new hook function into the array. The //priority// is a signed integer value (''int'') and the whole value range of that data type can be used. As you see in Figure {{ref>nfhookregister}}, //Netfilter// sorts the hook functions in ascending order from lower to higher //priority// values.
Thus, a hook function with a lower value like ''-200'' comes BEFORE a hook function with a higher value like ''100''. However, in practice, not the full value range of the //priority// integer seems to be used. The kernel contains several //enums// which define some common discrete //priority// values. Things seem a little messy here, because those enums are (a little) different for each protocol (= for each //Address Family//, as //Nftables// would call it). Figure {{ref>nfipv4hookpriorities}} shows the enum for the IPv4 protocol as an example.
  
<figure nfipv4hookpriorities>
[...]
</code>
<caption>IPv4 hook priorities //enum//\\ 
Source code extract from ''[[https://elixir.bootlin.com/linux/v5.4.19/source/include/uapi/linux/netfilter_ipv4.h#L30|include/uapi/linux/netfilter_ipv4.h]]''</caption>
</figure>
  
I go into such detail here because this enum shows you the discrete //priority// values which kernel components like //connection tracking// use when registering their own hook functions with a Netfilter hook. This is relevant for //Iptables// and //Nftables//, as you will see below.
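
The ordering rule can be illustrated with a small user-space model in C. This is a sketch under stated assumptions, not kernel code. ''struct hook_array'', ''hook_register()'' and ''hook_unregister()'' are hypothetical names. They only mimic the idea behind ''struct nf_hook_entries'' together with ''nf_register_net_hook()''/''nf_unregister_net_hook()'': one array per hook, kept sorted in ascending //priority// order, which grows or shrinks by one entry on each registration/unregistration.

<code c>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a hook function; never called in this model. */
typedef unsigned int (*hookfn_t)(void *pkt);

struct hook_entry {
    hookfn_t fn;    /* the registered callback ("hook function") */
    int priority;   /* signed int: lower value = called earlier   */
};

/* One array per Netfilter hook, kept sorted by ascending priority. */
struct hook_array {
    struct hook_entry *entries;
    size_t num;
};

/* Grow the array by one and insert the new entry in front of the first
 * existing entry whose priority is greater than or equal to its own.
 * Returns 0 on success, -1 if memory allocation fails. */
int hook_register(struct hook_array *h, hookfn_t fn, int priority)
{
    struct hook_entry *grown = realloc(h->entries, (h->num + 1) * sizeof(*grown));
    if (grown == NULL)
        return -1;
    h->entries = grown;

    size_t pos = 0;
    while (pos < h->num && h->entries[pos].priority < priority)
        pos++;

    /* shift the tail one slot to the right to make room at pos */
    memmove(&h->entries[pos + 1], &h->entries[pos],
            (h->num - pos) * sizeof(*h->entries));
    h->entries[pos].fn = fn;
    h->entries[pos].priority = priority;
    h->num++;
    return 0;
}

/* Remove the first entry matching fn and priority, then shrink the array.
 * Returns 0 on success, -1 if no such entry is registered. */
int hook_unregister(struct hook_array *h, hookfn_t fn, int priority)
{
    for (size_t i = 0; i < h->num; i++) {
        if (h->entries[i].fn != fn || h->entries[i].priority != priority)
            continue;

        /* shift the tail one slot to the left, overwriting entry i */
        memmove(&h->entries[i], &h->entries[i + 1],
                (h->num - i - 1) * sizeof(*h->entries));
        h->num--;

        if (h->num == 0) {
            free(h->entries);
            h->entries = NULL;
        } else {
            struct hook_entry *shrunk = realloc(h->entries, h->num * sizeof(*shrunk));
            if (shrunk != NULL)   /* on failure the old, larger block stays valid */
                h->entries = shrunk;
        }
        return 0;
    }
    return -1;
}
</code>

Registering the priorities ''100'', ''-200'' and ''0'' in that order yields the array order ''-200'', ''0'', ''100''. As long as the values differ, the registration order does not matter.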
  
==== Hard-coded vs. Flexibility ====
The Netfilter hooks themselves are hard-coded into the Linux kernel network stack. You'll find them in the source code if you search for function calls named ''NF_HOOK()''((or similar... a few variations exist)). You may wonder why other kernel components have to register hook functions with these Netfilter hooks at runtime, and why those hook functions are not also hard-coded... well, I did not write this code, so your guess is as good as mine. There are many potential reasons which might have led to these design decisions, but common sense (and comments on some websites) made at least these two reasons obvious to me:
  
  - For one, this kind of flexibility at runtime is an essential basic requirement in a kernel where many components (also //Nftables//, //Iptables// and //connection tracking//) can potentially be loaded or unloaded at runtime as //kernel modules//, and which employs powerful concepts of further abstraction like //network namespaces//.
  - Performance is a crucial issue. Every network packet needs to traverse all hook functions registered with a Netfilter hook. Thus, those hook functions should be registered economically. This is probably one of the driving reasons why //base chains// in //Nftables// need to be explicitly created by the user, in contrast to the pre-defined chains of //Iptables// (more details below).
  
==== Hook traversal and verdict ====
Now let's take a more detailed look at how network packets traverse the hook functions which are registered with the same Netfilter hook.
For each network packet traversing this hook, the hook functions are called one by one
in the sequence/order in which they are present within the array of the hook (the sequence defined by
the //priority// value).
[...]
<figure nfhookentriesflow>
{{ :linux:nf-hook-entries-flow1.png?direct&700 }}
<caption>Packet flow through hook functions registered with a Netfilter hook (click to enlarge)</caption>
</figure>
  
Network packets are represented within the Linux kernel as instances of ''struct sk_buff'' (often referred to as "socket buffer" and abbreviated as //"skb"//). A pointer to such an //skb// instance is given as a function argument to all these hook functions, so each one can examine the packet. Each hook function is required to return a "verdict" to //Netfilter//. There are several possible values for the "verdict", but for understanding these concepts only these two are relevant: ''NF_ACCEPT'' and ''NF_DROP''. ''NF_ACCEPT'' tells //Netfilter// that the hook function "accepts" the network packet. This means the packet now traverses the next hook function registered with this hook (if one exists). If all hook functions of this hook return ''NF_ACCEPT'', the packet finally continues its traversal of the kernel network stack. However, if a hook function returns ''NF_DROP'', the packet is "dropped" (=deleted) and no further hook functions or parts of the network stack are traversed.
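
The traversal and verdict logic described above can be sketched in C as follows. Again, this is a simplified user-space model with hypothetical names (''struct fake_skb'', ''hook_traverse()'', ''accept_all()'', ''drop_telnet()''). Only the numeric values of ''NF_DROP'' (0) and ''NF_ACCEPT'' (1) are taken from the kernel's ''include/uapi/linux/netfilter.h''. All other verdicts are left out.

<code c>
#include <stddef.h>

/* Verdict values as in the kernel's uapi header; other verdicts omitted. */
#define NF_DROP   0u
#define NF_ACCEPT 1u

/* Hypothetical stand-in for struct sk_buff with a single example field. */
struct fake_skb {
    int dport;
};

typedef unsigned int (*hookfn_t)(struct fake_skb *skb);

/* Call the hook functions in array order (= ascending priority).
 * Returns NF_ACCEPT if every hook function accepted the packet, so it may
 * continue through the network stack. Returns NF_DROP as soon as one hook
 * function drops it; the remaining ones are then never called.
 * *called receives the number of hook functions actually invoked. */
unsigned int hook_traverse(hookfn_t const *hooks, size_t n,
                           struct fake_skb *skb, size_t *called)
{
    *called = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned int verdict = hooks[i](skb);
        (*called)++;
        if (verdict == NF_DROP)
            return NF_DROP;
        /* NF_ACCEPT: continue with the next hook function */
    }
    return NF_ACCEPT;
}

/* Two hypothetical example hook functions. */
unsigned int accept_all(struct fake_skb *skb)
{
    (void)skb;
    return NF_ACCEPT;
}

unsigned int drop_telnet(struct fake_skb *skb)
{
    return skb->dport == 23 ? NF_DROP : NF_ACCEPT;
}
</code>

Take the hook functions ''accept_all'', ''drop_telnet'' and ''accept_all'' in that order. A packet to port 80 passes all three and is accepted. A packet to port 23 is dropped by the second hook function and never reaches the third.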
  
===== Iptables =====
To put things into context, let's take a short look at //Iptables// as the predecessor of //Nftables//. //Iptables// organizes its //rules// into //tables// and //chains//, whereas //tables// for the most part are merely a means (a container) to group together //chains// which have something in common. E.g. //chains// which are used for //nat// belong to the ''nat''((Well, ''nat'' is already a special case and there is more magic behind it. E.g. only the very first packet of each connection will traverse the //chains// of the ''nat'' table, but that topic is beyond this article.)) //table//. The actual //rules// reside inside the //chains//. //Iptables// registers its //chains// with the Netfilter hooks by registering its own hook functions as described above. This means that when a network packet traverses a hook (e.g. //Prerouting//), it traverses the //chains// which are registered with this hook and thereby their //rules//.
  
In the case of //Iptables//, all of that is already pre-defined. A fixed set of //tables// exists, each //table// containing a fixed set of //chains//((Ok, as a user you can also create additional //chains// if you want, but those are not registered with Netfilter hooks, and anyway that is a different topic.)). The //chains// are named after the Netfilter hooks with which they are registered.
  
^ table ^ contains chains ^ command to show that ^
[...]

===== Connection tracking =====
As you can see in Figure {{ref>nfthooksiptables}}, the //connection tracking// system also registers itself with the Netfilter hooks. Based on the //priority// value (''-200''), you can clearly see which //Iptables// //chain// is called BEFORE and which AFTER the //connection tracking// hook function. There is much more to tell about //connection tracking//. If you look further into the details, you'll see that the //connection tracking// system actually registers even more hook functions with the Netfilter hooks than shown here. However, the two hook functions shown represent a sufficient model to understand the behavior of //connection tracking// when creating //Iptables// or //Nftables// rules. I elaborate on the topic of //connection tracking// in detail in a separate series of blog articles, starting with [[connection_tracking_1_modules_and_hooks|Connection tracking - Part 1: Modules and Hooks]].
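
As a worked example of reading "before" and "after" from the //priority// values, the following C sketch sorts the hook functions of a single hook by priority. It assumes that the //Iptables// //Prerouting// chains of the tables ''raw'', ''mangle'' and ''nat'' use the priority values ''-300'', ''-150'' and ''-100''. These match the //Nftables// placeholder values listed in the //Address Families// section. It also assumes that //connection tracking// uses ''-200''. The names ''struct hook_fn_info'', ''traversal_order()'' and ''runs_before_conntrack()'' are hypothetical. ''qsort()'' only computes the resulting order; it is not how //Netfilter// itself inserts hook functions.

<code c>
#include <stddef.h>
#include <stdlib.h>

/* Priority value of the connection tracking hook function, as in the text. */
#define CONNTRACK_PRIORITY (-200)

struct hook_fn_info {
    const char *name;   /* hypothetical label, e.g. "raw" or "conntrack" */
    int priority;
};

/* qsort() comparator: ascending priority. (pa > pb) - (pa < pb) avoids the
 * integer overflow that pa - pb could cause near INT_MIN/INT_MAX. */
static int by_priority(const void *a, const void *b)
{
    int pa = ((const struct hook_fn_info *)a)->priority;
    int pb = ((const struct hook_fn_info *)b)->priority;
    return (pa > pb) - (pa < pb);
}

/* Put the hook functions of one hook into traversal order. qsort() is not
 * stable, so the relative order of EQUAL priorities is unspecified here. */
void traversal_order(struct hook_fn_info *fns, size_t n)
{
    qsort(fns, n, sizeof(*fns), by_priority);
}

/* Nonzero if a hook function with this priority runs before conntrack. */
int runs_before_conntrack(int priority)
{
    return priority < CONNTRACK_PRIORITY;
}
</code>

Sorting ''nat'', conntrack, ''raw'' and ''mangle'' this way gives ''raw'', conntrack, ''mangle'', ''nat''. Only the ''raw'' chain sees packets before the //connection tracking// hook function has processed them.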
  
===== Nftables =====
In general, //Nftables// organizes its //rules// into //tables// and //chains// in the same way //Iptables// does. //Tables// are again containers for //chains//, and //chains// carry the //rules//.
However, in contrast to //Iptables//, no pre-defined //tables// or //chains// exist. All //tables// and //chains// have to be explicitly created by the user. The user can give arbitrary names to the //tables// and //chains// when creating them.
//Nftables// distinguishes between so-called //base chains// and //regular chains//. A //base chain// is a //chain// which is registered with a Netfilter hook (by means of hook functions as described above), and you must specify that hook when you create the //chain//.
A //regular chain// is not registered with any hook (//regular chains// are not covered in this article)((The //regular chains// represent the same feature I already mentioned for //Iptables//. The user can create an arbitrary number of //chains// which are not registered with any hook and use them similarly to //functions// in a programming language. But that is an entirely different topic.)).
Thus, the user is not forced to name the //base chains// after the Netfilter hooks they will be registered with. This obviously offers more freedom and flexibility, but also more potential for confusion.
  
==== Address Families ====
[...]
| ''raw''      | ''-300'' |
| ''mangle''   | ''-150'' |
| conntrack((As you can guess, this is NOT one of the placeholder names you can use. I added it here as a reminder of which //priority// value is reserved for the //connection tracking// hook function.))    | ''-200'' |
| ''dstnat''   | ''-100'' |
| ''filter''   | ''0''    |
[...]
</code>
  
But what actually happens when you register two //base chains# with the same hook which both have the same //priority//? The source code of //Netfilter// answers this question. It actually allows registering hook functions with the same hook and the same //priority// value. In the following example, function ''nf_register_net_hook()'' is first called for //chain1// and then for //chain2//.
  
<code bash>
[...]
''net/netfilter/core.c'' in kernel v5.4.0)) and was able to confirm the behavior with the
//Nftables// ''nftrace'' feature: The kernel code places //chain2// BEFORE
(in front of) //chain1// in the array of hook functions for this hook. As a result,
network packets then traverse //chain2// BEFORE //chain1//. This means that here
the sequence/order in which you issue the commands to register both chains becomes relevant!
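
The following user-space sketch in C reproduces the observed ordering rule. It is loosely modelled on the idea that a fresh array is built on each registration. The old entries are copied over, and the new entry is placed in front of the first existing entry whose //priority// is greater than or equal to its own. ''struct hook_fn'' and ''hooks_grow()'' are hypothetical names. The real code in ''net/netfilter/core.c'' has more to deal with (e.g. packets being processed concurrently), which this sketch leaves out.

<code c>
#include <stdlib.h>

struct hook_fn {
    const char *name;   /* hypothetical label, e.g. "chain1" */
    int priority;
};

/* Return a newly allocated array of n + 1 entries: all entries of 'old'
 * (already sorted by ascending priority) plus 'reg'. Old entries are copied
 * as long as reg.priority is strictly greater than theirs; reg is placed in
 * front of the first entry with priority >= reg.priority. Hence, with EQUAL
 * priorities, the most recently registered entry comes first.
 * 'old' is left untouched; the caller must free the returned array.
 * Returns NULL if malloc() fails. */
struct hook_fn *hooks_grow(const struct hook_fn *old, size_t n, struct hook_fn reg)
{
    struct hook_fn *grown = malloc((n + 1) * sizeof(*grown));
    if (grown == NULL)
        return NULL;

    size_t j = 0;
    int inserted = 0;
    for (size_t i = 0; i < n; ) {
        if (inserted || reg.priority > old[i].priority) {
            grown[j++] = old[i++];   /* keep old entry, advance in old */
        } else {
            grown[j++] = reg;        /* reg.priority <= old[i].priority */
            inserted = 1;
        }
    }
    if (!inserted)
        grown[j] = reg;              /* no old entry had priority >= reg: append */
    return grown;
}
</code>

Registering //chain1// and then //chain2//, both with priority ''0'', yields the order //chain2//, //chain1//. This matches the behavior observed above. An entry registered later with a lower priority such as ''-10'' still goes to the front.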
[...]
  
  
==== List hook functions (coming soon) ====
In July 2021, the Nftables developers announced a new feature, which will
likely be included in the next Nftables release;
see [[http://git.netfilter.org/nftables/commit/?id=4694f7230195bfcff179ed418ddcdd5ff7d5a8e1|this recent git commit]]. This feature lets Nftables list all hook functions which are currently
registered with a specified Netfilter hook, together with their assigned
priorities. If you would, e.g., like to list all hook functions currently registered with the Netfilter
IPv4 Prerouting hook, the syntax will probably be something like
''nft list hook ip prerouting''.

===== Context =====
The described behavior and implementation have been observed on a
[...]
===== Feedback =====
[[:feedback|Feedback]] to this article is very welcome!


//published 2020-05-17//, //last modified 2022-08-07//
  
  
blog/linux/nftables_packet_flow_netfilter_hooks_detail.1617576892.txt.gz · Last modified: 2021-04-05 by Andrej Stender