blog:linux:connection_tracking_1_modules_and_hooks
Revision 2023-08-15 (current) by Andrej Stender: improved/fixed some statements, typos. Previous revision 2021-04-05 by Andrej Stender: fixed kernel version of some links.
{{tag>
====== Connection tracking (conntrack) - Part 1: Modules and Hooks ======
~~META:
date created = 2021-04-04
~~
With this article series I'd like to take a closer look at the connection tracking subsystem of the Linux kernel,
which provides the basis for features like stateful packet filtering and NAT.
I refer to it as the "ct system"
It is not my intention to replace or repeat existing documentation. Great articles on the topic already exist, but most of them are a bit dated; see [[#
===== Articles of the series =====
  * [[connection_tracking_1_modules_and_hooks|Connection tracking (conntrack) - Part 1: Modules and Hooks]]
  * [[connection_tracking_2_core_implementation|Connection tracking (conntrack) - Part 2: Core Implementation]]
  * [[connection_tracking_3_state_and_examples|Connection tracking
===== Overview =====
What is the purpose of connection tracking and what does it do? Once activated, connection tracking (the ct system inside the Linux kernel) examines IPv4 and/or IPv6 network packets and their payload, with the intention of determining which packets are associated with each other, e.g. in the scope of a connection-oriented protocol like TCP. The ct system performs this task as a transparent observer and does not take an active part in the communication between endpoints. For the ct system, it does not matter whether the endpoints of a connection are local or remote. They could be located on remote hosts, in which case the ct system would observe them while running on a host which is merely routing or bridging the packets of a particular connection. Alternatively,
The ct system maintains an up-to-date (live) list of all tracked connections. Based on that, it "
The ct system itself never alters/
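To take a look at this live list on your own system, you can use the ''conntrack'' command line tool from the conntrack-tools package. A minimal sketch, assuming conntrack-tools is installed and the commands are run as root in the network namespace you are interested in:

<code bash>
# List all connections currently tracked in this network namespace
conntrack -L
# Only show tracked TCP connections whose original destination port is 22 (SSH)
conntrack -L -p tcp --dport 22
# Print only the number of currently tracked connections
conntrack -C
</code>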
nft add rule ip filter forward iif eth0 ct state established accept
</
<
<code bash>
nft add table ip filter
**Quick refresher on network namespaces**
By means of [[wp>
</
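Each network namespace only sees its own tracked connections, so a freshly created namespace starts out with none. A quick way to check this, assuming iproute2 and conntrack-tools are installed and you are root (the namespace name ''ctdemo'' is made up for this example):

<code bash>
# Create a new, empty network namespace
ip netns add ctdemo
# List tracked connections inside it: no entries are expected,
# connections tracked in other namespaces are not visible here
ip netns exec ctdemo conntrack -L
# Clean up again
ip netns del ctdemo
</code>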
===== Netfilter hooks =====
Like Iptables and Nftables, the ct system is built on top of the Netfilter framework. It implements
If you are not yet familiar with Netfilter hooks, I recommend first taking a look at my other article [[nftables_packet_flow_netfilter_hooks_detail|Nftables - Packet flow and Netfilter hooks in detail]] before proceeding here. From a bird's eye view, the //Netfilter Packet Flow// image shown in Figure {{ref>
<figure nfpackflowofficial>
</
The blocks named //
===== Module nf_conntrack =====
Let's get back to the example above and take a look at the kernel module of the ct system
''
The Nftables rules shown in Figure {{ref>
<figure nfcthooks1>
{{ :
<
The four conntrack
</
</
While function ''
Both functions internally do reference counting. This means that in the current network namespace, one or several kernel components may at some point require connection tracking and thereby call ''
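To observe that merely having the module loaded does not make the ct system inspect traffic, you can compare the loaded modules before and after adding a rule with a ''ct'' expression. A sketch, assuming a Debian-like kernel where the ct system is built as modules (if it is built into the kernel, ''lsmod'' will not list it) and a hypothetical table ''ct_demo''. ''nft list hooks'' is only available with kernel and nftables versions newer than the v5.10 kernel used in this article, so treat that line as optional:

<code bash>
# Which conntrack/defrag modules are loaded right now?
lsmod | grep -E '^nf_(conntrack|defrag)'
# Module dependencies of nf_conntrack (expected to include the defrag modules)
modinfo -F depends nf_conntrack

# Hypothetical demo table: a rule with a ct expression makes the kernel load
# nf_conntrack if necessary and take a reference for this network namespace
nft add table ip ct_demo
nft add chain ip ct_demo input '{ type filter hook input priority 0; }'
nft add rule ip ct_demo input ct state established counter
lsmod | grep -E '^nf_(conntrack|defrag)'

# Newer kernels/nftables only: show the registered hook functions
nft list hooks

# Deleting the table releases this reference again (the module stays loaded)
nft delete table ip ct_demo
</code>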
==== The main ct hook functions ====
The two hook functions
//
are the very same //
their placement... the one in the //
on the network while the one in the //Output// hook handles outgoing packets generated on this host.
These two can be considered the "
of what the ct system does with traversing network packets happens inside
them...
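The main ct hook functions are registered with priority -200. A sketch of what this means for your own chains, using hypothetical table and chain names: a rule in a //Prerouting// chain with priority -300 is traversed before the main ct hook function, so no connection has been assigned to the packet yet and ''ct state new'' or ''ct state established'' cannot match. The same rule in a chain with priority -150 is traversed afterwards. Loopback traffic is excluded because, as far as I can tell, packets looped back on ''lo'' already carry the connection assigned to them in the //Output// hook:

<code bash>
nft add table ip prio_demo
# Priority -300: traversed before the main ct hook function (-200)
nft add chain ip prio_demo before_ct '{ type filter hook prerouting priority -300; }'
nft add rule ip prio_demo before_ct 'iifname != "lo" ct state new,established counter'
# Priority -150: traversed after the main ct hook function
nft add chain ip prio_demo after_ct '{ type filter hook prerouting priority -150; }'
nft add rule ip prio_demo after_ct 'iifname != "lo" ct state new,established counter'
# After some traffic: only the counter in after_ct is expected to increase
nft list table ip prio_demo
</code>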
==== The help+confirm
Another two hook functions
hook and in the //
MAX means the highest possible signed integer value. A hook function
priority will be traversed as the very last one within the Netfilter hook and no
other hook function
here are not shown in Figure {{ref>
some internal thing which is not worth mentioning on the bird's eye view.
Both of them do the same thing to traversing packets. The only difference between
them is their placement in the Netfilter hooks, which makes sure that ALL
network packets, no matter if incoming/
of them as the very last thing after having traversed all other hook functions.
I refer to them as the //conntrack "
series, hinting that they have two independent purposes. One is to execute
"
specific use cases and I won't cover that topic in the scope of this first
article. The second is to "
I'll elaborate on what that means in the sections below.
<WRAP round info>
Only in recent kernel versions at the time of writing (here kernel v5.10.19)
both mentioned features, "
within the same hook functions. Not long ago, both still existed in the form of
separate ct hook functions in the //Input// and the //
Netfilter hooks: The "
300 and the "
See e.g. [[https://
[[https://
commit]] during migration from kernel v5.0 to v5.1.
</
===== Modules nf_defrag_ipv4/
As shown in Figure {{ref>
<figure nfdefraghooks1>
{{ :
<
</
</
Like the ct system itself, those defrag modules do not become globally active on module load. They export (i.e. provide) functions ''
Figure {{ref>
This function is registered
The ct system'
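The defrag hook functions are registered with priority -400 (''NF_IP_PRI_CONNTRACK_DEFRAG'' in the kernel headers), i.e. before the raw table (-300) and before the main ct hook function (-200). A sketch to make this visible, with hypothetical table and chain names: a //Prerouting// chain below -400 still sees individual IPv4 fragments, while a chain between -400 and -300 only sees reassembled packets. This assumes that defragmentation is active in the namespace (e.g. because some rule uses a ''ct'' expression) and that ''ip frag-off'' covers the full 16 bit field including the flag bits in your nft version:

<code bash>
nft add table ip frag_demo
# Priority -450: before the defrag hook function (-400), sees single fragments
nft add chain ip frag_demo before_defrag '{ type filter hook prerouting priority -450; }'
# Priority -350: after defrag, but still before the raw table (-300)
nft add chain ip frag_demo after_defrag '{ type filter hook prerouting priority -350; }'
# 0x3fff = "more fragments" flag plus fragment offset; non-zero means "fragment"
nft add rule ip frag_demo before_defrag 'ip frag-off & 0x3fff != 0 counter'
nft add rule ip frag_demo after_defrag 'ip frag-off & 0x3fff != 0 counter'
# Send fragmented traffic, e.g. from another host: ping -s 4000 <address of this host>
# Expectation: only the counter in before_defrag increases
nft list table ip frag_demo
</code>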
===== Hooks Summary =====
Figure {{ref>
//conntrack// and
//
of Iptables. For completeness
allow for an easy comparison with what you see in the official //Netfilter
Packet Flow image// in Figure {{ref>
<figure nfhooks-complete1>
{{ :
<
</
Of course, when I created that image, I faced the same dilemma as the makers of the official image:
When using Nftables (and I assume most people nowadays do), you can create and name your tables and base chains
freely to your liking. But this would not leave much to show by default in an image
like that. Thus, showing the old but well-known Iptables chains still seemed
like the most pragmatic thing to do.
The important thing which Figure {{ref>
They all first traverse one of the //
or the //Output// hook. This ensures that these function(s)
before the ct system is able to see them. After that, the packets traverse a potential
Iptables chain of the raw table (if it exists / is in use) and then one of the main //
hook functions
which are commonly used for packet filtering, are traversed after that. Then, as the very
last thing, the packets traverse one of the //conntrack
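The order described above follows directly from the hook priorities. As a self-contained sketch that does not query the kernel (the values are the IPv4 ones from ''enum nf_ip_hook_priorities'' in the kernel headers, while the Iptables chain labels and the helper name ''print_hook'' are only for illustration), sorting the hook functions of the //Prerouting// and //Input// hooks by priority reproduces the order in which a packet destined for a local socket traverses them:

<code bash>
#!/usr/bin/env bash
# Sketch: traversal order of hook functions, derived purely from priorities.
set -euo pipefail

# print_hook HOOKNAME "PRIORITY LABEL"...
# Lower priority values are traversed first, so sort numerically on field 1.
print_hook() {
  local hook=$1
  shift
  echo "== $hook =="
  printf '%s\n' "$@" | sort -n -k1,1
}

print_hook Prerouting \
  "-150 mangle chain" \
  "-400 defrag (nf_defrag_ipv4)" \
  "-200 conntrack main hook function" \
  "-300 raw chain" \
  "-100 nat chain (DNAT)"

# 2147483647 is INT_MAX, the priority of the help+confirm hook function
print_hook Input \
  "2147483647 conntrack help+confirm hook function" \
  "0 filter chain" \
  "-150 mangle chain" \
  "50 security chain" \
  "100 nat chain (SNAT)"
</code>

Running it prints the //Prerouting// entries from -400 (defrag) up to -100, followed by the //Input// entries, which end with the conntrack help+confirm hook function.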
===== How it works... =====
I know... so far I have been beating around the bush. Now let's finally talk about how the ct system actually operates and what it does to network packets traversing its hook functions. Please be aware that what I describe in this section covers only the basics, not everything the ct system does. The ct system maintains the connections which it is tracking in a central table. Each tracked connection is represented by an instance of ''
  - It is either part of or related to one of its tracked connections.
  - It is the first seen packet of a connection which is not yet tracked.
  - It is an invalid packet, which is broken or doesn'
  - It is marked as NOTRACK, which tells the ct system to ignore it.
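From the Nftables side, these four cases show up as values of the ''ct state'' expression: //established// or //related//, //new//, //invalid// and //untracked//. A minimal sketch of an input chain that counts each case separately, with hypothetical table and chain names. Whether you drop invalid packets is a decision of your ruleset; the ct system itself only classifies them:

<code bash>
nft add table inet state_demo
nft add chain inet state_demo input '{ type filter hook input priority 0; policy accept; }'
# 1) part of or related to a tracked connection
nft add rule inet state_demo input ct state established,related counter accept
# 2) first seen packet of a connection which is not yet tracked
nft add rule inet state_demo input ct state new counter accept
# 3) invalid packet: dropping it is up to the ruleset, not the ct system
nft add rule inet state_demo input ct state invalid counter drop
# 4) marked as NOTRACK before reaching the main ct hook function
nft add rule inet state_demo input ct state untracked counter accept
# Inspect the counters of each case
nft list chain inet state_demo input
</code>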
and //Input// hooks and then is received by a local socket. As pointed out
in the previous section, what the ct system does here applies to outgoing
or forwarded network packets as well. Thus, there is no need for additional
<figure nfct-lookup>
{{ :
<
Network packet traversing ct main hook function
ct table finds that packet belongs to already tracked connection,
packet is given pointer to that connection.
Figure {{ref>
being part of an already tracked connection. When that packet traverses the main //
Further, the packet's OSI layer 4 protocol is analyzed, and the latest protocol state and details are saved to its tracked connection instance. Then the packet continues on its way through other hook functions
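Such updates of a tracked connection can be made visible with the event mode of ''conntrack''. Roughly speaking, not every packet produces an event, only changes like a new TCP state (e.g. from SYN_RECV to ESTABLISHED) do. A sketch, assuming conntrack-tools is installed and run as root; the ''grep'' filter on port 22 is just an example to reduce noise, and the exact output format depends on your conntrack-tools version:

<code bash>
# Print NEW, UPDATE and DESTROY events as they happen, only lines mentioning dport=22
conntrack -E | grep --line-buffered 'dport=22 '
</code>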
<figure nfct-new>
{{ :
<
Packet traversing ct main hook function
finds no match, packet is considered first one of new connection, new connection
is created and packet is given pointer to it, new connection is later "
and added to ct table in "
</
</
Figure {{ref>
being the first one representing a new connection which is not yet tracked by the ct system.
When that packet traverses the main //
it passes the already mentioned validity checks. However, in this case the lookup in the ct table (1) does not find a matching connection. As a result, the ct system considers the packet to be the first one of a new connection((To be precise: the first one the ct system has //seen// of that connection. This is not necessarily the actual first packet of the connection, because there may be cases where the ct system, for whatever reason, did not see the first few packets and starts tracking in the middle of an already existing connection.)). A new instance of ''
Further,
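To watch connections appear in the ct table, the event mode of ''conntrack'' can be restricted to NEW events. As far as I can tell from the kernel sources, this event is only delivered once the new connection has been confirmed in the help+confirm hook function, so a connection whose first packet is dropped earlier never shows up here. A sketch, assuming conntrack-tools is installed and run as root:

<code bash>
# Print one line per connection which is newly added to the ct table
conntrack -E -e NEW
</code>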
Figure {{ref>
The very last thing that packet traverses before being received by a local socket is the //conntrack "help+confirm"// hook function
But even a client that behaves normally while trying to establish e.g. a TCP connection by sending a TCP SYN packet will send out several TCP SYN retransmissions if it does not receive any reply from the peer side. Thus, if you have a ''
The third possibility is that the ct system considers a packet as //
However, it is not the job of the ct system to drop invalid packets((However,
The fourth possibility is a means for other kernel components like Nftables to mark packets with a "do not track" bit((Actually that bit is named ''
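With Nftables, marking packets as NOTRACK means using the ''notrack'' statement in a chain which is traversed before the main ct hook function, i.e. with a priority smaller than -200, for example -300 (the priority of the raw table). A sketch for a hypothetical DNS server on this host whose UDP traffic on port 53 should bypass connection tracking; table and chain names are made up:

<code bash>
nft add table ip raw_demo
# Incoming queries: Prerouting chain at -300, before the main ct hook function (-200)
nft add chain ip raw_demo prerouting '{ type filter hook prerouting priority -300; }'
nft add rule ip raw_demo prerouting udp dport 53 notrack
# Locally generated replies: same idea in the Output hook
nft add chain ip raw_demo output '{ type filter hook output priority -300; }'
nft add rule ip raw_demo output udp sport 53 notrack
</code>

Chains traversed later can then match these packets with ''ct state untracked''.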
initialized as described in the sections above, a new connection is first added to
the //
dropped before reaching the ct system'
connection is removed from the list and deleted. If, however, the packet passes
the //
and is marked as "
will be considered "
of time strongly depends on network protocol, state and traffic behavior of that
connection. Once "
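The effect of dropping a packet before it reaches the help+confirm hook function can be observed with a drop rule in a filter chain of the //Input// hook, which is traversed after the main ct hook function but before the help+confirm one. A sketch, assuming connection tracking is already active in this namespace (e.g. because some rule uses a ''ct'' expression), nothing listens on the arbitrarily chosen TCP port 9999, and hypothetical table/chain names: the drop rule counts the SYN packets, but since the new connection is never confirmed, no entry for it should appear in the ct table. The two sysctls at the end show examples of the configurable timeouts (in seconds) which determine how long confirmed connections stay in the table without further traffic:

<code bash>
nft add table ip confirm_demo
nft add chain ip confirm_demo input '{ type filter hook input priority 0; }'
nft add rule ip confirm_demo input tcp dport 9999 counter drop
# From another host, try to connect to port 9999 on this host, then:
nft list chain ip confirm_demo input   # counter is expected to be > 0
conntrack -L -p tcp --dport 9999       # no entry expected: never confirmed
# Examples of per-protocol/per-state timeouts
sysctl net.netfilter.nf_conntrack_tcp_timeout_established
sysctl net.netfilter.nf_conntrack_udp_timeout
</code>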
===== Context =====
The described behavior and implementation have been observed on a v5.10.19 kernel
in a Debian 10 //buster// system using Debian //
using kernel build configuration from Debian. Thus, the description of the connection tracking implementation in this article, especially regarding kernel modules like ''
  * [[http://
  * [[http://
  * [[https://
  * [[https://
  * [[http://
===== Continue with next article =====
[[connection_tracking_2_core_implementation|Connection tracking (conntrack) - Part 2: Core Implementation]]

//published 2021-04-04//,
blog/linux/connection_tracking_1_modules_and_hooks.1617576193.txt.gz · Last modified: 2021-04-05 by Andrej Stender