Sockets in the Linux Kernel - Part 2: UDP Socket Lookup on Rx

In this article series I would like to explore the implementation of sockets in the Linux kernel and the source code surrounding them. While most of my previous articles focused primarily on OSI Layer 3, this series attempts a dive into OSI Layer 4. Due to its sheer complexity, it is very easy for readers of the kernel source code to not see the forest for the trees. My intent is to throw a lifeline here that helps you navigate the code and hold on to what is essential.

2025-03-02 · Andrej Stender

linux, kernel, socket, udp

Sockets in the Linux Kernel - Part 1: L4 Protocol Demultiplexing on Rx

In this article series I would like to explore the implementation of sockets in the Linux kernel and the source code surrounding them. While most of my previous articles focused primarily on OSI Layer 3, this series attempts a dive into OSI Layer 4. Due to its sheer complexity, it is very easy for readers of the kernel source code to not see the forest for the trees. My intent is to throw a lifeline here that helps you navigate the code and hold on to what is essential.

2025-01-05 · Andrej Stender

linux, kernel, ipsec, xfrm, netfilter, socket

Flowtables - Part 2: IPsec gateway in tunnel-mode

In this article series I would like to take a look at flowtables, a network fastpath mechanism in the Linux kernel, based on Netfilter/Nftables, that allows accelerated handling of forwarded TCP and UDP connections. When using an acceleration feature like this, it is important to understand how it works. If you don't, you'll have a hard time once you go beyond plain forwarding and start combining that acceleration with other networking features such as firewalling, NAT, advanced routing, QoS or IPsec. In this second article, I'll show what the packet flow looks like when you use a flowtable on a VPN gateway based on IPsec in tunnel-mode.

2022-08-14 · Andrej Stender

linux, kernel, routing, ipsec, netfilter, nftables, flowtables

Flowtables - Part 1: A Netfilter/Nftables Fastpath

In this article series I would like to take a look at flowtables, a network fastpath mechanism in the Linux kernel, based on Netfilter/Nftables, that allows accelerated handling of forwarded TCP and UDP connections. When using an acceleration feature like this, it is important to understand how it works. If you don't, you'll have a hard time once you go beyond plain forwarding and start combining that acceleration with other networking features such as firewalling, NAT, advanced routing, QoS or IPsec. In this first article, I'll take a deep look at the packet flow. I'll show you how to set up and use a flowtable and explain how the mechanism works internally.

2022-08-08 · Andrej Stender

linux, kernel, routing, netfilter, nftables, flowtables

Routing Decisions in the Linux Kernel - Part 2: Caching

In this article series I would like to talk about the IPv4 routing lookup in the Linux kernel and how the routing decisions it produces determine the path network packets take through the stack. The data structures representing routing decisions are used in many parts of the stack. They also form the basis for route caching, which has a complex history. Thus, it is useful to know a little about their semantics. Further, the Linux kernel implements a lot of optimizations and advanced routing features, which can easily make you “not see the forest for the trees” when reading these parts of the source code. This article series attempts to mitigate that.

2022-07-31 · Andrej Stender

linux, kernel, routing

Routing Decisions in the Linux Kernel - Part 1: Lookup and packet flow

In this article series I would like to talk about the IPv4 routing lookup in the Linux kernel and how the routing decisions it produces determine the path network packets take through the stack. The data structures representing routing decisions are used in many parts of the stack. They also form the basis for route caching, which has a complex history. Thus, it is useful to know a little about their semantics. Further, the Linux kernel implements a lot of optimizations and advanced routing features, which can easily make you “not see the forest for the trees” when reading these parts of the source code. This article series attempts to mitigate that.

2022-07-04 · Andrej Stender

linux, kernel, routing

Nftables - Demystifying IPsec expressions

In this article I would like to take a look at the expressions provided by Nftables for matching IPsec-related network packets. The common situation is that you need to distinguish certain packets from normal traffic: either packets which have been received through a VPN tunnel and have already been decrypted, or packets which are to be sent out on a VPN tunnel but have not been encrypted yet. These kinds of packets can be matched by these expressions within packet filtering rules. I'll explain how these expressions work, what they use as back-end, what their limitations are and how you can use them to get your intended behavior. Further, I take a brief look at the Iptables equivalent of these expressions.

2022-01-30 · Andrej Stender

linux, kernel, netfilter, nftables, iptables, ipsec, strongswan, xfrm

Connection tracking (conntrack) - Part 3: State and Examples

With this article series I would like to take a closer look at the connection tracking subsystem of the Linux kernel, which provides the basis for features like stateful packet filtering and NAT. I refer to it as the “ct system” throughout the series. In this third article, I take a look at how the system analyzes and tracks the state of a connection and how Iptables/Nftables rules can make use of that. I further present some practical examples for common protocols like ICMP, UDP and TCP.

2021-08-07 · Andrej Stender

linux, kernel, netfilter, conntrack, nftables, iptables

Connection tracking (conntrack) - Part 2: Core Implementation

With this article series I would like to take a closer look at the connection tracking subsystem of the Linux kernel, which provides the basis for features like stateful packet filtering and NAT. I refer to it as the “ct system” throughout the series. In this second article, I take a deep look under the hood and dive into its core implementation. I revisit some of the topics of the first article, but this time at source code level. I give an overview of the most important data structures and explain how the handling of the connection tracking table, the connection lookup and the connection life cycle work.

2021-04-11 · Andrej Stender

linux, kernel, netfilter, conntrack, nftables, iptables

Connection tracking (conntrack) - Part 1: Modules and Hooks

With this article series I would like to take a closer look at the connection tracking subsystem of the Linux kernel, which provides the basis for features like stateful packet filtering and NAT. I refer to it as the “ct system” throughout the series. It is not my intention to replace or repeat existing documentation. Great articles on the topic already exist; however, most of them are a little dated (see References below). I intend to provide a view that is up to date as of the time of writing, based on LTS kernel 5.10, and to complement existing documentation by taking a deep look under the hood and showing how things actually work. In this first article, I give an overview of the ct system's purpose and elaborate on how it relates to other kernel components like Netfilter and Nftables. I explain what happens when network packets traverse its Netfilter hook functions and how it serves as the basis for stateful packet filtering.

2021-04-04 · Andrej Stender

linux, kernel, netfilter, conntrack, nftables, iptables

Nftables - Packet flow and Netfilter hooks in detail

If you are using Iptables or the newer Nftables and you are merely doing some simple packet filtering with IPv4, then you'll probably get enough information out of the official documentation and a quick look through websites which provide example configurations. However, if you are working on slightly more complex things, like writing Nftables rules that cover both IPv4 and IPv6 while also using IPsec and doing NAT, or other “more interesting” stuff, then things tend to get a little more tricky. If you want to be sure you know what you are doing, and to create and place your tables, chains and rules correctly so that they do the right thing, then it is beneficial to understand the flow of network packets and the internal workings of Nftables and the underlying Netfilter framework in a little more detail.

2020-05-29 · Andrej Stender

linux, kernel, netfilter, nftables, iptables

Nftables - Netfilter and VPN/IPsec packet flow

In this article I would like to explain what the packet flow through the Netfilter hooks looks like on a host which works as an IPsec-based VPN gateway in tunnel-mode. Obviously, network packets which are to be sent through a VPN tunnel are encrypted and encapsulated on a VPN gateway, and packets received through the tunnel are decapsulated and decrypted. But in which sequence exactly does this happen, and which packet traverses which Netfilter hook, in which sequence and in which form (encrypted, not yet encrypted, already decrypted)? I'll do a short recap of IPsec in general, explain the IPsec implementation on Linux as it is commonly used today (Strongswan + Xfrm framework) and explain packet traversal through the VPN gateways in an example site-to-site VPN setup (IPsec in tunnel-mode, IKEv2, ESP, IPv4). I'll focus on Nftables rather than the older Iptables, and I'll set up the VPN via the modern Vici/swanctl configuration interface of Strongswan instead of the older Stroke interface.

2020-05-29 · Andrej Stender

linux, kernel, netfilter, nftables, ipsec, strongswan, charon, swanctl, xfrm