Introduction

This is a straightforward guide to a full setup of KinD on Windows Subsystem for Linux (WSL) version 2, with rootless Podman as the backend.

The setup guide is followed by a brief deep dive.

This guide comes directly from the notes I took while collecting the information available on the web to run KinD rootless on my WSL workspace, and from the exploration I went through out of curiosity.

Table of contents

  1. Setup
    1. WSL
    2. Systemd
    3. Podman
    4. KinD
  2. Deep dive

Setup

WSL

Enable systemd on the WSL machine:

cat <<EOF | sudo tee -a /etc/wsl.conf
[boot]
systemd=true
EOF

Disable cgroup v1 for all controllers. Note that kernelCommandLine is a global WSL 2 setting: it lives in %USERPROFILE%\.wslconfig on the Windows host and applies to every WSL 2 distro:

[wsl2]
kernelCommandLine = cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1

From the Windows host, restart WSL:

wsl --shutdown
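
Back inside the WSL instance, you can confirm that systemd is now running as PID 1:

$ ps -p 1 -o comm=
systemd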

Mount cgroup v2 at the classic /sys/fs/cgroup path, where podman/docker clients expect it:

cat <<EOF | sudo tee -a /etc/fstab
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0
EOF
sudo mount -a
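
Verify that /sys/fs/cgroup is now a pure cgroup v2 mount:

$ stat -fc %T /sys/fs/cgroup
cgroup2fs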

Systemd

Configure systemd resource delegation

This is required to run Podman in rootless mode while still managing resources: systemd must delegate controller access down the user's cgroup subtree.

For the root user slice:

cat <<EOF | sudo tee /etc/systemd/system/user-0.slice
[Unit]
Before=systemd-logind.service
[Slice]
Slice=user.slice
[Install]
WantedBy=multi-user.target
EOF

For all other user slices:

sudo mkdir -p /etc/systemd/system/user-.slice.d
cat <<EOF | sudo tee /etc/systemd/system/user-.slice.d/override.conf
[Slice]
Slice=user.slice

CPUAccounting=yes
MemoryAccounting=yes
IOAccounting=yes
TasksAccounting=yes
EOF

For all user manager instances:

sudo mkdir -p /etc/systemd/system/user@.service.d
cat <<EOF | sudo tee /etc/systemd/system/user@.service.d/delegate.conf
[Service]
Delegate=cpu cpuset io memory pids
EOF
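
Then reload systemd (restarting the WSL instance also works) and check that the controllers are actually delegated to the user manager's subtree:

sudo systemctl daemon-reload
cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers

The last command should list cpuset cpu io memory pids.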

Note: Delegate=yes delegates all supported controllers.
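
With that, the drop-in above would simply contain:

[Service]
Delegate=yes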

Podman

Install Podman

The exact commands depend on the Linux distribution. In my case, Ubuntu:

sudo apt update
sudo apt install -y podman

As journald is not available in WSL, switch Podman's log driver and events logger to their file-based backends, and set the cgroup manager to cgroupfs. Note that the settings belong to different sections of containers.conf:

cat <<EOF | sudo tee /etc/containers/containers.conf
[containers]
log_driver = "k8s-file"

[engine]
events_logger = "file"
cgroup_manager = "cgroupfs"
EOF
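
A quick check that Podman picked up the settings (the --format template below assumes a reasonably recent Podman):

$ podman info --format '{{.Host.CgroupManager}} {{.Host.CgroupsVersion}}'
cgroupfs v2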

CNI plugins

Install missing CNI plugins v1.1.1:

curl -SLO http://archive.ubuntu.com/ubuntu/pool/universe/g/golang-github-containernetworking-plugins/containernetworking-plugins_1.1.1+ds1-3_amd64.deb
sudo dpkg -i containernetworking-plugins_1.1.1+ds1-3_amd64.deb
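
On Ubuntu the package installs the plugins under /usr/lib/cni; listing the directory should show bridge, host-local, loopback, portmap, and friends:

ls /usr/lib/cni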

Reference: https://bugs.launchpad.net/ubuntu/+source/libpod/+bug/2024394

Container runtime

Upgrade crun to version >= 1.9.0:

CRUN_VER=1.14.4  # pick any crun release >= 1.9.0
mkdir -p ~/.local/bin
curl -sL "https://github.com/containers/crun/releases/download/${CRUN_VER}/crun-${CRUN_VER}-linux-amd64" -o "${HOME}/.local/bin/crun"
chmod +x ~/.local/bin/crun
mkdir -p ~/.config/containers
cat << EOF > ~/.config/containers/containers.conf
[engine.runtimes]
crun = [
  "${HOME}/.local/bin/crun",
  "/usr/bin/crun"
]
EOF
podman info | grep -i crun

Reference: https://noobient.com/2023/11/15/fixing-ubuntu-containers-failing-to-start-with-systemd/

KinD

Create a cluster

Install kind the way you prefer.

Now create the cluster, running kind inside a user scope so the cluster containers are created as user units instead of system ones (thus in cgroups under systemd's user slice):

KIND_EXPERIMENTAL_PROVIDER=podman systemd-run --scope --user kind create cluster

Reference: https://kind.sigs.k8s.io/docs/user/rootless/#creating-a-kind-cluster-with-rootless-podman

Validate the control plane container:

$ kubectl version
Client Version: v1.29.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2
$ systemd-run --scope --user podman exec -it kind-control-plane bash
# ps -C kube-apiserver
    PID TTY          TIME CMD
    565 ?        00:01:23 kube-apiserver

Deep dive

Taking a look under the hood, we can see that Podman created the KinD control-plane container's cgroup (a4cfc6bc8a38ee23e42839393eedba68a92f7f8b3a23e70328a29e58c0c59d38 in the example below).

The container's cgroup, its init.scope child, and the kubelet.service cgroup are all visible:

$ systemctl status
● mywsl
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Tue 2024-04-09 14:34:18 CEST; 4h 7min ago
   CGroup: /
           ├─user.slice
           │ └─user-1000.slice
           │   ├─user@1000.service
           │   │ ├─app.slice
           │   │ │ ├─podman.service
           │   │ │ │ ├─532 /usr/bin/podman
           │   │ │ │ └─734 /usr/bin/dbus-daemon --syslog --fork --print-pid 4 --print-address 6 --session
           │   │ │ ├─a4cfc6bc8a38ee23e42839393eedba68a92f7f8b3a23e70328a29e58c0c59d38
           │   │ │ │ ├─init.scope
           │   │ │ │ │ └─70678 /sbin/init
           │   │ │ │ ├─system.slice
           │   │ │ │ │ ├─containerd.service
...
           │   │ │ │ └─kubelet.slice
           │   │ │ │   ├─kubelet.service
           │   │ │ │   │ └─72203 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/con
...

By comparison, here is a similar hierarchy, but for a rootful Docker-based KinD container:

$ systemctl status
● brodequin
    State: running
    Units: 401 loaded (incl. loaded aliases)
     Jobs: 0 queued
   Failed: 0 units
    Since: Fri 2024-04-05 18:56:16 CEST; 3 days ago
  systemd: 255.4-2-arch
   CGroup: /
           ├─init.scope
           │ └─1 /sbin/init
           ├─system.slice
           │ ├─containerd.service
...
           │ ├─docker-8aea66260a20e4fe20c03c3424d4f294b56eab451430492123e736c3f749ebca.scope
           │ │ ├─init.scope
           │ │ │ └─43996 /sbin/init
           │ │ ├─kubelet.slice
           │ │ │ └─kubelet.service
           │ │ │   └─44913 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/containerd/containerd.sock --node-ip=172.18.0.2 --node-l>
           │ │ └─system.slice
           │ │   ├─containerd.service
...
           │ │   └─systemd-journald.service
           │ │     └─44127 /lib/systemd/systemd-journald
           │ ├─docker.service
           │ │ ├─ 1368 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
           │ │ └─43961 /usr/bin/docker-proxy -proto tcp -host-ip 127.0.0.1 -host-port 40721 -container-ip 172.18.0.2 -container-port 6443

Looking at the KinD control-plane container's cgroup, we can see that it is a normal domain cgroup and that no processes are assigned to it directly:

$ cat /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/a4cfc6bc8a38ee23e42839393eedba68a92f7f8b3a23e70328a29e58c0c59d38/cgroup.type
domain
$ cat /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/a4cfc6bc8a38ee23e42839393eedba68a92f7f8b3a23e70328a29e58c0c59d38/cgroup.procs
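
That is cgroup v2's "no internal processes" rule at work: once controllers are enabled in child cgroups, only leaf cgroups may contain tasks. The container's init process, for instance, sits in the child init.scope (the PID matches the systemctl status output above):

$ cat /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/a4cfc6bc8a38ee23e42839393eedba68a92f7f8b3a23e70328a29e58c0c59d38/init.scope/cgroup.procs
70678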

However, under it a whole system.slice is created, along with the init.scope cgroup for /sbin/init and a cgroup for containerd:

$ ls -l /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/a4cfc6bc8a38ee23e42839393eedba68a92f7f8b3a23e70328a29e58c0c59d38/system.slice
cgroup.controllers
cgroup.events
...
containerd.service
...
$ cat /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/a4cfc6bc8a38ee23e42839393eedba68a92f7f8b3a23e70328a29e58c0c59d38/system.slice/containerd.service/cgroup.type
domain
$ cat /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/a4cfc6bc8a38ee23e42839393eedba68a92f7f8b3a23e70328a29e58c0c59d38/system.slice/containerd.service/cgroup.procs
70837
71711
71731
71755
71798
73781
73816
74237
74259
74284

These are exactly the containerd-shim processes backing the Kubernetes pods, plus containerd itself.

For example:

$ ps -q 74284 -o args
COMMAND
/usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -id 4796ab87414f85c7accdb5c7c858e37121f8dcfd0cd483fb74233d25a8f06817 -address /run/containerd/containerd.sock

and it can be seen that it is related to one pod:

$ systemd-run --scope --user podman exec -it kind-control-plane crictl ps --pod 4796ab87414f85c7accdb5c7c858e37121f8dcfd0cd483fb74233d25a8f06817
Running scope as unit: run-r3403981399e3484c92993c384f192d63.scope
CONTAINER           IMAGE               CREATED             STATE               NAME                     ATTEMPT             POD ID              POD
0a98af2598be4       0500518ebaa68       36 minutes ago      Running             local-path-provisioner   0                   4796ab87414f8       local-path-provisioner-7577fdbbfb-zjdbq

Running a standard rootful Nginx pod, we can see that the root user running the Nginx master process inside the container is mapped to UID 1000 in the initial user namespace when looking from outside the container:

$ kubectl run --image nginx nginx
pod/nginx created
$ ps -C nginx -o user -o args n
    USER COMMAND
    1000 nginx: master process nginx -g daemon off;
  100100 nginx: worker process
  100100 nginx: worker process
  100100 nginx: worker process
  100100 nginx: worker process
  100100 nginx: worker process
  100100 nginx: worker process
  100100 nginx: worker process
  100100 nginx: worker process
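
For comparison, from inside the container the same master process still sees itself as root (UID 0):

$ kubectl exec nginx -- id -u
0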

Indeed, Podman creates a dedicated user namespace for containers:

$ lsns --type user
        NS TYPE  NPROCS   PID USER  COMMAND
4026531837 user      10   260 massi /lib/systemd/systemd --user
4026532263 user      90   532 massi /usr/bin/podman
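
The mapping can be confirmed by reading the namespace's uid_map (PID 532 is the podman process from the lsns output above; the second range comes from the user's /etc/subuid allocation, so the exact values depend on your configuration):

$ cat /proc/532/uid_map
         0       1000          1
         1     100000      65536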

That’s it.

