This post describes how to deploy cilium (and hubble) using docker on a Linux system with k3d or kind, to test it as a CNI and Service Mesh.

I wrote some scripts to do a local installation and evaluate cilium for use at work (in fact, we are now using cilium on an EKS cluster), and I thought it would be a good idea to share the original scripts in this blog in case they are useful to somebody, at least for playing a little with the technology.

As there is no point in explaining all the concepts related to cilium here, I’m providing some links for readers interested in reading about it:

All the scripts and configuration files discussed in this post are available on my cilium-docker git repository.

Installation

For each platform we are going to deploy two clusters on the same docker network; I’ve chosen this model because it allows the containers to see the addresses managed by metallb from both clusters (the idea is to use those addresses for load balancers and treat them as if they were public).

The installation(s) use cilium as the CNI, metallb for BGP (I tested the cilium options, but I wasn’t able to configure them right) and nginx as the ingress controller (again, I tried to use cilium for that, but it didn’t work either).
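As an illustration, the addresses each cluster hands out through metallb are just a slice of the shared docker network; a minimal sketch of such a pool (the name and the range below are made up, the real values come from the repository templates) would be:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: cluster1-pool        # hypothetical name, the real one is generated by the templates
  namespace: metallb-system
spec:
  addresses:
    # range reserved for this cluster's LoadBalancer services on the shared docker network
    - 172.21.0.100-172.21.0.119

The pool alone only assigns addresses; how they are advertised (BGP in this setup) is configured separately.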

To be able to use the previous components some default options have been disabled on k3d and kind and, in the case of k3d, a lot of k3s options (traefik, servicelb, kube-proxy, network-policy, …) have also been disabled to avoid conflicts.
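As a rough sketch (the real configuration is generated from the repository templates and may differ), disabling those k3s components on k3d looks something like this in the cluster configuration file:

apiVersion: k3d.io/v1alpha5
kind: Simple
metadata:
  name: cilium1
servers: 1
agents: 2
options:
  k3s:
    extraArgs:
      # remove the default CNI, network policy controller and kube-proxy so cilium can take over
      - arg: --flannel-backend=none
        nodeFilters: ["server:*"]
      - arg: --disable-network-policy
        nodeFilters: ["server:*"]
      - arg: --disable-kube-proxy
        nodeFilters: ["server:*"]
      # remove the bundled ingress and service load balancer (we use nginx and metallb instead)
      - arg: --disable=traefik
        nodeFilters: ["server:*"]
      - arg: --disable=servicelb
        nodeFilters: ["server:*"]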

To use the scripts we need to install cilium, docker, helm, hubble, k3d, kind, kubectl and tmpl on our system.

After cloning the repository, the sbin/tools.sh script can be used to do that on a linux-amd64 system:

$ git clone https://forgejo.mixinet.net/blogops/cilium-docker.git
$ cd cilium-docker
$ ./sbin/tools.sh apps

Once we have the tools, to install everything on k3d (for kind, replace k3d with kind) we can use the sbin/cilium-install.sh script as follows:

$ # Deploy first k3d cluster with cilium & cluster-mesh
$ ./sbin/cilium-install.sh k3d 1 full
[...]
$ # Deploy second k3d cluster with cilium & cluster-mesh
$ ./sbin/cilium-install.sh k3d 2 full
[...]
$ # The 2nd cluster-mesh installation connects the clusters

If we run the command cilium status after the installation we should get an output similar to the one shown in the following screenshot:

cilium status

The installation script uses the following templates:

Once we have finished our tests we can remove the installation using the sbin/cilium-remove.sh script.

Some notes about the configuration

  • As noted in the documentation, the cilium deployment needs to mount the bpffs on /sys/fs/bpf and cgroupv2 on /run/cilium/cgroupv2; that is done automatically on kind, but fails on k3d because the image does not include bash (see this issue).

    To fix it we mount a script on all the k3d containers that is executed each time they are started (the script is mounted as /bin/k3d-entrypoint-cilium.sh because the /bin/k3d-entrypoint.sh script executes the scripts that follow the pattern /bin/k3d-entrypoint-*.sh before launching the k3s daemon).

    The source code of the script is available here (a rough sketch of what it does is included after these notes).

  • When testing the multi-cluster deployment with k3d we found issues with open files; they seem to be related to inotify (see this page on the kind documentation). Adding the following to the /etc/sysctl.conf file fixed the issue:

    # fix inotify issues with docker & k3d
    fs.inotify.max_user_watches = 524288
    fs.inotify.max_user_instances = 512
  • Although the deployment theoretically supports it, we are not using cilium as the cluster ingress yet (it did not work, so it is no longer enabled) and we are also ignoring the gateway-api for now.
  • The documentation uses the cilium cli to do all the installations, but I noticed that following that route the current version does not work right with hubble (it messes up the TLS support; there are some notes about the problems on this cilium issue), so we are deploying with helm right now.

    The problem with the helm approach is that there is no official documentation on how to install the cluster mesh with it (there is a request for documentation here), so for that step we are using the cilium cli for now, and it looks like it does not break the hubble configuration (see the sketch after these notes).
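For reference, what the k3d entrypoint helper mentioned above has to do is essentially mount bpffs and a cgroupv2 hierarchy where cilium expects them; a minimal sketch (not the exact script from the repository, paths taken from the cilium documentation) would be:

#!/bin/sh
# Sketch of /bin/k3d-entrypoint-cilium.sh: prepare the mounts cilium needs
# inside each k3d node container; the real script may do more.
set -e
mount bpffs -t bpf /sys/fs/bpf
mount --make-shared /sys/fs/bpf
mkdir -p /run/cilium/cgroupv2
mount -t cgroup2 none /run/cilium/cgroupv2
mount --make-shared /run/cilium/cgroupv2

And the combined approach described in the last note (helm for the base installation, the cilium cli only for the cluster mesh) translates, roughly, into commands like these (the values are simplified; the install script uses its own templates and options):

$ helm repo add cilium https://helm.cilium.io/
$ helm upgrade --install cilium cilium/cilium -n kube-system \
    --set cluster.name=cilium1 --set cluster.id=1 \
    --set hubble.relay.enabled=true --set hubble.ui.enabled=true
$ # The cluster mesh is still enabled & connected using the cli
$ cilium clustermesh enable --context k3d-cilium1
$ cilium clustermesh connect --context k3d-cilium1 --destination-context k3d-cilium2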

Tests

To test cilium we have used some scripts & additional config files that are available in the test subdirectory of the repository:

  • cilium-connectivity.sh: a script that runs the cilium connectivity test for one cluster or in multi-cluster mode (for mesh testing).

    If we export the variable HUBBLE_PF=true the script executes the command cilium hubble port-forward before launching the tests.

  • http-sw.sh: Simple tests for cilium policies from the cilium demo; the script deploys the Star Wars demo application and allows us to add the L3/L4 policy or the L3/L4/L7 policy, test the connectivity and view the policies (an example of the L7 policy is included after this list).
  • ingress-basic.sh: This test checks the ingress controller; it is prepared to work against cilium and nginx but, as explained before, the use of cilium as an ingress controller is not working as expected, so for now the idea is to always call it with nginx as the first argument.
  • mesh-test.sh: Tool to deploy a global service on two clusters, change the service affinity to local or remote, enable or disable the sharing of the service and test how the tools respond (the annotations involved are sketched at the end of the post).
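For reference, the L3/L4/L7 policy applied by http-sw.sh is essentially the one from the upstream cilium Star Wars demo; it looks more or less like this (the manifest shipped with the script may differ slightly):

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: rule1
spec:
  description: "L7 policy to restrict access to specific HTTP calls"
  endpointSelector:
    matchLabels:
      org: empire
      class: deathstar
  ingress:
    - fromEndpoints:
        - matchLabels:
            org: empire
      toPorts:
        - ports:
            - port: "80"
              protocol: TCP
          rules:
            http:
              - method: "POST"
                path: "/v1/request-landing"

With it, pods labelled org=empire can only POST to /v1/request-landing on the deathstar service, which is why the exhaust-port request returns a 403 in the tests below.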

Running the tests

The cilium-connectivity.sh script executes the standard cilium tests:

$ ./test/cilium-connectivity.sh k3d 12
ℹ️  Monitor aggregation detected, will skip some flow validation steps
✨ [k3d-cilium1] Creating namespace cilium-test for connectivity check...
✨ [k3d-cilium2] Creating namespace cilium-test for connectivity check...
[...]

✅ All 33 tests (248 actions) successful, 2 tests skipped, 0 scenarios skipped.

To test how the cilium policies work use the http-sw.sh script:

kubectx k3d-cilium2 # (just in case)
# Create test namespace and services
./test/http-sw.sh create
# Test without policies (exhaust-port fails by design)
./test/http-sw.sh test
# Create and view L3/L4 CiliumNetworkPolicy
./test/http-sw.sh policy-l34
# Test policy (no access from xwing, exhaust-port fails)
./test/http-sw.sh test
# Create and view L7 CiliumNetworkPolicy
./test/http-sw.sh policy-l7
# Test policy (no access from xwing, exhaust-port returns 403)
./test/http-sw.sh test
# Delete http-sw test
./test/http-sw.sh delete

And to see how the service mesh works use the mesh-test.sh script:

# Create services on both clusters and test
./test/mesh-test.sh k3d create
./test/mesh-test.sh k3d test
# Disable service sharing from cluster 1 and test
./test/mesh-test.sh k3d svc-shared-false
./test/mesh-test.sh k3d test
# Restore sharing, set local affinity and test
./test/mesh-test.sh k3d svc-shared-default
./test/mesh-test.sh k3d svc-affinity-local
./test/mesh-test.sh k3d test
# Delete deployment from cluster 1 and test
./test/mesh-test.sh k3d delete-deployment
./test/mesh-test.sh k3d test
# Delete test
./test/mesh-test.sh k3d delete
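
For context, the behaviour toggled by mesh-test.sh relies on cilium’s global service annotations on the Service objects; a minimal sketch (the service name, selector and port are illustrative, the script defines its own service) is:

apiVersion: v1
kind: Service
metadata:
  name: global-echo                      # hypothetical name, not the one used by the script
  annotations:
    service.cilium.io/global: "true"     # share the endpoints across the cluster mesh
    service.cilium.io/shared: "true"     # set to "false" to stop exporting this cluster's endpoints
    service.cilium.io/affinity: "local"  # prefer local endpoints; "remote" and "none" are also valid
spec:
  selector:
    app: echo
  ports:
    - port: 80

Presumably the script just updates these annotations on one of the clusters and re-runs its connectivity checks to show how the traffic distribution changes.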