Why does Linux sysfs modification work in plain Docker but not under Kubernetes?

chrishiestand asked:

The command being run inside the containers is:

echo never | tee /sys/kernel/mm/transparent_hugepage/enabled

Both containers run as privileged, but in the Kubernetes Docker container the command fails with the error:
tee: /sys/kernel/mm/transparent_hugepage/enabled: Read-only file system

Under plain docker run -it --privileged alpine /bin/sh, the command works fine.

I have used docker inspect on both the k8s and non-k8s containers to verify their privileged status, and I don't see anything else listed that should cause this problem. I ran diff on the two outputs and then used docker run with modifications to try to reproduce the problem in plain Docker, but failed (it keeps working). Any idea why the Kubernetes Docker container fails and the plain Docker container succeeds?

This is reproducible with the pod definition here:

apiVersion: v1
kind: Pod
metadata:
  name: sys-fs-edit
spec:
  containers:
  - image: alpine
    command:
    - /bin/sh
    args:
      - -c
      - echo never | tee /sys/kernel/mm/transparent_hugepage/enabled && sysctl -w net.core.somaxconn=8192 vm.overcommit_memory=1 && sleep 9999999d
    imagePullPolicy: Always
    name: sysctl-buddy
    securityContext:
      privileged: true

My answer:

The setting you're trying to change applies to the entire host, not to a single container. It cannot be changed from within an unprivileged container, which is why the write fails within Kubernetes but succeeds in a privileged Docker container.
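
The "Read-only file system" error in the question means the kernel rejected the write because the file sits on a read-only mount. One way to compare the two containers is to look at how /sys is mounted in each. The sketch below assumes the usual /proc/mounts layout (device, mount point, type, options); sys_mount_opts is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Print the mount options of the sysfs mount at /sys.
# The optional first argument is a mounts table to read; it defaults to
# /proc/mounts and is a parameter only so the function can be tried on a sample file.
sys_mount_opts() {
    awk '$2 == "/sys" && $3 == "sysfs" { print $4 }' "${1:-/proc/mounts}"
}
```

Running sys_mount_opts inside each container prints an option list that starts with either ro or rw. A leading ro would be consistent with the error shown above.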

If you need this setting to run particular containers, you should set it on the hosts of all nodes in the cluster, not in container or pod definitions.
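
As a rough sketch of what "set it on the hosts" could look like, the function below writes never to the transparent hugepage control file from the question. It is meant to be run as root directly on each node, outside any container. disable_thp is a hypothetical name, and the path is passed as an argument only so the function can be tried against a scratch file first.

```shell
#!/bin/sh
# Select the "never" mode in a transparent hugepage control file.
# $1 is the path to write, for example
# /sys/kernel/mm/transparent_hugepage/enabled on a node's host OS.
disable_thp() {
    if [ -w "$1" ]; then
        echo never > "$1"
    else
        # Not writable: missing file, insufficient privileges, or a read-only mount.
        echo "cannot write $1" >&2
        return 1
    fi
}
```

On a node this would be called as disable_thp /sys/kernel/mm/transparent_hugepage/enabled, followed by the same sysctl -w net.core.somaxconn=8192 vm.overcommit_memory=1 command used in the pod definition. A value written to sysfs does not survive a reboot, so it needs to be reapplied at boot by whatever mechanism the nodes already use for host configuration.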

View the full question and answer on Server Fault.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.