Setting vm.max_map_count in init container is not maintained in other containers within pod

New to Kubernetes and trying to get an Elasticsearch container running. I've got Logstash and Kibana running fine within the pod, but Elasticsearch keeps crashing with a vm.max_map_count issue.

{"@timestamp":"2023-10-06T01:10:53.624Z", "log.level":"ERROR", "message":"node validation exception\n[1] bootstrap checks failed. You must address the points described in the following [1] lines before starting Elasticsearch.\nbootstrap check failure [1] of [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]", "ecs.version": "1.2.0","":"ES_ECS","event.dataset":"elasticsearch.server","":"main","log.logger":"org.elasticsearch.bootstrap.Elasticsearch","":"elk-stack-545f7d5996-bkbtc","":"docker-cluster"}

From other posts, this should be a pretty straightforward fix using an init container. I've set one up that runs fine, but I'm still running into the error. I've also tried elevating the elasticsearch container and running the command there instead of in the init container, but when I do that I get a read-only error. I feel like I must be misunderstanding something and missing what should be an obvious solution.

Init container yaml

      - name: max-map-count-setter-elasticsearch
        image: busybox:1.28
        command: ['sysctl', '-w', 'vm.max_map_count=262144']
        securityContext:
          privileged: true

elasticsearch container yaml

      - name: elasticsearch
        resources:
          limits:
            memory: 2Gi
          requests:
            memory: 2Gi
        ports:
        - containerPort: 9200
        - containerPort: 9300
        securityContext:
          allowPrivilegeEscalation: true
          capabilities:
            drop: ["ALL"]
          runAsNonRoot: true
          seccompProfile:
            type: RuntimeDefault

Value of vm.max_map_count when using kubectl exec to access the container's terminal

elasticsearch@elk-stack-545f7d5996-jsqsn:~$ sysctl vm.max_map_count
vm.max_map_count = 65530

Running the command directly on the container also gives a read-only error.

The cluster itself is 3 nodes (1 control-plane, 2 workers) running Talos. It's all virtualized on Proxmox as well, if that makes any difference.

Any help would be appreciated!


  • Figured it out after a little more searching. The config had to be edited via a machineconfig patch on the Talos hosts the worker nodes were running on.

    Edit to Talos machineconfig yaml

      vm.max_map_count: 262144

    Applied with talosctl -n <IP> apply-config -f <yaml> --talosconfig=<config>.

    The deployment seems to be working fine after that.
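
    For anyone unsure where that key goes: a minimal sketch of the relevant fragment, assuming the setting lives under the machine.sysctls map of the Talos machine config (check this against the config reference for your Talos version). The value is quoted here on the assumption that sysctl values are treated as strings; the rest of the machine config is omitted.

    ```yaml
    # Fragment of a Talos machine config (sketch, not a complete file).
    # Assumption: kernel parameters are set under machine.sysctls.
    machine:
      sysctls:
        # Minimum required by the Elasticsearch bootstrap check in the error above.
        vm.max_map_count: "262144"
    ```

    After applying it to each worker node, re-running sysctl vm.max_map_count inside the elasticsearch container is a quick way to confirm the new value took effect.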