New to Kubernetes and trying to get an Elasticsearch container running. I've got Logstash and Kibana running fine within the pod, but Elasticsearch keeps crashing with a vm.max_map_count error.
{"@timestamp":"2023-10-06T01:10:53.624Z", "log.level":"ERROR", "message":"node validation exception\n[1] bootstrap checks failed. You must address the points described in the following [1] lines before starting Elasticsearch.\nbootstrap check failure [1] of [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.bootstrap.Elasticsearch","elasticsearch.node.name":"elk-stack-545f7d5996-bkbtc","elasticsearch.cluster.name":"docker-cluster"}
From other posts, this should be a pretty straightforward fix using an init container. I've set one up that runs fine, but I'm still hitting the error. I've also tried elevating the Elasticsearch container and running the command there instead of in the init container, but when I do that I get a read-only error. I feel like I must be misunderstanding something and missing what should be an obvious solution.
Init container YAML
initContainers:
  - name: max-map-count-setter-elasticsearch
    image: busybox:1.28
    command: ['sysctl', '-w', 'vm.max_map_count=262144']
    securityContext:
      privileged: true
Elasticsearch container YAML
- name: elasticsearch
  image: docker.elastic.co/elasticsearch/elasticsearch:8.10.2
  resources:
    requests:
      memory: 2Gi
    limits:
      memory: 2Gi
  ports:
    - containerPort: 9200
    - containerPort: 9300
  securityContext:
    allowPrivilegeEscalation: true
    capabilities:
      drop: ["ALL"]
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
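For context, both of those sit under the pod template in the Deployment, roughly like this (a trimmed sketch; the Deployment name is a placeholder and selectors/labels are omitted):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elk-stack          # placeholder name
spec:
  template:
    spec:
      initContainers:
        - name: max-map-count-setter-elasticsearch
          image: busybox:1.28
          command: ['sysctl', '-w', 'vm.max_map_count=262144']
          securityContext:
            privileged: true
      containers:
        - name: elasticsearch
          # ...same container spec as above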
Value of vm.max_map_count when using kubectl exec to access the container's terminal
elasticsearch@elk-stack-545f7d5996-jsqsn:~$ sysctl vm.max_map_count
vm.max_map_count = 65530
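Same result when checking from outside the pod shell (pod and container names are just the ones from above):
kubectl exec elk-stack-545f7d5996-jsqsn -c elasticsearch -- sysctl vm.max_map_count
vm.max_map_count = 65530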
Running the sysctl -w command directly in the container also gives a read-only error.
The cluster itself is 3 nodes (1 control plane, 2 workers) running on Talos. It's all virtualized on Proxmox as well, if that makes any difference.
Any help would be appreciated!
Figured it out after a little more searching. The setting had to be applied via a machine config patch on the Talos hosts that the worker nodes run on.
Edit to the Talos machine config YAML
sysctls:
  vm.max_map_count: 262144
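For anyone finding this later: that block lives under the machine section of the Talos config, so in the full machine config it looks roughly like this (the Talos docs show sysctl values quoted as strings):
machine:
  sysctls:
    vm.max_map_count: "262144"   # sysctl values are strings in the Talos machine config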
Applied with:
talosctl -n <IP> apply-config -f <yaml> --talosconfig=<config>
The deployment seems to be working fine after that.