I am running Docker containers containing a JVM (java8u31). These containers are deployed as pods in a Kubernetes cluster. I often get OOMs for the pods, and Kubernetes kills and restarts them. I am having trouble finding the root cause of these OOMs.
Here are the JVM parameters:
-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -Xms700M -Xmx1000M -XX:MaxRAM=1536M -XX:MaxMetaspaceSize=250M
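As a quick sanity check, the same flags can be replayed under a matching Docker memory limit outside the cluster (a sketch only: it assumes the openjdk:8u131 image used in the tests further below and just prints the VM sizing, it does not run the application):
$ docker run -m 1536M openjdk:8u131 java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -Xms700M -Xmx1000M -XX:MaxRAM=1536M -XX:MaxMetaspaceSize=250M -XshowSettings:vm -version
# prints the max heap the JVM will actually use under this memory limit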
These containers are deployed as a StatefulSet, and the following is the resource allocation:
resources:
  requests:
    memory: "1.5G"
    cpu: 1
  limits:
    memory: "1.5G"
    cpu: 1
so the total memory allocated to the container roughly matches -XX:MaxRAM.
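One thing to keep in mind with that comparison: Kubernetes' G suffix is decimal (10^9 bytes), while the JVM's M suffix means MiB, so the two numbers are only roughly equal (quick arithmetic below):
$ echo $((1500000000 / 1024 / 1024))   # the 1.5G container limit expressed in MiB
1430
$ echo $((1536 * 1024 * 1024))         # -XX:MaxRAM=1536M expressed in bytes
1610612736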
I have also set the following options to capture a heap dump on OutOfMemoryError:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/etc/opt/jmx/java_pid%p.hprof
That doesn't help, because the pod is killed, recreated, and restarted as soon as there is an OOM, so everything within the pod is lost. The only way to get a thread or heap dump would be to SSH into the pod, which I also cannot do, because the pod is recreated after an OOM, so I don't get the memory footprint at the time of the OOM. SSHing in after the OOM is not much help.
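Even after the restart, the reason for the previous kill is still recorded on the pod object, which at least confirms whether it was the kernel/cgroup OOM killer rather than a java.lang.OutOfMemoryError inside the JVM (the pod name below is a placeholder):
$ kubectl describe pod <pod-name> | grep -A 5 'Last State'
$ kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# 'OOMKilled' here means the container hit its memory limit and was killed by the kernel,
# which is a different event from a java.lang.OutOfMemoryError thrown inside the JVM
# (the heap dump options above only trigger on the latter)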
Any help resolving the OOM kills by Kubernetes is appreciated.
Thanks @VAS for your comments and for the Kubernetes links.
After a few tests, I think it's not a good idea to specify -Xmx if you are using -XX:+UseCGroupMemoryLimitForHeap, since an explicit -Xmx overrides it. I am still doing some more tests and profiling.
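A minimal way to see that interaction is to run the same container limit with and without an explicit -Xmx and compare the reported max heap (same image as in the tests below; only the VM settings are printed):
$ docker run -m 1GB openjdk:8u131 java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XshowSettings:vm -version
# without -Xmx, the max heap is derived from the cgroup limit (1 GB divided by MaxRAMFraction, default 4)
$ docker run -m 1GB openjdk:8u131 java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -Xmx1000M -XshowSettings:vm -version
# with -Xmx1000M, the reported max heap follows -Xmx and the cgroup-derived value is ignored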
Since my requirement is running a JVM inside a Docker container, I did a few tests as mentioned in the posts by @Eugene. Considering that every app running inside a JVM needs heap plus some native memory, I think we need to specify -XX:+UnlockExperimentalVMOptions, -XX:+UseCGroupMemoryLimitForHeap, -XX:MaxRAMFraction=1 (valid only when the JVM is the sole process running inside the container, and even then it is risky), and -XX:MaxRAM (I think we should specify this if MaxRAMFraction is 1, so that some memory is left for native allocations).
A few tests:
As per the Docker configuration below, the container is allocated 1 GB, assuming the JVM is the only process running inside it. Given the 1 GB allocation, and since I also want to leave some memory for the process/native side, I think I should use -XX:MaxRAM=700M so that I have about 300 MB for native memory.
$ docker run -m 1GB openjdk:8u131 java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=1 -XX:MaxRAM=700M -XshowSettings:vm -version
VM settings:
    Max. Heap Size (Estimated): 622.50M
    Ergonomics Machine Class: server
    Using VM: OpenJDK 64-Bit Server VM
Now, specifying -XX:MaxRAMFraction=1 might be risky:
References:
https://twitter.com/csanchez/status/940228501222936576?lang=en
Is -XX:MaxRAMFraction=1 safe for production in a containered environment?
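The risk is easy to reproduce with the same kind of test: with MaxRAMFraction=1 and no MaxRAM cap, the ergonomic max heap is sized to (almost) the entire container limit, so a full heap plus any native memory on top of it pushes the cgroup over its limit:
$ docker run -m 1GB openjdk:8u131 java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=1 -XshowSettings:vm -version
# the estimated max heap comes out close to the full 1 GB, leaving almost nothing for
# Metaspace, thread stacks, code cache, etc.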
The following would be better. Please note I have removed MaxRAM, since MaxRAMFraction > 1:
$ docker run -m 1GB openjdk:8u131 java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 -XshowSettings:vm -version
VM settings:
    Max. Heap Size (Estimated): 455.50M
    Ergonomics Machine Class: server
    Using VM: OpenJDK 64-Bit Server VM
This leaves the remaining ~500 MB for native memory, e.g. for Metaspace, which can be capped by specifying -XX:MaxMetaspaceSize:
$ docker run -m 1GB openjdk:8u131 java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 -XX:MaxMetaspaceSize=200M -XshowSettings:vm -version
VM settings:
    Max. Heap Size (Estimated): 455.50M
    Ergonomics Machine Class: server
    Using VM: OpenJDK 64-Bit Server VM
Logically, and also as per the above references, it makes sense to specify -XX:MaxRAMFraction > 1. The right value also depends on the profiling done for the application.
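For that profiling, one option is Java 8's Native Memory Tracking, which breaks down the non-heap side of the JVM. This is only a sketch: it assumes the application is started with -XX:NativeMemoryTracking=summary (which adds some overhead) and that the JVM is PID 1 inside the container:
$ kubectl exec <pod-name> -- jcmd 1 VM.native_memory summary
# the summary lists heap, Metaspace, thread stacks, code cache, GC, etc., which helps
# decide how much headroom to leave below the container limit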
I am still doing some more tests and will update the results here. Thanks.