Tags: java, docker, jvm, cgroups

Java (prior to JDK 8 Update 131) applications running in Docker containers: CPU / memory issues?


JVMs (JDK 8 before Update 131) running in Docker containers ignored the cgroup limits set by the container environment: they queried the host for resources instead of what was allocated to the container. The result was catastrophic for the JVM, i.e. as the JVM tried to allocate more resources (CPU or memory) than permitted by the cgroup limits, the Docker daemon would notice this and kill the JVM process, or the container itself if the Java program was running as PID 1.

Solution for the memory issue (possibly fixed in JDK 8 Update 131): as described above, the JVM was allocating itself more memory than what is allowed for the container. This can be easily fixed by

  1. explicitly setting the maximum heap size (using -Xmx) when starting the JVM (prior to Update 131), or
  2. by passing these flags (Update 131 and later):
    -XX:+UnlockExperimentalVMOptions and
    -XX:+UseCGroupMemoryLimitForHeap
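
For example (a minimal sketch; app.jar and the 512m value are placeholders, pick a heap limit that leaves room for non-heap memory inside the container):

    // prior to 8u131: cap the heap explicitly so it stays below the container limit
    java -Xmx512m -jar app.jar

    // 8u131 and later: let the JVM derive the heap limit from the cgroup memory limit
    java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -jar app.jar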

Resolving the CPU issue (possibly fixed in JDK 8 Update 212): again, as described above, a JVM running in Docker would look at the host hardware directly and obtain the total number of CPUs available. It would then optimize itself based on this CPU count.

  1. After JDK 8 Update 212, any JVM running in a Docker container respects the CPU limits allocated to the container and does not look at the host CPUs directly. If a container with a CPU limitation is started as below, the JVM will respect this limitation and restrict itself to 1 CPU.
    docker run -ti --cpus 1 -m 1G openjdk:8u212-jdk // JVMs running in this container are restricted to 1 CPU.
  2. HERE IS MY QUESTION: The CPU issue is probably fixed in JDK 8 Update 212, but what if I cannot update my JVM and I am running a version prior to Update 131? How can I fix the CPU issue?

Solution

  • Linux container support first appeared in JDK 10 and was then backported to 8u191, see JDK-8146115.

    Earlier versions of the JVM obtained the number of available CPUs as follows.

    • Prior to 8u121, the HotSpot JVM relied on the sysconf(_SC_NPROCESSORS_ONLN) libc call. In turn, glibc read the system file /sys/devices/system/cpu/online. Therefore, in order to fake the number of available CPUs, one could replace this file using a bind mount:

      echo 0-3 > /tmp/online
      docker run --cpus 4 -v /tmp/online:/sys/devices/system/cpu/online ...
      

      To expose only one CPU, write echo 0 instead of echo 0-3.

    • Since 8u121, the JVM has been taskset aware. Instead of sysconf, it started calling sched_getaffinity to find the CPU affinity mask of the process.

      This broke the bind mount trick. Unfortunately, you can't fake sched_getaffinity the same way as sysconf. However, it is possible to replace the libc implementation of sched_getaffinity using LD_PRELOAD.

    I wrote a small shared library, proccount, that replaces both sysconf and sched_getaffinity. This library can be used to set the right number of available CPUs in all JDK versions before 8u191.

    How it works

    1. First, it reads cpu.cfs_quota_us and cpu.cfs_period_us to find out whether the container was launched with the --cpus option. If both are above zero, the number of CPUs is estimated as

      cpu.cfs_quota_us / cpu.cfs_period_us
      
    2. Otherwise it reads cpu.shares and estimates the number of available CPUs as

      cpu.shares / 1024
      

      This CPU calculation is similar to how it actually works in a modern container-aware JDK.

    3. The library defines (overrides) the sysconf and sched_getaffinity functions to return the number of processors obtained in (1) or (2); a sketch of this approach is shown below.
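
    For illustration only, here is a minimal sketch of how such a shim could look. This is not the original proccount source; it assumes the cgroup v1 CPU controller is mounted at /sys/fs/cgroup/cpu, and the file paths and the single-CPU fallback are my own choices. Saved as proccount.c, it can be built with the command in the next section.

    /*
     * Minimal sketch of a proccount-style LD_PRELOAD shim (not the original).
     * Overrides sysconf processor-count queries and sched_getaffinity so that
     * they report the cgroup CPU limit instead of the host CPU count.
     */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Read a single integer from a cgroup file; return -1 if it can't be read. */
    static long read_long(const char *path) {
        long value = -1;
        FILE *f = fopen(path, "r");
        if (f != NULL) {
            if (fscanf(f, "%ld", &value) != 1) value = -1;
            fclose(f);
        }
        return value;
    }

    /* Estimate available CPUs: quota/period when --cpus is set, else shares/1024. */
    static int cgroup_cpu_count(void) {
        long quota  = read_long("/sys/fs/cgroup/cpu/cpu.cfs_quota_us");
        long period = read_long("/sys/fs/cgroup/cpu/cpu.cfs_period_us");
        if (quota > 0 && period > 0) {
            long cpus = quota / period;
            return cpus > 0 ? (int)cpus : 1;
        }
        long shares = read_long("/sys/fs/cgroup/cpu/cpu.shares");
        if (shares > 0) {
            long cpus = shares / 1024;
            return cpus > 0 ? (int)cpus : 1;
        }
        return 1;  /* nothing readable: report a single CPU */
    }

    /* Override sysconf: intercept only the processor-count queries and
       delegate everything else to the real libc implementation. */
    long sysconf(int name) {
        if (name == _SC_NPROCESSORS_ONLN || name == _SC_NPROCESSORS_CONF) {
            return cgroup_cpu_count();
        }
        long (*real_sysconf)(int) = (long (*)(int)) dlsym(RTLD_NEXT, "sysconf");
        return real_sysconf(name);
    }

    /* Override sched_getaffinity: return a mask with exactly cgroup_cpu_count()
       bits set, so counting the bits of the mask matches the cgroup limit. */
    int sched_getaffinity(pid_t pid, size_t cpusetsize, cpu_set_t *mask) {
        (void) pid;
        int cpus = cgroup_cpu_count();
        memset(mask, 0, cpusetsize);
        for (int i = 0; i < cpus && i < (int)(cpusetsize * 8); i++) {
            CPU_SET(i, mask);
        }
        return 0;
    }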

    How to compile

    gcc -O2 -fPIC -shared -olibproccount.so proccount.c -ldl
    

    How to use

    LD_PRELOAD=/path/to/libproccount.so java <args>
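
    For example, to use it inside a container, the library can be bind-mounted into the container and LD_PRELOAD passed through the environment (the paths and the image name below are placeholders for your own setup):

    docker run --cpus 1 -v /path/to/libproccount.so:/opt/libproccount.so \
        -e LD_PRELOAD=/opt/libproccount.so my-java8-image java <args>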