
100% CPU usage in a Docker container when started via entrypoint


We have written a small C++ application that mainly supervises other processes via ZeroMQ. Most of the time the application is idle and only periodically sends and receives some requests.

We built a Docker image based on Ubuntu that contains just this application, some dependencies and an entrypoint.sh. The entrypoint script runs under /bin/bash, adjusts some configuration files based on environment variables and then starts the application via exec.

Now here's the strange part: when we start the application manually without Docker, CPU usage is nearly 0%. When we start the same application as a Docker container, CPU usage goes up to 100% and fully occupies exactly one CPU core.

To find out what was happening, we set the entrypoint of our image to /bin/yes (just to keep the container running) and then started a bash shell inside the running container. From there we ran entrypoint.sh manually, and the CPU usage was again at 0%.

So we are wondering what could cause this. Is there anything we need to add to our Dockerfile to prevent it?


Here is some output generated with strace. I used strace -p <pid> -f -c and waited five minutes to collect some insights.

1. Running with docker run (100% CPU)

strace: Process 12621 attached with 9 threads
strace: [ Process PID=12621 runs in x32 mode. ]
[...]

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 71.26   17.866443         144    124127           nanosleep
 14.40    3.610578       55547        65        31 futex
 14.07    3.528224        1209      2918           epoll_wait
  0.10    0.024760        4127         6         1 restart_syscall
  0.10    0.024700           0     66479           poll
  0.05    0.011339           4      2902         3 recvfrom
  0.02    0.005517           2      2919           write
  0.01    0.001685           1      2909           read
  0.00    0.000070          70         1         1 connect
  0.00    0.000020          20         1           socket
  0.00    0.000010           1        18           epoll_ctl
  0.00    0.000004           1         6           sendto
  0.00    0.000004           1         4           fcntl
  0.00    0.000000           0         1           close
  0.00    0.000000           0         1           getpeername
  0.00    0.000000           0         1           setsockopt
  0.00    0.000000           0         1           getsockopt
------ ----------- ----------- --------- --------- ----------------
100.00   25.073354                202359        36 total

2. Running with a dummy entrypoint and docker exec (0% CPU)

strace: Process 31394 attached with 9 threads
[...]
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 67.32   12.544007         102    123355           nanosleep
 14.94    2.784310       39216        71        33 futex
 14.01    2.611210         869      3005           epoll_wait
  2.01    0.373797           6     66234           poll
  1.15    0.213487          71      2999           recvfrom
  0.41    0.076113       15223         5         1 restart_syscall
  0.09    0.016295           5      3004           write
  0.08    0.014458           5      3004           read
------ ----------- ----------- --------- --------- ----------------
100.00   18.633677                201677        34 total

Note that in the first case I started strace slightly earlier, so it includes some additional calls, all of which can be traced back to initialization code.

The only difference I could find is the line "Process PID=12621 runs in x32 mode." that appears when using docker run. Could this be the issue?

Also note that in both measurements the total syscall time reported by strace is only about 20 seconds (25.1 s and 18.6 s), even though the process was running for five minutes.


Some further investigation of the 100% CPU case: I checked the process with top -H -p <pid>, and only the parent process was using 100% CPU while the child threads were all mostly idle. But when running strace -p <pid> on the parent process, I could verify that it was not making any system calls (strace produced no output at all).

So I have a process that uses one whole CPU core while doing exactly nothing.


Solution

  • As it turned out, some legacy part of the software was waiting for console input in a while loop:

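    // Legacy console loop (the cause of the 100% CPU usage)
    while (!finished) {
      std::cin >> command;        // returns immediately once stdin has no more input
      processCommand(command);
    }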
    

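    This worked fine when running locally and with docker exec. But since the executable was started as a docker service, there was no console attached. Therefore std::cin never blocked: the first read hit end-of-file (or failed), and once the stream is in a failed state every further std::cin >> command returns immediately without even attempting a read. This way we created an endless loop without any sleeps, which naturally caused 100% CPU usage. It also explains why strace showed no system calls for that thread.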

    Thanks to @Botje for guiding us through the debugging process.
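A minimal sketch of how the legacy loop could be made robust, assuming command is a std::string and finished is a flag set from elsewhere (modeled here as std::atomic<bool> in case another thread sets it). processCommand and runCommandLoop are hypothetical stand-ins, not the application's actual code. The key change is testing the result of the extraction, so the loop ends when the stream reaches end-of-file instead of spinning.

```cpp
#include <atomic>
#include <cassert>   // for assert() in quick checks of the loop
#include <cstddef>
#include <iostream>
#include <sstream>   // lets the loop be driven from a string instead of std::cin
#include <string>

// Hypothetical stand-in for the application's processCommand(); it only echoes here.
void processCommand(const std::string& command) {
    std::cout << "processing: " << command << '\n';
}

// Hypothetical helper: reads whitespace-separated commands from `in` until
// `finished` is set or the stream fails. Testing the result of `in >> command`
// is the important part: when stdin is at end-of-file (no console attached),
// the loop ends instead of spinning forever.
std::size_t runCommandLoop(std::istream& in, const std::atomic<bool>& finished) {
    std::size_t processed = 0;
    std::string command;
    while (!finished && in >> command) {
        processCommand(command);
        ++processed;
    }
    if (!finished && in.eof()) {
        // No more console input will ever arrive; report it once and return.
        std::cerr << "stdin reached end-of-file, console loop stopped\n";
    }
    return processed;
}
```

In the application this would be called as runCommandLoop(std::cin, finished). Another option is to start the console loop only when stdin is actually a terminal, for example by checking the POSIX function isatty(STDIN_FILENO) before entering it.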