We have written a small C++ application which mainly supervises other processes via ZeroMQ. So most of the time, the application idles and periodically sends and receives some requests.
We built a docker image based on ubuntu which contains just this application, some dependencies and an entrypoint.sh. The entrypoint basically runs as /bin/bash, manipulates some configuration files based on environment variables and then starts the application via exec.
Now here's the strange part. When we start the application manually without docker, we get a CPU usage of nearly 0%. When we start the same application as a docker container, the CPU usage goes up to 100% and fully occupies exactly one CPU core.
To find out what was happening, we set the entrypoint of our image to /bin/yes (just to make sure the container keeps running) and then started a bash inside the running container. From there we started entrypoint.sh manually and the CPU again was at 0%.
So we are wondering what could cause this situation. Is there anything we need to add to our Dockerfile to prevent this?
Here is some output generated with strace. I used strace -p <pid> -f -c and waited five minutes to collect some insights.
docker run (100% CPU):
strace: Process 12621 attached with 9 threads
strace: [ Process PID=12621 runs in x32 mode. ]
[...]
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 71.26   17.866443         144    124127           nanosleep
 14.40    3.610578       55547        65        31 futex
 14.07    3.528224        1209      2918           epoll_wait
  0.10    0.024760        4127         6         1 restart_syscall
  0.10    0.024700           0     66479           poll
  0.05    0.011339           4      2902         3 recvfrom
  0.02    0.005517           2      2919           write
  0.01    0.001685           1      2909           read
  0.00    0.000070          70         1         1 connect
  0.00    0.000020          20         1           socket
  0.00    0.000010           1        18           epoll_ctl
  0.00    0.000004           1         6           sendto
  0.00    0.000004           1         4           fcntl
  0.00    0.000000           0         1           close
  0.00    0.000000           0         1           getpeername
  0.00    0.000000           0         1           setsockopt
  0.00    0.000000           0         1           getsockopt
------ ----------- ----------- --------- --------- ----------------
100.00   25.073354                202359        36 total
docker exec (0% CPU):
strace: Process 31394 attached with 9 threads
[...]
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 67.32   12.544007         102    123355           nanosleep
 14.94    2.784310       39216        71        33 futex
 14.01    2.611210         869      3005           epoll_wait
  2.01    0.373797           6     66234           poll
  1.15    0.213487          71      2999           recvfrom
  0.41    0.076113       15223         5         1 restart_syscall
  0.09    0.016295           5      3004           write
  0.08    0.014458           5      3004           read
------ ----------- ----------- --------- --------- ----------------
100.00   18.633677                201677        34 total
Note that in the first case I started strace slightly earlier, so there are some additional calls which can all be traced back to initialization code.
The only difference I could find is the line "Process PID=12621 runs in x32 mode." when using docker run. Could this be an issue?
Also note that in both measurements the total runtime is about 20 seconds while the process was running for five minutes.
Some further investigation of the 100% CPU case: I checked the process with top -H -p <pid> and only the parent process was using 100% CPU while the child threads were all mostly idling. But when calling strace -p <pid> on the parent process, I could verify that the process did not make any system calls (no output was generated).
So I do have a process which is using one whole core of my CPU doing exactly nothing.
As it turned out some legacy part of the software was waiting for console input in a while loop:
    while (!finished) {
        std::cin >> command;
        processCommand(command);
    }
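So this worked fine when running locally and with docker exec. But since the executable was started as a docker service, there was no console present. Therefore reading from std::cin did not block: the extraction failed and returned immediately (presumably because stdin was closed or already at end-of-file), and since the loop never checked the stream state, it kept spinning. This way we created an endless loop without any sleeps, which naturally caused 100% CPU usage without a single system call showing up in strace.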
Thanks to @Botje for guiding us through the debugging process.