Docker daemon/container real-time scheduling with Ubuntu (Linux) host


Before I begin: I was in two minds as to whether this question belongs on Super User or Stack Overflow, so apologies in advance if it's in the wrong place.

I have a docker container (contains C/C++ executable code) which performs audio/video processing. As a result, I would like to test the benefits of running the container with RT scheduling constraints. Searching the web, I've come across various bits of information, but I'm struggling to put all the pieces together.

System Environment:

  • Host: Ubuntu (stock) Zesty 17.04 (No RT Kernel patches, Kernel: 4.10.0-35-generic)
  • Docker Version: 17.05.0-ce
  • Docker Images OS: Ubuntu Zesty 17.04.

In an executable nested in the docker image/container, the following code is executed to change the scheduler from 'SCHED_OTHER' to 'SCHED_FIFO' (see docs):

    // Needs <sched.h>, <cerrno>, <cstring>, <iostream> and <boost/algorithm/clamp.hpp>.
    struct sched_param sched = {};

    // Valid SCHED_FIFO priority range reported by the system.
    const int nMin = sched_get_priority_min(SCHED_FIFO);
    const int nMax = sched_get_priority_max(SCHED_FIFO);

    // Aim just above the midpoint of that range.
    const int nHlf = (nMax - nMin) / 2;
    const int nPriority = nMin + nHlf + 1;

    sched.sched_priority = boost::algorithm::clamp(nPriority, nMin, nMax);

    // A pid of 0 applies the change to the calling thread.
    if (sched_setscheduler(0, SCHED_FIFO, &sched) < 0)
        std::cerr << "SETSCHEDULER failed - err = " << strerror(errno) << std::endl;
    else
        std::cout << "Priority set to \"" << sched.sched_priority << "\"" << std::endl;

I've been reading various bits of Docker documentation on using a realtime scheduler. One interesting page states,

Verify that CONFIG_RT_GROUP_SCHED is enabled in the Linux kernel by running zcat /proc/config.gz | grep CONFIG_RT_GROUP_SCHED or by checking for the existence of the file /sys/fs/cgroup/cpu.rt_runtime_us. For guidance on configuring the kernel realtime scheduler, consult the documentation for your operating system.

As per the aforementioned recommendation, the stock Ubuntu Zesty 17.04 OS seems to fail these checks.

First question(s): Does this mean I cannot use the RT scheduler at all? What is 'CONFIG_RT_GROUP_SCHED'? One thing that confuses me is that there are older posts on the web, from 2010-2012, about patching kernels with an RT patch. It seems that there has been some work on soft RT in the Linux kernel since then.

The quote here has sparked my question:

From kernel version 2.6.18 onward, however, Linux is gradually becoming equipped with real-time capabilities, most of which are derived from the former realtime-preempt patches developed by Ingo Molnar, Thomas Gleixner, Steven Rostedt, and others. Until the patches have been completely merged into the mainline kernel (this is expected to be around kernel version 2.6.30), they must be installed to achieve the best real-time performance. These patches are named:

Carrying on...

Having read additional information, I note that it is important to set ulimits. I've altered /etc/security/limits.conf:

#*               soft    core            0
#root            hard    core            100000
#*               hard    rss             10000

# NEW ADDITION
gavin            hard    rtprio          99

Second question: Presumably the above is required to enable the docker daemon to run RT? It looks as if the daemon is controlled via systemd.

I continued further with my investigation and on the same Docker docs page saw the following snippet:

To run containers using the realtime scheduler, run the Docker daemon with the --cpu-rt-runtime flag set to the maximum number of microseconds reserved for realtime tasks per runtime period. For instance, with the default period of 1000000 microseconds (1 second), setting --cpu-rt-runtime=950000 ensures that containers using the realtime scheduler can run for 950000 microseconds for every 1000000-microsecond period, leaving at least 50000 microseconds available for non-realtime tasks. To make this configuration permanent on systems which use systemd, see Control and configure Docker with systemd.

Following this page, I discovered there were two parameters to the daemon that were of interest:

  --cpu-rt-period int                     Limit the CPU real-time period in microseconds
  --cpu-rt-runtime int                    Limit the CPU real-time runtime in microseconds

The same page indicates that docker daemon parameters can be specified via '/etc/docker/daemon.json', so I tried:

{
    "cpu-rt-period": 92500,
    "cpu-rt-runtime": 100000
}

Note: The docs do not specify the above options as 'allowed configuration options on Linux'. I thought I would give it a try nonetheless.

Docker daemon output upon restart:

-- Logs begin at Wed 2017-10-04 09:58:38 BST, end at Wed 2017-10-04 10:01:32 BST. --
Oct 04 09:58:47 gavin systemd[1]: Starting Docker Application Container Engine...
Oct 04 09:58:47 gavin dockerd[1501]: time="2017-10-04T09:58:47.885882588+01:00" level=info msg="libcontainerd: new containerd process, pid: 1531"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.053986072+01:00" level=warning msg="failed to rename /var/lib/docker/tmp for background deletion: %!s(<nil>).
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.161303803+01:00" level=info msg="[graphdriver] using prior storage driver: aufs"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.303409053+01:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.304002725+01:00" level=warning msg="Your kernel does not support swap memory limit"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.304078792+01:00" level=warning msg="Your kernel does not support cgroup rt period"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.304201239+01:00" level=warning msg="Your kernel does not support cgroup rt runtime"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.305534113+01:00" level=info msg="Loading containers: start."
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.730193030+01:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemo
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.784938130+01:00" level=info msg="Loading containers: done."
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.888035017+01:00" level=info msg="Daemon has completed initialization"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.888104120+01:00" level=info msg="Docker daemon" commit=89658be graphdriver=aufs version=17.05.0-ce
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.903280645+01:00" level=info msg="API listen on /var/run/docker.sock"
Oct 04 09:58:48 gavin systemd[1]: Started Docker Application Container Engine.

The particular lines of interest:

Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.304078792+01:00" level=warning msg="Your kernel does not support cgroup rt period"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.304201239+01:00" level=warning msg="Your kernel does not support cgroup rt runtime"

Not surprising given my earlier discoveries.

Final question: When this is finally working, how will I be able to determine that my container is truly running with RT scheduling? Will the likes of 'top' suffice?
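
One option I can think of is to have the executable report its own scheduling policy after the sched_setscheduler call. This is a minimal sketch with helper names of my own, using only sched_getscheduler and sched_getparam:

```cpp
#include <sched.h>
#include <sys/types.h>
#include <string>

// Maps a Linux scheduling policy constant to a readable name.
std::string policyName(int policy)
{
    switch (policy) {
    case SCHED_OTHER: return "SCHED_OTHER";
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    default:          return "unknown (" + std::to_string(policy) + ")";
    }
}

// Describes the policy and real-time priority of the given pid (0 = caller).
// Returns an empty string if either query fails.
std::string describeScheduling(pid_t pid = 0)
{
    const int policy = sched_getscheduler(pid);
    struct sched_param param = {};
    if (policy < 0 || sched_getparam(pid, &param) != 0)
        return "";
    return policyName(policy) + " priority " + std::to_string(param.sched_priority);
}
```

If the switch to SCHED_FIFO worked, describeScheduling() should report SCHED_FIFO with the priority that was set. From outside the process, chrt -p <pid> from util-linux should show the same information.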

EDIT: I ran a kernel diagnostic script which I found through the moby project on GitHub. This is the output:

warning: /proc/config.gz does not exist, searching other paths for kernel config ...
info: reading kernel config from /boot/config-4.10.0-35-generic ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- apparmor: enabled and tools installed
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_NF_NAT_IPV4: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_NF_NAT_NEEDED: enabled
- CONFIG_POSIX_MQUEUE: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_MEMCG_SWAP: enabled
- CONFIG_MEMCG_SWAP_ENABLED: missing
    (cgroup swap accounting is currently not enabled, you can enable it by setting boot option "swapaccount=1")
- CONFIG_LEGACY_VSYSCALL_EMULATE: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_IOSCHED_CFQ: enabled
- CONFIG_CFQ_GROUP_IOSCHED: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: missing
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: enabled (as module)
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: enabled (as module)
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled (as module)
      - CONFIG_XFRM_ALGO: enabled (as module)
      - CONFIG_INET_ESP: enabled (as module)
      - CONFIG_INET_XFRM_MODE_TRANSPORT: enabled (as module)
  - "ipvlan":
    - CONFIG_IPVLAN: enabled (as module)
  - "macvlan":
    - CONFIG_MACVLAN: enabled (as module)
    - CONFIG_DUMMY: enabled (as module)
  - "ftp,tftp client in container":
    - CONFIG_NF_NAT_FTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_FTP: enabled (as module)
    - CONFIG_NF_NAT_TFTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_TFTP: enabled (as module)
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: enabled (as module)
  - "btrfs":
    - CONFIG_BTRFS_FS: enabled (as module)
    - CONFIG_BTRFS_FS_POSIX_ACL: enabled
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: enabled
    - CONFIG_DM_THIN_PROVISIONING: enabled (as module)
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled (as module)
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

Line of significance:

- CONFIG_RT_GROUP_SCHED: missing

Solution

  • Container Level

    There are two options to do RT scheduling within a container:

    1. Add the SYS_NICE capability:
    docker run --cap-add SYS_NICE ...
    
    2. Use privileged mode with the --privileged flag:
    docker run --privileged ...
    

    NOTE: The --privileged flag grants far more permissions than necessary!

    The more limited --cap-add SYS_NICE option is much safer.

  • OS System Configuration

    You may also have to lift the kernel's limit on real-time CPU time via sysctl; a value of -1 for kernel.sched_rt_runtime_us disables real-time throttling. If you are running as the root user (the default in a Docker container):

    sysctl -w kernel.sched_rt_runtime_us=-1
    

    To make that permanent (update your image):

    echo 'kernel.sched_rt_runtime_us=-1' >> /etc/sysctl.conf
    

    https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities