
Cgroup unexpectedly propagates SIGSTOP to the parent


I have a small script to run a command inside a cgroup that limits CPU time:

$ cat cgrun.sh
#!/bin/bash

if [[ $# -lt 1 ]]; then
    echo "Usage: $0 <bin>"
    exit 1
fi

sudo cgcreate -g cpu:/cpulimit
sudo cgset -r cpu.cfs_period_us=1000000 cpulimit
sudo cgset -r cpu.cfs_quota_us=100000 cpulimit
sudo cgexec -g cpu:cpulimit sudo -u $USER "$@"
sudo cgdelete cpu:/cpulimit

I let the command run: ./cgrun.sh /bin/sleep 10

Then I send SIGSTOP to the sleep command from another terminal. Somehow, at this moment the parent commands, sudo and cgexec, receive this signal as well. Then I send SIGCONT to the sleep command, which allows sleep to continue.

But at this point sudo and cgexec are still stopped and never reap the zombie of the sleep process. I don't understand how this can happen, or how I can prevent it. Moreover, I cannot send SIGCONT to sudo and cgexec, because I'm sending the signals as my user, while these commands run as root.

Here is how it looks in htop (some columns omitted):

    PID USER S CPU% MEM%   TIME+  Command
1222869 user S  0.0  0.0  0:00.00 │     │  └─ /bin/bash ./cgrun.sh /bin/sleep 10
1222882 root T  0.0  0.0  0:00.00 │     │     └─ sudo cgexec -g cpu:cpulimit sudo -u user /bin/sleep 10
1222884 root T  0.0  0.0  0:00.00 │     │        └─ sudo -u desertfox /bin/sleep 10
1222887 user Z  0.0  0.0  0:00.00 │     │           └─ /bin/sleep 10

How can I create a cgroup in a way that SIGSTOP is not bounced to the parent processes?

UPD

If I start the process using systemd-run, I do not observe the same behavior:

sudo systemd-run --uid=$USER -t -p CPUQuota=10% sleep 10

Solution

  • Instead of using the "cg tools", I would do it the "hard way" with plain shell commands: create the cpulimit cgroup (a mkdir), set the CFS parameters (echo into the corresponding cpu.cfs_* files), create a sub-shell with the (...) notation, move it into the cgroup (echo its pid into the cgroup's tasks file), and execute the requested command in that sub-shell.

    Hence, cgrun.sh would look like this:

    #!/bin/bash
    
    if [[ $# -lt 1 ]]; then
        echo "Usage: $0 <bin>" >&2
        exit 1
    fi
    
    CGTREE=/sys/fs/cgroup/cpu
    
    sudo -s <<EOF
    [ ! -d ${CGTREE}/cpulimit ] && mkdir ${CGTREE}/cpulimit
    echo 1000000 > ${CGTREE}/cpulimit/cpu.cfs_period_us
    echo 100000 > ${CGTREE}/cpulimit/cpu.cfs_quota_us
    EOF
    
    # Sub-shell in background
    (
      # Pid of the current sub-shell
      # ($$ would return the pid of the parent process)
      MY_PID=$BASHPID
    
      # Move current process into the cgroup
      sudo sh -c "echo ${MY_PID} > ${CGTREE}/cpulimit/tasks"
    
      # Run the command as the calling user (it inherits the cgroup)
      exec "$@"
    
    ) &
    
    # Wait for the sub-shell
    wait $!
    
    # Exit code of the sub-shell
    rc=$?
    
    # Delete the cgroup
    sudo rmdir ${CGTREE}/cpulimit
    
    # Exit with the return code of the sub-shell
    exit $rc
    
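    The $BASHPID/$$ distinction in the sub-shell matters: $$ keeps the pid of the parent shell even inside (...), while $BASHPID is the pid of the sub-shell itself, which is what must go into the tasks file. A small sketch to see the difference:

```shell
#!/bin/bash
# At the top level, $$ and $BASHPID are identical.
echo "top level: \$\$=$$ BASHPID=$BASHPID"
(
  # Inside the sub-shell, $$ is still the parent's pid;
  # $BASHPID is the sub-shell's own pid.
  echo "sub-shell: \$\$=$$ BASHPID=$BASHPID"
  [ "$$" != "$BASHPID" ] && echo "different pids, as expected"
)
```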

    Run it (first we get the pid of the current shell, so we can display the process hierarchy from another terminal):

    $ echo $$
    112588
    $ ./cgrun.sh /bin/sleep 50
    

    This creates the following process hierarchy:

    $ pstree -p 112588
    bash(112588)-+-cgrun.sh(113079)---sleep(113086)
    

    Stop the sleep process:

    $ kill -STOP 113086
    
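    This is the point of the fix: only the target process enters the stopped (T) state, while the script's shell stays runnable. A self-contained check of that behavior (using a fresh sleep rather than the pids above):

```shell
#!/bin/bash
sleep 50 &                      # stand-in for the command in the cgroup
pid=$!
kill -STOP "$pid"
sleep 0.2                       # give the kernel a moment to record the state
ps -o stat= -p "$pid"           # state starts with T (stopped); this shell is unaffected
kill -CONT "$pid"
kill "$pid"
wait "$pid" 2>/dev/null
```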

    Look at the cgroup to verify that the sleep command is running inside it (its pid is in the tasks file) and that the CFS parameters are set correctly:

    $ ls -l /sys/fs/cgroup/cpu/cpulimit/
    total 0
    -rw-r--r-- 1 root root 0 nov.    5 22:38 cgroup.clone_children
    -rw-r--r-- 1 root root 0 nov.    5 22:38 cgroup.procs
    -rw-r--r-- 1 root root 0 nov.    5 22:36 cpu.cfs_period_us
    -rw-r--r-- 1 root root 0 nov.    5 22:36 cpu.cfs_quota_us
    -rw-r--r-- 1 root root 0 nov.    5 22:38 cpu.shares
    -r--r--r-- 1 root root 0 nov.    5 22:38 cpu.stat
    -rw-r--r-- 1 root root 0 nov.    5 22:38 cpu.uclamp.max
    -rw-r--r-- 1 root root 0 nov.    5 22:38 cpu.uclamp.min
    -r--r--r-- 1 root root 0 nov.    5 22:38 cpuacct.stat
    -rw-r--r-- 1 root root 0 nov.    5 22:38 cpuacct.usage
    -r--r--r-- 1 root root 0 nov.    5 22:38 cpuacct.usage_all
    -r--r--r-- 1 root root 0 nov.    5 22:38 cpuacct.usage_percpu
    -r--r--r-- 1 root root 0 nov.    5 22:38 cpuacct.usage_percpu_sys
    -r--r--r-- 1 root root 0 nov.    5 22:38 cpuacct.usage_percpu_user
    -r--r--r-- 1 root root 0 nov.    5 22:38 cpuacct.usage_sys
    -r--r--r-- 1 root root 0 nov.    5 22:38 cpuacct.usage_user
    -rw-r--r-- 1 root root 0 nov.    5 22:38 notify_on_release
    -rw-r--r-- 1 root root 0 nov.    5 22:36 tasks
    $ cat /sys/fs/cgroup/cpu/cpulimit/tasks 
    113086  # This is the pid of sleep
    $ cat /sys/fs/cgroup/cpu/cpulimit/cpu.cfs_*
    1000000
    100000
    

    Send SIGCONT signal to the sleep process:

    $ kill -CONT 113086
    

    The process finishes and the cgroup is destroyed:

    $ ls -l /sys/fs/cgroup/cpu/cpulimit
    ls: cannot access '/sys/fs/cgroup/cpu/cpulimit': No such file or directory
    

    Get the exit code of the script once it is finished (it is the exit code of the launched command):

    $ echo $?
    0
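
    The exit-code plumbing at the end of the script (`wait $!` followed by `rc=$?`) can be verified on its own; `wait` returns the sub-shell's exit status:

```shell
#!/bin/bash
( exit 3 ) &    # sub-shell with a known exit code
wait $!
rc=$?
echo "rc=$rc"   # rc=3
```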