linux docker go cgroups

The task is removed from the cgroup after it exits


Here is my test code. I want to write the current PID to the tasks file:

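package subsystems

import (
  "os"
  "path/filepath"
  "strconv"
  "testing"
)

func TestSubsystems(t *testing.T) {
  p := "/sys/fs/cgroup/memory/test1"
  f := "tasks"

  // Directories need the execute bit to be traversable, hence 0755 rather than 0644.
  if err := os.MkdirAll(p, 0755); err != nil {
    t.Fatal(err)
  }

  pid := os.Getpid()

  // Move the current process into the cgroup by writing its PID to the tasks file.
  // t.Failed() only reports the test status; t.Fatalf actually fails the test.
  if err := os.WriteFile(filepath.Join(p, f), []byte(strconv.Itoa(pid)), 0644); err != nil {
    t.Fatalf("write %s: %v", f, err)
  }
}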

But when the program exits with code 0, I cannot see anything in tasks:

root@ubuntu:/sys/fs/cgroup/memory/test1# cat tasks
root@ubuntu:/sys/fs/cgroup/memory/test1# cat tasks
root@ubuntu:/sys/fs/cgroup/memory/test1# 

How can I solve this problem?


Solution

  • What you've described is not a problem at all; it is the correct behavior:

    • you create an empty cgroup;
    • you add a task to it;
    • the task runs in the cgroup for a while;
    • the task exits;
    • the kernel removes the task from the cgroup because it has exited;
    • the cgroup is empty again;
    • you observe the list of cgroup members and see nothing.

    Why is this the correct behavior? The simplest answer is that there is no reason to keep a nonexistent PID in the list of cgroup members, and there are several reasons not to.

    PID reuse is the first reason that comes to mind. If a PID were not removed from the cgroup when its task died or shut down gracefully, any later task that reused this PID (assuming the cgroup were still alive) would become a member of the cgroup. That is obviously undesirable, especially for containers: with the devices cgroup, for example, it could lead to privilege escalation.

    As I mentioned in the comments, I can't find an explicit description of this behavior in the documentation, probably because it seems too obvious. The source code is self-explanatory, though: the exit(2) system call executes cgroup_exit() under the hood, and this function moves the task out of its cgroups.