linux-kernel scheduler cgroups

How does the CFS Scheduler select which task group/cgroup to run next?


This post gave a very nice summary of how scheduling is done on a per-task basis, but it doesn't really go into how task groups are ordered on the RB tree.

I couldn't really find other resources that spoke about this either; all of them said something along the lines of:

The pick_next_task_fair() function keeps picking the leftmost scheduling entity as long as the current scheduling entity is a CFS RQ (an RB tree whose leftmost node has the smallest vruntime).

But how exactly does CFS prioritise one task group over another task group on the RB Tree? Is it done on the basis of the min_vruntime of the tasks inside it? Is it done based on the CPU shares given to that task group?

Any insights on this would be much appreciated! Thanks in advance!


Solution

  • But how exactly do you place a task group on the RB Tree?

    Take pick_next_task_fair() as an example: a task group's scheduling entity is inserted back into the RB tree by the following code, specifically by the put_prev_entity() calls.

    if (prev != p) {
            struct sched_entity *pse = &prev->se;
    
            while (!(cfs_rq = is_same_group(se, pse))) {
                    int se_depth = se->depth;
                    int pse_depth = pse->depth;
    
                    if (se_depth <= pse_depth) {
                            put_prev_entity(cfs_rq_of(pse), pse);
                            pse = parent_entity(pse);
                    }
                    if (se_depth >= pse_depth) {
                            set_next_entity(cfs_rq_of(se), se);
                            se = parent_entity(se);
                    }
            }
    
            put_prev_entity(cfs_rq, pse);
            set_next_entity(cfs_rq, se);
    }
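
The code above runs after a task has already been picked. The pick itself walks down the group hierarchy, and a minimal sketch of that descent may help. This is a toy model, not the kernel's data structures: `toy_se`, `toy_rq`, `toy_pick_leftmost()` and `toy_pick_next_task()` are hypothetical names, a linear scan stands in for the RB tree, and vruntime wraparound is ignored. The point it illustrates is that a group is just another entity in its parent's queue, ordered by its own vruntime.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rq;

/* Hypothetical stand-in for struct sched_entity: a task or a group. */
struct toy_se {
    const char *name;
    unsigned long long vruntime;
    struct toy_rq *my_q;   /* NULL for a task, the group's own queue otherwise */
};

/* Hypothetical stand-in for struct cfs_rq; an array replaces the RB tree. */
struct toy_rq {
    struct toy_se **entities;
    size_t nr;
};

/* Return the entity with the smallest vruntime (the "leftmost" node). */
struct toy_se *toy_pick_leftmost(const struct toy_rq *rq)
{
    assert(rq != NULL && rq->nr > 0);
    struct toy_se *best = rq->entities[0];
    for (size_t i = 1; i < rq->nr; i++) {
        if (rq->entities[i]->vruntime < best->vruntime)
            best = rq->entities[i];
    }
    return best;
}

/* Descend from the root queue until the picked entity is a task. */
struct toy_se *toy_pick_next_task(const struct toy_rq *root)
{
    const struct toy_rq *rq = root;
    struct toy_se *se;
    do {
        se = toy_pick_leftmost(rq);
        rq = se->my_q;      /* non-NULL means we picked a group: go one level down */
    } while (rq != NULL);
    return se;
}
```

In this model a group whose own vruntime is the smallest in the root queue wins the first round, and only then do the vruntimes of the tasks inside it decide which task actually runs.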
    

    Is it done on the basis of the min_vruntime of the tasks inside it? Is it done based on the CPU shares given to that task group?

    Not on the whole CPU shares given to that task group, only on a fraction of those shares. See calc_group_shares() and update_curr() for more details.
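
As a rough illustration of what "a fraction of the shares" means, the sketch below models the two steps in simplified form. It is an assumption-laden approximation, not the kernel code: `toy_group_weight()` and `toy_advance_vruntime()` are hypothetical helpers, the nice-0 weight is taken as 1024, and the clamping and load-average details that calc_group_shares() handles are left out.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NICE_0_WEIGHT 1024ULL  /* weight of a nice-0 task in this model */

/*
 * Weight of a group's entity on one CPU: the group's shares scaled by
 * the portion of the group's total load that sits on that CPU.
 */
uint64_t toy_group_weight(uint64_t tg_shares, uint64_t cpu_load,
                          uint64_t total_load)
{
    assert(total_load > 0 && cpu_load <= total_load);
    return tg_shares * cpu_load / total_load;
}

/*
 * Advance an entity's vruntime after it ran for delta_exec_ns.
 * A lower weight makes vruntime grow faster, which moves the entity
 * to the right in the tree sooner.
 */
uint64_t toy_advance_vruntime(uint64_t vruntime, uint64_t delta_exec_ns,
                              uint64_t weight)
{
    assert(weight > 0);
    return vruntime + delta_exec_ns * TOY_NICE_0_WEIGHT / weight;
}
```

For example, with 1024 shares and a quarter of the group's load on a given CPU, the group entity on that CPU gets a weight of 256 in this model, so 1 ms of real runtime advances its vruntime by 4 ms.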