
Effectively migrate a Linux process with C


I need to estimate how much it costs to migrate a Linux process to another core of the same computer. To migrate the process I'm using the sched_setaffinity system call, but I've noticed that the migration does not always happen instantaneously, which is what my measurement requires.

More in depth, I'm writing a C program that performs a batch of simple computations twice: first without migration, then with migration. Computing the difference between the two timestamps should give me a rough estimate of the migration overhead. However, I need to figure out how to migrate the current process and wait until the migration has actually happened.

#define _GNU_SOURCE
#define _POSIX_C_SOURCE 199309L

#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <stdint.h>

//Migrates the process
int migrate(pid_t pid) {
    const int totCPU = 8;
    const int nextCPU = (sched_getcpu() +1) % totCPU;

    cpu_set_t target;
    CPU_SET(nextCPU, &target);
    if(sched_setaffinity(pid, sizeof(target), &target) < 0)
        perror("Setaffinity");

    return nextCPU;
}

int main(void) {
    const long iterations = 4;
    uint64_t total_sequential_delays = 0;
    uint64_t total_migration_delays = 0;
    uint64_t delta_us;

    for(int i=0; i < iterations; i++) {
        struct timespec start, end;

        //Migration benchmark only happens in odd iterations
        bool do_migration = i % 2 == 1;
        //Start timestamp
        clock_gettime(CLOCK_MONOTONIC_RAW, &start);
        //Target CPU to migrate
        int target;
        if(do_migration) {
            target = migrate(0);
            //if current CPU is not the target CPU
            if(target != sched_getcpu()) {
                do {
                    clock_gettime(CLOCK_MONOTONIC_RAW, &end);
                }
                while(target != sched_getcpu());
            }
        }

        //Simple computation 
        double k = 5;
        for(int j = 1; j <= 9999; j++) {
            k *= j / (k-3);
        }

        //End timestamp
        clock_gettime(CLOCK_MONOTONIC_RAW, &end);

        //Elapsed time
        delta_us = (end.tv_sec - start.tv_sec) * 1000000 + (end.tv_nsec - start.tv_nsec) / 1000;
        if(do_migration) total_migration_delays += delta_us;
        else total_sequential_delays += delta_us;
    }

    //Compute the averages (each case ran iterations/2 times)
    double avg_migration = (double)total_migration_delays / (iterations / 2);
    double avg_sequential = (double)total_sequential_delays / (iterations / 2);

    //Print them
    printf("avg_migration=%f, avg_sequential=%f\n", avg_migration, avg_sequential);

    return EXIT_SUCCESS;
}

The problem here is that the do-while loop polling sched_getcpu() sometimes runs forever.


Solution

  • I need to estimate how much it costs to migrate a Linux process to another core of the same computer.

    OK, the cost can be estimated as:

    • the time taken to set the new CPU affinity and do a "yield" or "sleep(0)" to force a task switch/reschedule (including task switch overhead, etc).

    • the cost of a cache miss for every future "was cached on the old CPU but not cached in the new CPU yet" memory access

    • the cost of a TLB miss for every future "virtual to physical translation was cached on the old CPU but not cached in the new CPU yet" memory access

    • NUMA penalties

    • load balancing problems (e.g. migrating from a lightly loaded CPU or core to one heavily loaded by other processes may cause severe performance problems, including the cost of the kernel deciding to migrate other processes to different CPUs to restore the balance; the costs/overheads paid by those other processes should probably be included in the total cost caused by migrating your process).

    Note that:

    a) there are multiple levels of caches (trace cache, instruction cache, L1 data cache, L2 data cache, ..) and some caches are shared between some CPUs (e.g. L1 might be shared between logical CPUs within the same core, L2 might be shared by 2 cores, L3 might be shared by 8 cores).

    b) TLB miss costs depend on a lot of things (e.g. whether the kernel is using Meltdown mitigations without the PCID feature and therefore blows away TLB information on every system call anyway).

    c) NUMA penalties are latency costs - every access to RAM (e.g. cache miss) that was allocated on the previous CPU (for the previous NUMA node) will have higher latency than accesses to RAM that are allocated on the new/current CPU (correct NUMA node).

    d) All of the cache miss costs, TLB miss costs and NUMA penalties depend on memory access patterns. A benchmark that has no memory accesses will be misleading.

    e) The cache miss costs, TLB miss costs and NUMA penalties are highly dependent on the hardware involved - e.g. a benchmark on one "slow CPUs with fast RAM and no NUMA" computer will be completely irrelevant for a different "fast CPUs with slow RAM and many NUMA domains" computer. In the same way it is highly dependent on which CPUs are involved (e.g. migrating from CPU #0 to CPU #1 might cost very little, while migrating from CPU #0 to CPU #15 might be very expensive).
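    Point d) above matters in practice: to see any cache-related migration cost at all, the timed loop has to touch memory. A hedged sketch of such a benchmark follows; the 8 MiB working-set size is an assumption (pick something larger than your last-level cache), and it hardcodes none of the question's CPU numbers:

    ```c
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    // Sum a buffer big enough to spill out of the caches; after a migration
    // the new CPU has to refill its caches, so this loop's time reflects that.
    static uint64_t touch(volatile uint8_t *buf, size_t len) {
        uint64_t sum = 0;
        for (size_t i = 0; i < len; i += 64)   // one access per cache line
            sum += buf[i];
        return sum;
    }

    static uint64_t now_us(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
        return (uint64_t)ts.tv_sec * 1000000u + ts.tv_nsec / 1000;
    }

    int main(void) {
        const size_t len = 8u << 20;           // 8 MiB working set (assumption)
        uint8_t *buf = malloc(len);
        if (!buf) return EXIT_FAILURE;
        memset(buf, 1, len);

        touch(buf, len);                       // warm the current CPU's caches
        uint64_t t0 = now_us();
        uint64_t s1 = touch(buf, len);         // warm re-read: the baseline
        uint64_t t1 = now_us();

        // Migrate to another CPU, then time the same read cold.
        long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET((sched_getcpu() + 1) % ncpu, &set);
        sched_setaffinity(0, sizeof(set), &set);
        sleep(0);                              // yield so the move takes effect

        uint64_t t2 = now_us();
        uint64_t s2 = touch(buf, len);         // cold read on the new CPU
        uint64_t t3 = now_us();

        printf("warm=%llu us, post-migration=%llu us (sums %llu %llu)\n",
               (unsigned long long)(t1 - t0), (unsigned long long)(t3 - t2),
               (unsigned long long)s1, (unsigned long long)s2);
        free(buf);
        return 0;
    }
    ```

    The two printed times will only differ noticeably when the two CPUs do not share the caches that hold the buffer (see note a) above), so the comparison itself is hardware-dependent.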

    To migrate the process I'm using the sched_setaffinity system call, but I've noticed that the migration does not always happen instantaneously, which is what my measurement requires.

    Put a "sleep(0);" after the "sched_setaffinity();".
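
    A minimal sketch of that fix, assuming sleep(0) (or the more explicit sched_yield()) forces the reschedule. It queries the CPU count with sysconf() instead of hardcoding 8, and adds the CPU_ZERO() call that the question's migrate() omits - without it the cpu_set_t contains garbage bits, which is one way the polling loop can spin forever:

    ```c
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    // Move the calling process to the next CPU, then yield so the
    // reschedule (and thus the migration) happens before we return.
    static int migrate(void) {
        const long totCPU = sysconf(_SC_NPROCESSORS_ONLN);
        const int nextCPU = (sched_getcpu() + 1) % totCPU;

        cpu_set_t target;
        CPU_ZERO(&target);              // required: clear all bits first
        CPU_SET(nextCPU, &target);
        if (sched_setaffinity(0, sizeof(target), &target) < 0)
            perror("sched_setaffinity");

        sleep(0);                       // give up the CPU: force a task switch
        return nextCPU;
    }

    int main(void) {
        int wanted = migrate();
        printf("wanted CPU %d, now on CPU %d\n", wanted, sched_getcpu());
        return 0;
    }
    ```

    With this in place the busy-wait on sched_getcpu() becomes unnecessary: by the time sleep(0) returns, the task is running within its new affinity mask.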