Can preemptive multitasking of native code be implemented in user space on Linux?

I'm wondering whether it's possible to implement preemptive multitasking of native code within a single process in user space on Linux. (That is: externally pause some running native code, save its context, swap in a different context, and resume execution, all orchestrated by user space but using calls that may enter the kernel.) I was thinking this could be done with a signal handler for SIGALRM and the *context() family, but it turns out that the entire *context() family is async-signal-unsafe, so that approach isn't guaranteed to work. I did find a gist that implements this idea, so apparently it does happen to work on Linux, at least sometimes, even though POSIX doesn't require it to. The gist installs the following function as the SIGALRM handler, which makes several *context() calls:

void timer_interrupt(int j, siginfo_t *si, void *old_context)
{
    /* Create new scheduler context */
    signal_context.uc_stack.ss_sp = signal_stack;
    signal_context.uc_stack.ss_size = STACKSIZE;
    signal_context.uc_stack.ss_flags = 0;
    makecontext(&signal_context, scheduler, 1);

    /* save running thread, jump to scheduler */
}

Does Linux offer any guarantee that makes this approach correct? Is there a way to make this correct? Is there a totally different way to do this correctly?

(By "implement in user space" I don't mean that we never enter the kernel. I mean to contrast with the preemptive multitasking implemented by the kernel.)


  • You cannot reliably switch contexts inside a signal handler. If you did so, it would usually work in practice, but not always; it is undefined behavior.

    You could instead set a volatile sig_atomic_t flag (read about sig_atomic_t) in the signal handler (see signal(7), signal-safety(7), sigreturn(2)) and check that flag regularly in your code (e.g. at least once every few milliseconds): before most calls, inside your event loop if you have one, and so on. The scheduling then becomes cooperative user-land scheduling.
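    A minimal sketch of that flag-based approach, assuming a periodic SIGALRM armed with setitimer(2). The names preempt_requested, start_preempt_timer, safe_point and run_until_yields are hypothetical, chosen only for illustration; the handler does nothing but set the flag, which signal-safety(7) permits.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Set by the handler, polled by ordinary code. Storing to a
 * volatile sig_atomic_t is safe inside a signal handler. */
static volatile sig_atomic_t preempt_requested = 0;

static void on_alarm(int signo)
{
    (void)signo;
    preempt_requested = 1;
}

/* Hypothetical helper: deliver SIGALRM every interval_us microseconds.
 * Returns 0 on success, -1 on failure. */
int start_preempt_timer(long interval_us)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted system calls */
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    struct itimerval tv;
    tv.it_interval.tv_sec = interval_us / 1000000;
    tv.it_interval.tv_usec = interval_us % 1000000;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/* Hypothetical safe point: returns 1 (and clears the flag) if a
 * preemption request is pending, 0 otherwise. A real scheduler
 * would switch tasks here, outside of the signal handler. */
int safe_point(void)
{
    if (!preempt_requested)
        return 0;
    preempt_requested = 0;
    return 1;
}

/* Busy work that polls the flag on every iteration; returns the
 * number of iterations needed to observe `wanted` requests. */
unsigned long run_until_yields(int wanted)
{
    unsigned long iterations = 0;
    int yields = 0;
    while (yields < wanted) {
        iterations++;
        yields += safe_point();
    }
    return iterations;
}
```

    For instance, calling start_preempt_timer(1000) and then run_until_yields(3) spins until three timer ticks have been noticed. The work loop, not the handler, decides when to give up the CPU, which is why this is cooperative rather than truly preemptive.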

    This is easier if you control the code, e.g. when you design a compiler that emits C code (a common practice) or hack your C compiler: you then change the code generator to emit such a test at suitable points in the generated code.
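    As a rough illustration of what such an emitted test could look like, the sketch below uses a hypothetical SAFE_POINT() macro that calls swapcontext() from ordinary code, never from a signal handler. Setting the flag by hand stands in for a SIGALRM handler; all names (SAFE_POINT, yield_requested, run_task, and so on) are assumptions made for this example, and it relies on glibc's <ucontext.h>.

```c
#include <signal.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

/* In a real program a SIGALRM handler would set this flag. */
static volatile sig_atomic_t yield_requested;
static volatile int task_done;
static ucontext_t scheduler_ctx, task_ctx;
static char task_stack[STACK_SIZE];

/* The test a code generator might emit at calls and loop back-edges.
 * The context switch happens in normal code, so the *context()
 * functions are not called in async-signal context. */
#define SAFE_POINT()                                    \
    do {                                                \
        if (yield_requested) {                          \
            yield_requested = 0;                        \
            swapcontext(&task_ctx, &scheduler_ctx);     \
        }                                               \
    } while (0)

static void task(void)
{
    for (int i = 0; i < 3; i++) {
        yield_requested = 1;    /* stands in for the timer signal */
        SAFE_POINT();           /* yields to the scheduler */
    }
    task_done = 1;
}

/* Runs task() to completion and returns how many times control
 * came back to the scheduler. */
int run_task(void)
{
    int switches = 0;

    task_done = 0;
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &scheduler_ctx;  /* resumed when task() returns */
    makecontext(&task_ctx, task, 0);

    while (!task_done) {
        swapcontext(&scheduler_ctx, &task_ctx);  /* resume the task */
        switches++;
    }
    return switches;
}
```

    Here run_task() returns 4: three yields from SAFE_POINT() plus the final return through uc_link. A real scheduler would keep several task contexts and pick the next one to resume inside the loop.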

    You may want to forbid blocking system calls and replace them with non-blocking variants or wrappers. See also poll(2), fcntl(2) with F_SETFL and O_NONBLOCK, etc...
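    A minimal sketch of such wrappers, assuming plain file descriptors (pipes, sockets). The helper names set_nonblocking, try_read and wait_readable, and the WOULD_BLOCK return value, are hypothetical; only fcntl(2), read(2) and poll(2) are real calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

#define WOULD_BLOCK (-2)   /* hypothetical "try again later" result */

/* Put fd into non-blocking mode, preserving its other status flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Read without blocking. Returns the byte count (0 means EOF),
 * -1 on a real error, or WOULD_BLOCK when no data is ready, in
 * which case the caller should yield to another task. */
ssize_t try_read(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return WOULD_BLOCK;
    return n;
}

/* Returns 1 if fd becomes readable within timeout_ms, 0 on timeout,
 * -1 on error (including EINTR when a signal interrupts poll). */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int r = poll(&p, 1, timeout_ms);
    if (r <= 0)
        return r;
    return (p.revents & (POLLIN | POLLHUP)) ? 1 : -1;
}
```

    A user-land scheduler would typically call try_read() from a task, switch to another task on WOULD_BLOCK, and use one poll() over all pending descriptors when every task is waiting.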

    You may want the code generator to avoid large call stacks, as GCC's -fsplit-stack instrumentation option does (read about split stacks in GCC).

    If you generate (or hand-write) assembler, you can use the same tricks. AFAIK the Go compiler uses something similar for its goroutines. Study your platform's ABI.

    However, kernel-initiated preemptive scheduling is preferable (and on Linux it will still happen between processes or kernel tasks; see clone(2)).

    PS. If garbage collection techniques using similar tricks interest you, look into MPS and Cheney on the MTA (e.g. into Chicken Scheme).