linux kernel preemption kprobe

Why do kprobes disable preemption and when is it safe to reenable it?


According to the docs, kprobes disable preemption:

Probe handlers are run with preemption disabled. Depending on the architecture and optimization state, handlers may also run with interrupts disabled (e.g., kretprobe handlers and optimized kprobe handlers run without interrupt disabled on x86/x86-64).

From commit 9a09f261a we can clearly see that optimized kprobes used to run with preemption enabled.

Why is that the case? I understand kprobes as a way to inject code at a specific address in the kernel, and with that understanding any code should be OK to run in a handler.

  • What makes kprobes special such that preemption has to be disabled?
  • In what circumstances can I re-enable preemption?

Solution

  • At least on x86, the implementation of Kprobes relies on the fact that preemption is disabled while the Kprobe handlers run.

    When you place an ordinary (not Ftrace-based) Kprobe on an instruction, the first byte of that instruction is overwritten with 0xcc (int3, "software breakpoint"). If the kernel tries to execute that instruction, a trap occurs and kprobe_int3_handler() is called (see the implementation of do_int3()).

    To call your Kprobe handlers, kprobe_int3_handler() finds which Kprobe was hit, saves it in the per-CPU variable current_kprobe and calls your pre-handler. After that, it prepares everything to single-step over the original instruction. After the single-stepping, your post-handler is called and then some cleanup is performed. current_kprobe and some other per-CPU data are used to do all this. Preemption is re-enabled only after that.

    Now, imagine the pre-handler enabled preemption, was preempted right away and resumed on a different CPU. If the implementation of Kprobes then tried to access current_kprobe or other per-CPU data, it would be looking at the data of the wrong CPU. The kernel would likely crash (NULL pointer dereference if there were no current_kprobe on that CPU at the moment) or worse.
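
    The migration scenario can be modelled in user space. This is only a toy sketch, not kernel code: the array current_kprobe[] stands in for the kernel's per-CPU variable, and struct kprobe_sim, NR_CPUS and probe_seen_at_cleanup() are hypothetical names made up for the illustration.

    ```c
    #include <assert.h>
    #include <stddef.h>

    #define NR_CPUS 2                     /* hypothetical: a two-CPU machine */

    struct kprobe_sim { const char *name; };  /* stand-in for struct kprobe */

    /* Stand-in for the per-CPU variable current_kprobe: one slot per CPU. */
    static struct kprobe_sim *current_kprobe[NR_CPUS];

    /*
     * Model one probe hit on hit_cpu and return what the cleanup code
     * would read if it ran on cleanup_cpu. With preemption disabled the
     * two are always the same CPU; they can differ only if the handler
     * enabled preemption and the task migrated.
     */
    static struct kprobe_sim *probe_seen_at_cleanup(struct kprobe_sim *kp,
                                                    int hit_cpu, int cleanup_cpu)
    {
        int cpu;

        assert(hit_cpu >= 0 && hit_cpu < NR_CPUS);
        assert(cleanup_cpu >= 0 && cleanup_cpu < NR_CPUS);

        for (cpu = 0; cpu < NR_CPUS; cpu++)
            current_kprobe[cpu] = NULL;   /* no probe in flight anywhere */

        current_kprobe[hit_cpu] = kp;     /* breakpoint handler records the probe */

        /* ... pre-handler runs here and may or may not have migrated ... */

        return current_kprobe[cleanup_cpu];   /* cleanup reads "its" slot */
    }
    ```

    With hit_cpu == cleanup_cpu the function returns the probe that was hit. With hit_cpu = 0 and cleanup_cpu = 1 it returns NULL, which is the value the real cleanup code would then try to dereference.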

    Or, the preempted handler could resume on the same CPU, but another Kprobe could be hit there while it was sleeping. current_kprobe and the other per-CPU data would be overwritten and disaster would be very likely.
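
    The same-CPU case can be sketched in the same spirit. Again this is a deliberately simplified user-space model: it has a single slot for one CPU, it ignores whatever reentrancy handling the real implementation has, and all names (kprobe_sim, probe_hit(), probe_finish(), finish_first_probe()) are hypothetical.

    ```c
    #include <assert.h>
    #include <stddef.h>

    struct kprobe_sim { int id; };        /* stand-in for struct kprobe */

    /* The current_kprobe slot of a single CPU. */
    static struct kprobe_sim *current_kprobe;

    /* Breakpoint handler: remember which probe is being processed. */
    static void probe_hit(struct kprobe_sim *kp)
    {
        current_kprobe = kp;
    }

    /* Cleanup: pick up the probe from the slot and clear it. */
    static struct kprobe_sim *probe_finish(void)
    {
        struct kprobe_sim *kp = current_kprobe;

        current_kprobe = NULL;
        return kp;
    }

    /*
     * Process probe 1 and return the id its cleanup step ends up seeing
     * (-1 if the slot is empty). If other_probe_hits is nonzero, probe 2
     * is hit on the same CPU while the handler of probe 1 is "sleeping",
     * which can only happen if that handler enabled preemption.
     */
    static int finish_first_probe(int other_probe_hits)
    {
        struct kprobe_sim first = { 1 }, second = { 2 };
        struct kprobe_sim *seen;

        current_kprobe = NULL;
        probe_hit(&first);
        if (other_probe_hits)
            probe_hit(&second);           /* overwrites the slot */

        seen = probe_finish();
        return seen ? seen->id : -1;
    }
    ```

    finish_first_probe(0) returns 1, as expected. finish_first_probe(1) returns 2: the cleanup for the first probe now operates on the second probe's data.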

    Re-enabling preemption in Kprobe handlers could result in difficult-to-debug kernel crashes and other problems.

    So, in short, this is because Kprobes are designed this way, at least on x86. I cannot say much about their implementation on other architectures.


    Depending on what you are trying to accomplish, other kernel facilities might be helpful.

    For instance, if you only need to run your code at the start of some functions, take a look at Ftrace. Your code would then run in the same conditions as the functions you hook it into.


    All that being said, one of my projects actually needed Kprobe handlers to run under the same conditions w.r.t. preemption as the probed instructions. You can find the implementation here. However, it had to jump through hoops to achieve that without breaking anything. It has been working OK so far, but it is more complex than I would like and has portability issues too.