Self-modifying code for trace hooks?

I'm looking for the least-overhead way of inserting trace/logging hooks into some very performance-sensitive driver code. This logging stuff has to always be compiled in, but most of the time do nothing (but do nothing very fast).

There isn't anything much simpler than just having a global on/off word, doing an if(enabled){log()}. However, if possible I'd like to even avoid the cost of loading that word every time I hit one of my hooks. It occurs to me that I could potentially use self-modifying code for this -- i.e. everywhere I have a call to my trace function, I overwrite the jump with a NOP when I want to disable the hooks, and replace the jump when I want to enable them.

A quick google doesn't turn up any prior art on this -- has anyone done it? Is it feasible, are there any major stumbling blocks that I'm not foreseeing?

(Linux, x86_64)

Solution

Yes, this technique has been implemented within the Linux kernel, for exactly the same purpose (tracing hooks).

See the LWN article on Jump Labels for a starting point.

There's not really any major stumbling blocks, but a few minor ones: multithreaded processes (you will have to stop all other threads while you're enabling or disabling the code); incoherent instruction cache (you'll need to ensure the I-cache is flushed, on every core).