Tags: x86, interrupt, interrupt-handling, microbenchmark, intel-pmu

Is there a counter in modern x86 CPUs which only counts the time (or cycles) spent in interrupt handlers?


This is not a duplicate question. It has been claimed that this question is a duplicate of this one. However, I didn't mention "Linux" or "Kernel" anywhere (neither in the tags nor in the text), so claiming that this is a duplicate of a question which deals with Linux and perf is wrong.

I'd like to know how to measure interrupt times without external programs. In other words, I'd like to do the time measurement in the code myself, ideally using hardware registers. For the sake of this question, let's suppose that there is no O/S.

Having said this:

In an assembler program which runs on a Pentium M-like processor, I would like to measure the time a certain procedure needs for execution. This is usually a no-brainer: there are many articles which state and show how to do that, and I also have my own method which works reliably.

However, in this case, there is a problem: the procedure may be interrupted (by hardware interrupts) at any time. Since I'd like to measure the pure execution time of the procedure itself, things get more complicated:

  • Measure the whole time the procedure has needed (easy)
  • Measure the time the interrupt handlers have needed while the procedure was running (not that easy)
  • Subtract the interrupt time from the whole time to get the figure I'm interested in

I always thought that on "modern" Intel PC CPUs there is a counter which only counts up while the CPU executes an interrupt handler. But that doesn't seem to be the case. At least, I haven't found it in the "Performance Monitoring" Chapter of the Intel 64 and IA-32 Architectures Software Developer's manual.

I have worked out a solution which fits my needs for the moment, but is not as precise as I'd like it to be for future cases, and it is not very elegant.

Therefore, I'd like to know whether I have missed a hardware counter which could help me by counting only while executing an interrupt handler (or alternatively, counting only when executing code which is not in an interrupt handler).

Disabling interrupts to measure the pure procedure execution time is not an option, because the things which happen in the interrupt handlers may have effects on the execution of the procedure.

The procedure and the interrupt handlers are running on the same core.

The whole code (procedure and interrupt handlers) is running in ring 0.


Solution

  • No, there isn't hardware support for this, only for programming a counter to count in ring 0 (kernel mode) vs. ring 3 (user space). That's what Linux perf uses to implement perf stat --all-user or --all-kernel, or the cycles:u or :k modifiers. (Per the Intel SDM, the USR flag counts privilege levels 1, 2, and 3, while the OS flag counts only level 0, so rings 1 and 2 get lumped in with user space.)

    The x86 ISA doesn't distinguish the state of being in an "interrupt handler" as special. That's merely a software notion, e.g. an interrupt handler in a mainstream kernel might end by jumping to a function called schedule() to decide whether to return to the task that was interrupted, or to some other task. There might eventually be an iret (interrupt-return), but that's not "special" beyond popping CS:RIP, RSP, and RFLAGS from the current stack, which might be hard to emulate with other instructions.

    But if a kernel context-switches to a task that had previously made a blocking system call, it might return to user-space via sysret, only running an iret much later after context-switching back to a task that got interrupted. You don't need to do anything special to tell an x86 CPU you've finished an interrupt handler (unlike some other ISAs perhaps), so there's nothing the CPU could even watch for.

    The APIC (the interrupt controller for external interrupts) may need an end-of-interrupt (EOI) write to let it know that we're ready to handle further interrupts of this type, but the CPU core itself probably doesn't keep track of this.

    So there are a few different heuristics one could imagine hypothetical x86 hardware using to tell when an interrupt handler had finished, but I don't think actual x86 hardware PMUs do any of them.


    For the normal case of profiling code that runs in user-space (and doesn't make system calls), perf stat --all-user (or manually programming the PMU settings it would use) would do exactly what you want, not counting anything while the CPU is in kernel mode. The only kernel time would be in interrupt handlers.

    But for your case, where the code you want to profile is running in ring 0, the HW can't help you.

    Unless you do extremely time-consuming things in your interrupt handlers (compared to the amount of other work), it's probably good enough to just let them get counted, at least for events like "cycles". If interrupt handlers cause a lot of TLB misses or cache misses or something, that might throw off your counts for other events.

    Your interrupt handlers could run rdpmc at the start/end and maybe sum up the counts for each event into some global (or core-local) variables, so you'd have something to subtract from your main counts. But that would add the overhead of multiple rdpmc instructions to each interrupt handler.