Is there an x86 instruction to get the current time?
Basically... I want something like a replacement for clock_get_time... something with minimal overhead... where I don't really care about getting the time in any specific format, as long as it's a format I can use.
Basically I'm doing some work to "detect how much PHYSICAL REAL LIFE TIME has gone by"... and I want to be able to measure time as frequently as possible!
I guess you can imagine I'm doing something like a profiling app... :)
I really need aggressively efficient access to the hardware time. So ideally... some ASM to get the time... store it somewhere... then massage it later into some format that I can actually process.
I'm not interested in _rdtsc, as that measures the number of cycles gone by. I need to know how much physical time has elapsed... not cycles, which can vary in length due to thermal fluctuations, frequency scaling, and so on.
For profiling, it's often most useful to measure in terms of CPU clock cycles rather than wall-clock time. CPU dynamic clocking (turbo and power saving) makes it annoying to get the CPU ramped up to full speed before the start of a measurement period.
If you still need wall-clock time after that:
Recent x86 CPUs have a TSC that runs at a fixed rate, regardless of CPU frequency adjustment for power saving. Also, the TSC doesn't stop when the CPU is halted (i.e. when it has no work to do, so it ran the HLT instruction to wait for an interrupt in low-power mode).
It turned out that efficient access to a useful time source was more useful to have in hardware than an actual clock-cycle counter, so that's what RDTSC morphed into, a few CPU generations after its introduction. Now we're back to using hardware performance counters for measuring clock cycles.
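If you want to read the TSC yourself, here's a minimal sketch (assuming GCC or Clang on x86-64; __rdtsc is the compiler intrinsic, and the inline-asm version just shows the raw instruction it compiles to):

```c
#include <stdint.h>
#include <x86intrin.h>   // __rdtsc intrinsic (GCC/Clang)

// The compiler emits a single RDTSC and combines EDX:EAX
// into one 64-bit value for you.
static inline uint64_t read_tsc(void)
{
    return __rdtsc();
}

// Equivalent GNU inline asm, if you really want raw assembly:
static inline uint64_t read_tsc_asm(void)
{
    uint32_t lo, hi;
    __asm__ volatile ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
```

Note that RDTSC isn't ordered with respect to surrounding instructions; for fine-grained measurements you'd pair it with a fence (or use RDTSCP), but for coarse timestamping the plain version is fine.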
On Linux, look for the constant_tsc and nonstop_tsc flags in the CPU feature flags in /proc/cpuinfo. There's also a CPUID bit that covers both: CPUID leaf 0x80000007, EDX bit 8, is the "invariant TSC" bit. (Or use Linux's detection code for it, if you can use GPLed code.) On a CPU with those two key features, Linux uses the TSC as its clocksource, IIRC.
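A quick way to check that bit from C (a sketch assuming GCC or Clang, which provide the <cpuid.h> helper):

```c
#include <stdio.h>
#include <cpuid.h>   // __get_cpuid helper (GCC/Clang)

int main(void)
{
    unsigned eax, ebx, ecx, edx;
    // CPUID leaf 0x80000007: EDX bit 8 is "Invariant TSC", which
    // implies both constant_tsc and nonstop_tsc behaviour.
    // __get_cpuid returns 0 if the leaf isn't supported at all.
    if (__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx)
        && (edx & (1u << 8)))
        puts("Invariant TSC: RDTSC is usable as a wall-clock source");
    else
        puts("No invariant TSC: frequency/halt can affect the count");
    return 0;
}
```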
The lowest-overhead way to get the current time in user space will be to work out the conversion between RDTSC ticks and real time. While profiling, you might just store 64-bit TSC snapshots, and convert to real time later. (So you can handle TSC wraparound then.) RDTSC only takes about 24 cycles (Agner Fog's instruction tables, Intel Haswell). I think the overhead of a system call will be an order of magnitude higher than that. (And the kernel will have to run RDTSC in there somewhere anyway.)
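A sketch of that store-now/convert-later pattern (assuming constant_tsc and nonstop_tsc; the calibration against CLOCK_MONOTONIC_RAW and the 50 ms interval are my choices here, not the only way to find the TSC frequency):

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <x86intrin.h>   // __rdtsc

#define NSNAP 1000
static uint64_t snaps[NSNAP];

// One-shot calibration: measure TSC ticks per nanosecond against
// CLOCK_MONOTONIC_RAW over ~50 ms. Only valid with constant_tsc.
static double tsc_ticks_per_ns(void)
{
    struct timespec t0, t1, req = { 0, 50 * 1000 * 1000 };
    clock_gettime(CLOCK_MONOTONIC_RAW, &t0);
    uint64_t c0 = __rdtsc();
    nanosleep(&req, NULL);          // TSC keeps ticking (nonstop_tsc)
    uint64_t c1 = __rdtsc();
    clock_gettime(CLOCK_MONOTONIC_RAW, &t1);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9
              + (t1.tv_nsec - t0.tv_nsec);
    return (double)(c1 - c0) / ns;
}

int main(void)
{
    double ticks_per_ns = tsc_ticks_per_ns();

    // Hot path: nothing but a register read and a 64-bit store.
    for (int i = 0; i < NSNAP; i++)
        snaps[i] = __rdtsc();

    // Cold path: massage the raw ticks into nanoseconds afterwards.
    for (int i = 1; i < NSNAP; i++)
        printf("interval %d: %.1f ns\n",
               i, (double)(snaps[i] - snaps[i-1]) / ticks_per_ns);
    return 0;
}
```

In real use you'd calibrate over a longer interval, or repeatedly, to tighten the ticks-per-nanosecond estimate.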
Agner Fog has documented his profiling / timing methods, and has some example code. I haven't looked recently, but it might have useful stuff for this application.