c++ linux-kernel clock c++-chrono low-latency

Is there any high resolution clock (us) in User space (Linux)?


Do you know any C/C++ implementation (even if it is not portable) of a high resolution clock (microseconds minimum), in user space, for Linux?

The goal is to measure the elapsed time of some low-latency operations. I have measured that kernel-space clocks sometimes cause latency spikes.

As per my research on Red Hat 7.2:

  • the maximum resolution of std::chrono::high_resolution_clock is milliseconds;
  • clock_gettime with CLOCK_MONOTONIC and CLOCK_REALTIME is executed through a kernel system call;
  • gettimeofday is executed through a kernel system call;
  • clock_gettime with CLOCK_MONOTONIC_COARSE and CLOCK_REALTIME_COARSE is executed in user space, but the maximum resolution is milliseconds.

Thanks.


Solution

  • One option is to use the rdtsc instruction via the GCC/Clang builtin __builtin_ia32_rdtsc. On modern Intel CPUs the TSC ticks at the base clock rate regardless of the current CPU frequency, so you can convert a tick count into nanoseconds by dividing it by the base (not boost) CPU frequency in GHz:

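    #include <regex>
    #include <string>
    #include <fstream>
    #include <iostream>

    // Parses the base frequency in GHz from the "model name" line of
    // /proc/cpuinfo, e.g. "model name : ... CPU ... @ 2.50GHz".
    // Not every CPU reports its frequency there, hence the fallback below.
    double cpu_base_frequency() {
        std::regex re("model name\\s*:[^@]+@\\s*([0-9.]+)\\s*GHz");
        std::ifstream cpuinfo("/proc/cpuinfo");
        std::smatch m;
        for(std::string line; std::getline(cpuinfo, line);) {
            // regex_match succeeds only if the whole line matches the pattern.
            if(std::regex_match(line, m, re))
                return std::stod(m[1].str());
        }
        return 1; // Couldn't determine the CPU base frequency. Just count TSC ticks.
    }

    // Nanoseconds per TSC tick (1 / GHz).
    double const CPU_GHZ_INV = 1 / cpu_base_frequency();

    int main() {
        // Two back-to-back reads: the difference is the cost of reading the TSC.
        auto t0 = __builtin_ia32_rdtsc();
        auto t1 = __builtin_ia32_rdtsc();
        std::cout << (t1 - t0) * CPU_GHZ_INV << " nsec\n";
    }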
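
The conversion above relies on the TSC ticking at a constant rate. As a rough sketch, one way to check this on a given Linux machine is to look for the constant_tsc and nonstop_tsc flags in /proc/cpuinfo. The helper name cpuinfo_has_flag is hypothetical, and the code assumes that all cores report the same flags, so only the first "flags" line is inspected:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if the first "flags" line of a
// /proc/cpuinfo-style stream lists `flag` as a whole word.
bool cpuinfo_has_flag(std::istream& cpuinfo, std::string const& flag) {
    for(std::string line; std::getline(cpuinfo, line);) {
        if(line.compare(0, 5, "flags") != 0)
            continue; // Not the "flags : ..." line.
        auto colon = line.find(':');
        if(colon == std::string::npos)
            continue;
        std::istringstream flags(line.substr(colon + 1));
        for(std::string f; flags >> f;)
            if(f == flag)
                return true;
        return false; // Assumption: every core reports the same flags.
    }
    return false; // No "flags" line found.
}

// Possible usage (needs <fstream>):
//   std::ifstream cpuinfo("/proc/cpuinfo");
//   bool constant = cpuinfo_has_flag(cpuinfo, "constant_tsc");
```

If either flag is missing, the tick-to-nanosecond conversion may not be reliable on that machine.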
    

    Some more info from Intel documentation:

    Constant TSC behavior ensures that the duration of each clock tick is uniform and supports the use of the TSC as a wall clock timer even if the processor core changes frequency. This is the architectural behavior moving forward.

    The invariant TSC will run at a constant rate in all ACPI P-, C- and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource.

    The invariant TSC is based on the invariant timekeeping hardware (called Always Running Timer or ART), that runs at the core crystal clock frequency.

    The scalable bus frequency is encoded in the bit field MSR_PLATFORM_INFO[15:8] and the nominal TSC frequency can be determined by multiplying this number by a bus speed of 100 MHz.
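
The arithmetic in the last quote can be sketched as follows. The function names are hypothetical, and the code only decodes a raw MSR_PLATFORM_INFO value that has already been obtained; reading the register itself needs privileged access and is not shown here. The 100 MHz bus speed is taken from the quote:

```cpp
#include <cstdint>

// Hypothetical helper: extracts bits [15:8] of a raw MSR_PLATFORM_INFO value.
constexpr unsigned platform_info_ratio(std::uint64_t msr_platform_info) {
    return static_cast<unsigned>((msr_platform_info >> 8) & 0xff);
}

// Nominal TSC frequency in MHz: ratio * bus speed (100 MHz per the quote).
constexpr unsigned nominal_tsc_mhz(std::uint64_t msr_platform_info,
                                   unsigned bus_mhz = 100) {
    return platform_info_ratio(msr_platform_info) * bus_mhz;
}

// A ratio of 25 (0x19 in bits [15:8]) corresponds to 2500 MHz, i.e. 2.5 GHz.
static_assert(nominal_tsc_mhz(0x1900) == 2500, "25 * 100 MHz");
```

Dividing a tick count by this frequency in GHz (2.5 in the example) gives nanoseconds, as in the code above.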