perf, google-benchmark

What do the 'n', 'p', and 'u' stand for that follow Google Benchmark's perf counter results?


I have a Google Benchmark such as the following.

#include "benchmark/benchmark.h"
#include <cstring>

static void bench_memset(benchmark::State& state) {
    char buffer[16];

    for(auto _ : state) {
        memset(buffer, '\0', 16);
        benchmark::ClobberMemory();
    }
}

BENCHMARK(bench_memset);
BENCHMARK_MAIN();

And I run it with the following command.

./my_benchmark --benchmark_perf_counters=BRANCH-MISSES,CACHE-MISSES,CACHE-REFERENCES --benchmark_counters_tabular=true

Which results in the following data.

---------------------------------------------------------------------------------------------------
Benchmark             Time             CPU   Iterations BRANCH-MISSES CACHE-MISSES CACHE-REFERENCES
---------------------------------------------------------------------------------------------------
bench_memset      0.611 ns        0.611 ns   1000000000            7n        1000p              52n

I generally understand the concept of branch and cache misses, but I don't understand the meaning of the 'n' and 'p' that are printed after the perf counter metrics.

I searched both Google Benchmark's documentation and Perf's documentation, but neither seems to mention this.

I also notice that if I bump up the size of my workload,

static void bench_memset(benchmark::State& state) {
    char buffer[4096];

    for(auto _ : state) {
        memset(buffer, '\0', 4096);
        benchmark::ClobberMemory();
    }
}

Then the 'n' will change to 'u', which is also mysterious.

---------------------------------------------------------------------------------------------------
Benchmark             Time             CPU   Iterations BRANCH-MISSES CACHE-MISSES CACHE-REFERENCES
---------------------------------------------------------------------------------------------------
bench_memset       96.0 ns         96.0 ns      6959398      1.14952u            0         81.6163u

I've also noticed with other benchmarks that there may be no letter at all.

What do these letters stand for?


Solution

  • --benchmark_perf_counters lists additional perf counters to collect, in libpfm format, so the counter names themselves are documented in the manual of the perfmon2 project. The u, p and n suffixes are SI prefixes added when the results are printed. The counters are hardware event counts, and Benchmark reports each one divided by the number of iterations rather than as a raw total:

    • p - pico-, 10^-12: 1000p over 1000000000 iterations is 1 cache miss in 1e9 iterations.
    • n - nano-, 10^-9: 7n over 1000000000 iterations is 7 branch prediction misses in 1e9 iterations.
    • u - micro-, 10^-6: 81.6163u over 6959398 iterations is 568 cache references in 6959398 iterations.

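    In other words, these values are average event counts per iteration. While they stay below 1, they can be read loosely as the probability that a single iteration triggers the event.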

    IMHO it would be better if they showed just the raw counts; the fractions could then be easily computed, like 7 / 1e9 or 568 / 6959398.
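The arithmetic in this answer can also be run in the other direction, to recover approximate raw counts from the printed values. The following is only a minimal sketch, assuming each suffix is a plain power-of-1000 SI prefix and each printed value is an average per iteration. The helpers `si_multiplier`, `per_iteration` and `raw_count` are hypothetical names for illustration and are not part of Google Benchmark.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>

// Multiplier for the SI suffix printed after a counter value.
// Only the small prefixes discussed here are handled (assumption:
// powers of 1000); a missing or unknown suffix means "no scaling".
double si_multiplier(char suffix) {
    switch (suffix) {
        case 'm': return 1e-3;   // milli
        case 'u': return 1e-6;   // micro
        case 'n': return 1e-9;   // nano
        case 'p': return 1e-12;  // pico
        default:  return 1.0;
    }
}

// Turn a printed value such as "81.6163u" into a per-iteration average.
double per_iteration(const std::string& printed) {
    std::size_t consumed = 0;
    const double mantissa = std::stod(printed, &consumed);
    const char suffix = consumed < printed.size() ? printed[consumed] : '\0';
    return mantissa * si_multiplier(suffix);
}

// Approximate raw event count for the whole run: average * iterations,
// rounded to the nearest integer because the printed value is rounded.
long long raw_count(const std::string& printed, long long iterations) {
    assert(iterations >= 0);
    return std::llround(per_iteration(printed) * static_cast<double>(iterations));
}
```

With the numbers from the question, `raw_count("7n", 1000000000)` gives 7 and `raw_count("81.6163u", 6959398)` gives 568. The printed values are already rounded, so the recovered count is only approximate.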