I have an extremely simple program to measure how much time a function is taking.
#include <iostream>
#include <vector>
#include <chrono>

struct Foo
{
    void addSample(uint64_t s)
    {
    }
};

void test(const std::vector<uint64_t>& samples)
{
    uint32_t onlyCallTime = 0;
    uint32_t loopOnlyTime = 0;
    Foo stats;
    std::chrono::high_resolution_clock::time_point callStart, callEnd;
    auto start = callStart = callEnd = std::chrono::high_resolution_clock::now();
    for (auto& s : samples)
    {
        callStart = std::chrono::high_resolution_clock::now();
        loopOnlyTime += std::chrono::duration_cast<std::chrono::microseconds>(callStart - callEnd).count();
        stats.addSample(s);
        callEnd = std::chrono::high_resolution_clock::now();
        onlyCallTime += std::chrono::duration_cast<std::chrono::microseconds>(callEnd - callStart).count();
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "overall duration: " << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() << std::endl;
    std::cout << "only call duration: " << onlyCallTime << std::endl;
    std::cout << "only loop duration: " << loopOnlyTime << std::endl;
}

int main()
{
    std::vector<uint64_t> dataSetDecreasing;
    for (uint32_t i = 0; i < 1000000; ++i)
        dataSetDecreasing.push_back(1000000 - i);
    test(dataSetDecreasing);
}
The output is really confusing. Here are some examples:
overall duration: 56047
only call duration: 195
only loop duration: 285
overall duration: 40984
only call duration: 177
only loop duration: 243
overall duration: 47328
only call duration: 187
only loop duration: 177
The way I see it, callEnd - callStart captures the call to addSample plus one duration_cast. callStart - callEnd captures everything else: the loop initialization, iteration, condition, and the second duration_cast. What am I missing? Where are the other ~40 000 microseconds going?
Compiled with g++ -Wall -Wextra -std=c++17 -O3
g++ (GCC) 10.2.1 20200723 (Red Hat 10.2.1-1)
My OS is Fedora release 32 (Thirty Two)
1 000 000 iterations took roughly 50 000 microseconds, which is well under 1μs per iteration on average. std::chrono::duration_cast<std::chrono::microseconds> truncates toward zero, so any interval shorter than 1μs becomes 0. That means your counters only accumulate the rare iterations that took far longer than average (scheduling, page faults, cache misses, most likely).
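A minimal sketch of the truncation (the 50ns figure is my own estimate, derived from your numbers: 50 000μs / 1 000 000 iterations):

#include <chrono>
#include <iostream>

int main()
{
    using namespace std::chrono;
    // Roughly what your numbers imply per iteration.
    nanoseconds perIteration{50};
    // duration_cast truncates toward zero: 50ns -> 0us.
    std::cout << duration_cast<microseconds>(perIteration).count() << std::endl; // prints 0
}

Adding up a million of those zeros still gives zero, so onlyCallTime and loopOnlyTime only ever see the outliers.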
On top of that, each measurement carries a fixed error that is independent of the duration being measured (plus other sources of error), so taking many tiny measurements and summing them is far less precise than measuring the whole duration once.
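One way to avoid the truncation at least (a sketch, not a drop-in replacement; I've stubbed out addSample): accumulate the deltas in the clock's native duration and convert to microseconds once, after the loop.

#include <chrono>
#include <cstdint>
#include <iostream>
#include <vector>

void test(const std::vector<uint64_t>& samples)
{
    using Clock = std::chrono::high_resolution_clock;
    Clock::duration onlyCallTime{0}; // native ticks, nothing truncated
    auto start = Clock::now();
    for (auto& s : samples)
    {
        auto callStart = Clock::now();
        (void)s; // stand-in for stats.addSample(s)
        auto callEnd = Clock::now();
        onlyCallTime += callEnd - callStart; // no duration_cast inside the loop
    }
    auto end = Clock::now();
    using std::chrono::duration_cast;
    using std::chrono::microseconds;
    std::cout << "overall duration: "
              << duration_cast<microseconds>(end - start).count() << std::endl;
    std::cout << "only call duration: "
              << duration_cast<microseconds>(onlyCallTime).count() << std::endl;
}

int main()
{
    std::vector<uint64_t> data(1000000, 1);
    test(data);
}

This eliminates the rounding loss, but each iteration still pays for two clock reads, so the single overall measurement remains the more trustworthy number.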