Tags: c++, c++14, benchmarking, compiler-optimization, microbenchmark

How to prevent a segment of side-effect-free code from being optimized away?


Consider a scenario where I have constructed a class T that represents large integers. This class T has an addition operator function.

class T {
public:
    // Returns the sum without modifying either operand.
    T operator+(const T &other) const;
};

To measure its performance, I want to call this function many times, record the total elapsed time, and then compute the average time per call. Pseudocode:

T a, b;
// Record the starting timestamp.
for (int i = 0; i < 100000; i++) {
    a + b;
}
// Record the ending timestamp.
// Calculate the average time.

However, because the result of a + b is not saved and operator+ does not modify a or b, the statement a + b; is side-effect-free and will be optimized away by the compiler. One feasible solution is to change

 a + b;

to

T t = a + b;
// store t somewhere

However, this would introduce copy-constructor calls and add the overhead of storing t, making the timing measurement less accurate. Is there a way to tell the compiler not to optimize away a + b;?


Solution

  • Create a side effect that requires the result. For example, the GNU C statement asm volatile("" :: "r"(value)) is volatile, so it can't be optimized away, and it requires value to exist in a register, yet it compiles to zero asm instructions.

    On the other hand, that would still allow hoisting the a+b computation out of the loop, so you also need to make the compiler forget the values of a and b every iteration, e.g. with asm("" : "+g"(a), "+g"(b)), which tells the compiler that both variables are read/write operands that the asm statement modifies. This might not be compatible with const. With a BigInt that only fits in memory, you could perhaps take pointers to the locals and use a "memory" clobber to make the compiler assume the data is modified even though you only have const references.

    You'd have to check the asm to make sure that didn't introduce extra store/reloads (see the Q&A "How to remove "noise" from GCC/clang assembly output?"). "memory" clobbers don't affect local vars whose address hasn't been taken, but passing the addresses as input operands to an asm statement should make the compiler respect them.

    Various DoNotOptimize implementations use GNU C inline asm that way under #ifdef __GNUC__, such as nanobench's doNotOptimizeAway and Google Benchmark's DoNotOptimize; see the related Q&As on Google Benchmark's DoNotOptimize and on the "memory" clobber in it. ISO C++ doesn't have a portable way to do that, so with MSVC, for example, they might store a value to a volatile int sink dummy variable.

    But introducing a store/reload in a loop-carried dependency chain could be a huge slowdown, like a debug build (unlike just storing to the same location repeatedly, which L1d cache can absorb). So there's no good portable equivalent of the "+g"(var) constraint, which makes the compiler assume the value has been modified without actually running any instructions.

    "g" lets the compiler pick memory or register at its option. But clang might always prefer memory, so "+r,m" can work around that.