Should I make a large function atomic in order to benchmark it accurately?

I would like to know how long it takes to execute some code. The code deals with OpenCV matrices and operations, and it will run in a ROS environment on Linux. I don't want the code to be interrupted by system functions while I am benchmarking it.

In this post about benchmarking, the answerer says the granularity of the result is 15 ms. I would like to do much better than that, so I was considering making the function atomic (just for benchmarking purposes). I'm not sure it is a good idea, primarily because I don't have a deep understanding of processor architecture.

void atomic_wrapper_function(const object& A, const object& B) {
  static unsigned long running_sum = 0;
  unsigned long before, after;
  before = GetTimeMs64();
  function_to_benchmark(A, B);
  after = GetTimeMs64();
  running_sum += (after - before);
}

The function I am trying to benchmark is not a short function.

Will the result be accurate? For measuring the time I'm considering using this function by Andreas Bonini.
Will it do something horrible to my computer? Call me superstitious but I think it's good to ask this question.

I'm using C++11 on the Linux Kernel.

Solution

C++11 atomics are not atomic in the RTOS sense; they only provide guarantees about shared data when you write multithreaded code. Linux is not an RTOS, so your code can and will be interrupted. There are ways to lessen the effects, but not without diving very deeply into Linux.

You can, for example, lower the niceness value of your process so that it is preempted less often by other userspace programs. You can tell the kernel which CPU core should handle interrupts and then pin your program to a different core. You can also increase the timer precision, and so on.
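As a rough illustration of the niceness and pinning tweaks, here is a minimal Linux-only sketch. The helper names `first_allowed_cpu`, `pin_to_cpu` and `set_niceness` are made up for this example; the underlying calls are glibc's `sched_getaffinity`, `sched_setaffinity` and `setpriority`. It does not touch interrupt routing, and it only reduces interference rather than removing it.

```cpp
#include <sched.h>         // sched_getaffinity, sched_setaffinity, CPU_* macros (glibc)
#include <sys/resource.h>  // setpriority, PRIO_PROCESS

// Hypothetical helper: lowest-numbered CPU the calling thread may run on,
// or -1 if the affinity mask cannot be read.
int first_allowed_cpu() {
  cpu_set_t allowed;
  CPU_ZERO(&allowed);
  if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
    if (CPU_ISSET(cpu, &allowed)) return cpu;
  }
  return -1;
}

// Hypothetical helper: restrict the calling thread to a single CPU.
// Returns true on success.
bool pin_to_cpu(int cpu) {
  if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
  cpu_set_t only;
  CPU_ZERO(&only);
  CPU_SET(cpu, &only);
  return sched_setaffinity(0, sizeof(only), &only) == 0;
}

// Hypothetical helper: request a niceness value for the calling process.
// Negative values (higher priority) normally require root or CAP_SYS_NICE,
// so expect this to return false for an unprivileged user.
bool set_niceness(int nice_value) {
  return setpriority(PRIO_PROCESS, 0, nice_value) == 0;
}
```

Calling `pin_to_cpu(first_allowed_cpu())` once before the timed section keeps the benchmark from migrating between cores; other processes and interrupts can still be scheduled on that core.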

Many other things can still change the runtime of your algorithm, such as the several layers of CPU caches and the power-saving features of your CPU. If you are only interested in benchmarking the execution time of your function for a problem that is not hard real-time, it is easier to run the algorithm many times and compute a statistical estimate of the execution time.

Call the function a billion times during the benchmark and take the average. OR
Benchmark the function at repetition counts ranging from 1 to a billion. The total execution time should scale linearly with the number of calls, so a linear regression over those measurements gives an estimate of the time per call (the slope).
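The repetition and regression ideas can be sketched roughly as follows. This assumes `std::chrono::steady_clock` as the timer (it is not the timing function from the question), and the names `time_calls_ns`, `fit_line`, `estimate_cost_ns` and `LineFit` are invented for this example. The fit is an ordinary least-squares line through the (repetitions, total time) points.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

struct LineFit {
  double slope;      // estimated cost per call (ns when y is in ns)
  double intercept;  // fixed overhead that does not depend on the call count
};

// Time `reps` back-to-back calls of `fn`; returns the total in nanoseconds.
template <typename Fn>
double time_calls_ns(Fn&& fn, std::size_t reps) {
  typedef std::chrono::steady_clock clock;
  const clock::time_point start = clock::now();
  for (std::size_t i = 0; i < reps; ++i) fn();
  const clock::time_point stop = clock::now();
  return std::chrono::duration<double, std::nano>(stop - start).count();
}

// Ordinary least-squares fit of y = slope * x + intercept.
// Returns {0, 0} if there are fewer than two points or all x are equal.
LineFit fit_line(const std::vector<double>& x, const std::vector<double>& y) {
  LineFit fit = {0.0, 0.0};
  const std::size_t n = x.size();
  if (n < 2 || y.size() != n) return fit;
  double mean_x = 0.0, mean_y = 0.0;
  for (std::size_t i = 0; i < n; ++i) { mean_x += x[i]; mean_y += y[i]; }
  mean_x /= static_cast<double>(n);
  mean_y /= static_cast<double>(n);
  double sxx = 0.0, sxy = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    sxx += (x[i] - mean_x) * (x[i] - mean_x);
    sxy += (x[i] - mean_x) * (y[i] - mean_y);
  }
  if (sxx == 0.0) return fit;
  fit.slope = sxy / sxx;
  fit.intercept = mean_y - fit.slope * mean_x;
  return fit;
}

// Measure `fn` at each repetition count and fit a line through the results.
template <typename Fn>
LineFit estimate_cost_ns(Fn&& fn, const std::vector<std::size_t>& rep_counts) {
  std::vector<double> xs, ys;
  for (std::size_t i = 0; i < rep_counts.size(); ++i) {
    xs.push_back(static_cast<double>(rep_counts[i]));
    ys.push_back(time_calls_ns(fn, rep_counts[i]));
  }
  return fit_line(xs, ys);
}
```

With the names from the question, usage might look like `estimate_cost_ns([&] { function_to_benchmark(A, B); }, {1, 10, 100, 1000}).slope`. Make sure the benchmarked call has an observable effect, otherwise the optimizer may remove it and the measurement becomes meaningless.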

OR: If what you really want to know is how much the algorithm contributes to your total program runtime, use a profiling tool such as callgrind (which can be integrated into Qt Creator).