c++ performance static clang benchmarking

How does the use of `static` affect the speed of my code?


I was solving an exercise online, and at one point I needed to remove the quotes ("") from the beginning and end of a string. This was my code:

static inline void process_valueS(std::string &value) {
    // Strip the first and last character unless the value ends with '>'
    if (value.back() != '>') {
        value = value.substr(1, value.size() - 2);
    }
}

Called from this benchmark loop:

static void UsingStatic(benchmark::State& state) {
    // Code inside this loop is measured repeatedly
    for (auto _ : state) {
        std::string valor("\"Hola\"");
        process_valueS(valor);
        // Make sure the variable is not optimized away by compiler
        benchmark::DoNotOptimize(valor);
    }
}

Just out of curiosity, I ran a benchmark.

  • Compiler: Clang-9.0
  • std: c++20
  • optim: O3
  • STL: libstdc++(GNU)

While I was at it, I decided to remove static from the function, making a void inline process_value that was otherwise the same. To my surprise, it was slower.

I thought that static only meant that the function was local to a single file. But here it says that "'static' means that the function should be inlined by the compiler if possible". In that case, I would not expect removing static to change the result. Now I am confused: what does static do besides restricting the function to a single .cpp file, and how does that affect performance?

The disassembly on QuickBench shows that the NoUsingStatic loop actually calls process_value instead of inlining it, despite the inline keyword making it legal for the compiler to do so. But UsingStatic does inline the call to process_valueS. That difference in compiler decision-making presumably explains the difference in performance, but why would clang choose not to inline a simple function declared void inline process_value(std::string &value){ ... }?


EDIT: Because the question was closed for not being clear enough, I deleted the parts that were not related to the question. If I am missing some information, please tell me in the comments.


Solution

  • Clang uses a cost-based heuristic to decide whether a call will be inlined. That cost is affected by many things, and static is one of them.

    Fortunately, clang has an output, where we can observe this. Check out this godbolt link:

    void call();
    
    inline void a() {
        call();
    }
    
    static inline void b() {
        call();
    }
    
    void foo() {
        a();
        b();
    }
    

    In this little example, a() and b() are the same; the only difference is that b() is static.

    If you move the mouse over the calls a() or b() on godbolt (in OptViewer window), you can read:

    a(): cost=0, threshold=487

    b(): cost=-15000, threshold=487

    (clang will inline a call if its cost is less than the threshold.)

    clang gave b() a much lower cost because it is static. It seems that clang gives this -15000 cost reduction to a static function only once: if b() is called several times, the cost of every call to b() will be zero, except one.
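    The observation above can be checked locally without godbolt. The following is a minimal sketch, not the exact code behind the linked example: c() and the counter g_calls are hypothetical additions, and call() is given a noinline body here only so that the file links on its own (in the original example it is an undefined external function, so the reported costs may differ). Compiling with something like clang++ -O3 -c -Rpass=inline -Rpass-missed=inline should print one remark per call site, including its cost and threshold.

```cpp
#include <cassert>

// Hypothetical counter so the calls have an observable effect.
static int g_calls = 0;

// Stand-in for the external call() from the example above.
// noinline keeps it an opaque call, roughly like an undefined external function.
__attribute__((noinline)) void call() { ++g_calls; }

// Non-static inline function: external linkage, no single-call bonus expected.
inline void a() { call(); }

// Static inline function with exactly one call site.
static inline void b() { call(); }

// Static inline function with two call sites (hypothetical addition).
static inline void c() { call(); }

int foo() {
    a();
    b();  // the only call to b()
    c();  // first of two calls to c()
    c();  // second call to c()
    return g_calls;
}
```

    Comparing the remarks for b() with those for the two c() calls is one way to see whether the large cost reduction shows up on only one call site per static function.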

    Here are the numbers for your case, link:

    process_value(): cost=400, threshold=325 -> it is just above the threshold, won't be inlined

    process_valueS(): cost=-14600, threshold=325 -> OK to inline
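    The two numbers differ by exactly the 15000 seen in the a()/b() example: 400 - 15000 = -14600. As a worked check, here is a deliberately simplified model of the rule described above. It is not Clang's actual implementation; the names kStaticSingleCallBonus, adjusted_cost and would_inline are hypothetical, and the bonus value is inferred only from the numbers quoted in this answer.

```cpp
#include <cassert>

// Hypothetical constant, inferred from the cost=-15000 reported for b() above.
constexpr int kStaticSingleCallBonus = 15000;

// Simplified model: subtract the bonus when this is the only call
// to a static function, otherwise leave the cost unchanged.
constexpr int adjusted_cost(int base_cost, bool sole_call_to_static) {
    return sole_call_to_static ? base_cost - kStaticSingleCallBonus : base_cost;
}

// The rule quoted above: inline if the cost is less than the threshold.
constexpr bool would_inline(int cost, int threshold) {
    return cost < threshold;
}
```

    With a base cost of 400 and a threshold of 325, the non-static version stays above the threshold, while the static version drops far below it.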

    So, apparently, static can have a large impact when the function is called only once. That makes sense: once the only call to a static function has been inlined, no out-of-line copy is needed, so inlining does not increase code size.

    Tip: if you want to force clang to inline a function, use __attribute__((always_inline)) on it.
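    As a sketch of that tip applied to the function from the question: the names process_value_forced and strip_quotes are hypothetical, and the size check is an addition that is not in the original code (it avoids calling back() on an empty string). __attribute__((always_inline)) is a GCC/Clang extension, so this assumes one of those compilers.

```cpp
#include <cassert>
#include <string>

// always_inline asks GCC/Clang to inline this function regardless of the
// cost model, so it no longer matters whether the function is static.
__attribute__((always_inline)) inline void process_value_forced(std::string &value) {
    // Size guard added for this sketch; the original assumed a non-empty string.
    if (value.size() >= 2 && value.back() != '>') {
        value = value.substr(1, value.size() - 2);
    }
}

// Hypothetical wrapper so the result is easy to inspect.
std::string strip_quotes(std::string value) {
    process_value_forced(value);
    return value;
}
```

    For example, strip_quotes("\"Hola\"") returns "Hola", while a value ending in '>' is returned unchanged.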