Search code examples
c++c++20likely-unlikely

Why isn’t my code with C++20 likely/unlikely attributes faster?


Code ran on Visual Studio 2019 Version 16.11.8 with /O2 Optimization and Intel CPU. I am trying to find the root cause for this counter-intuitive result I get that without attributes is statistically faster than with attributes via t-test. I am not sure what is the root cause for this. Could it be some sort of cache? Or some magic the compiler is doing - I cannot really read assembly

     #include <chrono>
     #include <iomanip>
     #include <iostream>
     #include <numeric>
     #include <random>
     #include <vector>
     #include <cmath>
     #include <functional>
    
    static const size_t NUM_EXPERIMENTS = 1000;
    
    double calc_mean(std::vector<double>& vec) {
        double sum = 0;
        for (auto& x : vec)
            sum += x;
        return sum / vec.size();
    }
    
    double calc_deviation(std::vector<double>& vec) {
        double sum = 0;
        for (int i = 0; i < vec.size(); i++)
            sum = sum + (vec[i] - calc_mean(vec)) * (vec[i] - calc_mean(vec));
        return sqrt(sum / (vec.size()));
    }
    
    double calc_ttest(std::vector<double> vec1, std::vector<double> vec2){
        double mean1 = calc_mean(vec1);
        double mean2 = calc_mean(vec2);
        double sd1 = calc_deviation(vec1);
        double sd2 = calc_deviation(vec2);
        double t_test = (mean1 - mean2) / sqrt((sd1 * sd1) / vec1.size() + (sd2 * sd2) / vec2.size());
        return t_test;
    }
    
    namespace with_attributes {
        double calc(double x) noexcept {
            if (x > 2) [[unlikely]]
                return sqrt(x);
            else [[likely]]
                return pow(x, 2);
        }
    }  // namespace with_attributes
    
    
    namespace no_attributes {
        double calc(double x) noexcept {
            if (x > 2)
                return sqrt(x);
            else
                return pow(x, 2);
        }
    }  // namespace with_attributes
    
    std::vector<double> benchmark(std::function<double(double)> calc_func) {
        std::vector<double> vec;
        vec.reserve(NUM_EXPERIMENTS);
    
        std::mt19937 mersenne_engine(12);
        std::uniform_real_distribution<double> dist{ 1, 2.2 };
    
        for (size_t i = 0; i < NUM_EXPERIMENTS; i++) {
    
            const auto start = std::chrono::high_resolution_clock::now();
            for (auto size{ 1ULL }; size != 100000ULL; ++size) {
                double x = dist(mersenne_engine);
                calc_func(x);
            }
            const std::chrono::duration<double> diff =
                std::chrono::high_resolution_clock::now() - start;
            vec.push_back(diff.count());
        }
        return vec;
    }
    
    int main() {
    
        std::vector<double> vec1 = benchmark(with_attributes::calc);
        std::vector<double> vec2 = benchmark(no_attributes::calc);
        std::cout << "with attribute: " << std::fixed << std::setprecision(6) << calc_mean(vec1) << '\n';
        std::cout << "without attribute: " << std::fixed << std::setprecision(6) << calc_mean(vec2) << '\n';
        std::cout << "T statistics" << std::fixed << std::setprecision(6) << calc_ttest(vec1, vec2) << '\n';
    }

Solution

  • Per godbolt, the two functions generates identical assembly under msvc

            movsd   xmm1, QWORD PTR __real@4000000000000000
            comisd  xmm0, xmm1
            jbe     SHORT $LN2@calc
            xorps   xmm1, xmm1
            ucomisd xmm1, xmm0
            ja      SHORT $LN7@calc
            sqrtsd  xmm0, xmm0
            ret     0
    $LN7@calc:
            jmp     sqrt
    $LN2@calc:
            jmp     pow
    

    Since msvc is not open source, one could only guess why msvc would choose to ignore this optimization -- maybe because two branches are all function calls (it's tail call so jmp instead of call) and that's too costly for [[likely]] to make a difference. But if clang is used, it's smart enough to optimize power 2 into x * x, so different code would be generated. Following that lead, if your code is modified into

            double calc(double x) noexcept {
                if (x > 2)
                    return x + 1;
                else
                    return x - 2;
            }
    

    msvc would also output different layout.