c++ multithreading macos c++11

C++ : MultiThreading : Parallelism is taking more time than Sequential


I am using threading in C++ on a Mac for the first time.

Below is my code. (Motivation behind my code)

#include <iostream>
#include <thread>
#include <chrono>

using namespace std;

using namespace chrono;

const long long maxLimit = 2e9;

void getEvenSum(long long &sum) {
    for(int i = 0 ; i <= maxLimit ; i+=2) {
        sum += i;
    }
}

void getOddSum(long long &sum) {
    for(int i = 1 ; i <= maxLimit ; i+=2) {
        sum += i;
    }
}

int main() {
    
    auto startTime = high_resolution_clock::now();
    
    long long evenSum = 0 , oddSum = 0;
    thread evenSumThread(getEvenSum, ref(evenSum));
    thread oddSumThread(getOddSum, ref(oddSum));

    evenSumThread.join();
    oddSumThread.join();

    auto endTime = high_resolution_clock::now();
    auto duration = duration_cast<microseconds>(endTime - startTime);

    cout << " final sum is " << evenSum << " " << oddSum << endl;

    cout << " time taken with thread :" << duration.count() / (long double)1e6 << endl;

    

    startTime = high_resolution_clock::now();
    
    evenSum = 0 , oddSum = 0;
    getEvenSum(evenSum);
    getOddSum(oddSum);

    endTime = high_resolution_clock::now();
    duration = duration_cast<microseconds>(endTime - startTime);

    cout << " final sum is " << evenSum << " " << oddSum << endl;

    cout << " time taken without thread " << duration.count() / (long double)1e6 << endl;

    return 0;

}

OUTPUT :

 final sum is 1000000001000000000 1000000000000000000
 time taken with thread :5.01665
 final sum is 1000000001000000000 1000000000000000000
 time taken without thread 2.83442

The output is quite unexpected: the threaded version takes about 5 seconds while the non-threaded version takes about 2.8 seconds. How?!

Things I have tried, without success:

  1. g++ --std=c++11 file.cpp
  2. g++ --std=c++11 -O3 -s -DNDEBUG file.cpp ( the times changed drastically, but the threaded version still takes longer )
  3. g++ --std=c++17 file.cpp
  4. g++ clang++ -std=c++11 file.cpp ( got an error saying that "CLANG" is not present )
  5. Tried changing Xcode->Product->Scheme->Change Scheme ( it doesn't open any menu for changing the scheme, dead end )
  6. I tried the above commands in VS Code as well as in the Mac terminal (iTerm). I also tried changing the VS Code build from "DEBUG" to "RELEASE"; that doesn't help either.

PS : The same code worked when I last practiced on Linux (2 years ago).


Solution

  • You should enable optimizations such as -O3. However, with the most recent gcc and clang, -O3 optimizes both functions away completely; you can prevent that by declaring the argument as volatile.

    void getEvenSum(volatile unsigned long long& sum) {
        for (unsigned long long i = 0; i <= maxLimit; i += 2) {
            sum += i;
        }
    }
    

    This also converts the variables to unsigned long long to rule out signed overflow, which is undefined behavior.

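    The second problem is false sharing: evenSum and oddSum are adjacent locals, so they most likely sit in the same cache line, and every write by one thread invalidates the other core's copy of that line. One way around it is to pad the two variables away from each other, or to align them to cache line boundaries (hint: use std::hardware_destructive_interference_size, available since C++17).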

        alignas(64) unsigned long long evenSum = 0;
        alignas(64) unsigned long long oddSum = 0;
    

    With that, the multithreaded version is now faster than the single-threaded version (online godbolt result).

    By disabling those optimizations, we went from the functions taking zero time to taking some time ... so did we really make the code faster?!