c++ parallel-processing g++ stl-algorithm

Why do the GNU parallel extensions seem to make algorithms slower?


I wanted to follow this guide: http://gcc.gnu.org/onlinedocs/libstdc++/manual/bk01pt12ch31s03.html

Here is some example code:

#include <numeric>
#include <vector>
#include <iostream>
#include <chrono>

using namespace std;
int main()
{
    vector<int> in(1000);
    vector<double> out(1000);
    iota(in.begin(), in.end(), 1);

    auto t = std::chrono::high_resolution_clock::now();
    for(int i = 0; i < 100000; ++i)
        accumulate(in.begin(), in.end(), 0);

    auto t2 = std::chrono::high_resolution_clock::now();

    cout << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t).count()  << endl;

    return 0;
}

I get the following results:

~:$ g++ test.cpp -std=c++11
~:$ ./a.out 
900
~:$ g++ test.cpp -D_GLIBCXX_PARALLEL -std=c++11 -fopenmp -march=native 
~:$ ./a.out 
1026

Over multiple runs, both builds stay at about the same times. I have also tried other algorithms, such as sort, generate, find, and transform. I have an i7 with hyperthreading enabled (4 logical cores), and I am using g++ 4.8.1.

Thanks


Solution

  • I think you need to try something a little heavier. All you're doing is adding ints together, so the overhead of creating the threads and so on will outweigh any gain from splitting up the work. Try replacing int with std::string, run the following code, and compare the output:

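    #include <chrono>
    #include <iostream>
    #include <numeric>
    #include <string>
    #include <vector>

    using namespace std;

    int main()
    {
        vector<string> in(100000);

        auto t = std::chrono::high_resolution_clock::now();
        // Each step copies and concatenates strings, which is far more work than adding two ints.
        accumulate(in.begin(), in.end(), string(), [](string s1, string s2){ return s1 += s2 + "a" + "b";});
        auto t2 = std::chrono::high_resolution_clock::now();

        cout << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t).count()  << endl;
    }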