Parallel computing in C++ with OpenMP brings no difference to elapsed time

I am studying OpenMP and have been implementing some examples to test my understanding. Below is some simple code in which I compute a trivial sum (adding zero on each iteration) and compare the elapsed time of the loop parallelized with OpenMP against the same loop run sequentially. Here it is:

#include <iostream>
#include <omp.h>
#include <unistd.h>
#include <chrono>

int main(){
    int N = 100000;
    int sum = 0;  // initialized: the reduction combines into this value
    std::chrono::time_point<std::chrono::system_clock> start, end;
    start = std::chrono::system_clock::now();


    // "parallel for" must be followed directly by the for loop, not by a block
    #pragma omp parallel for reduction(+:sum)
    for(int i = 0; i < N; i++){
        sum += 0;
    }

    end = std::chrono::system_clock::now();
    std::cout << "Parallel Elapsed time:" << std::endl;
    std::cout << std::chrono::duration_cast<std::chrono::nanoseconds>(end-start).count()<<std::endl;


    start = std::chrono::system_clock::now();
    for(int i = 0; i < N; i++){
        sum += 0;
    }


    end = std::chrono::system_clock::now();
    std::cout << "Sequential Elapsed time:" << std::endl;
    std::cout << std::chrono::duration_cast<std::chrono::nanoseconds>(end-start).count() <<std::endl;


    return 0;
}

Which yields:

Parallel Elapsed time:
351000
Sequential Elapsed time:
367000

The figures (in nanoseconds) stay around these values over several executions. My question is: where is the catch? My code looks all right to me.

Solution

Starting OpenMP threads takes time, so if the parallelized part of your code does almost nothing (here it is just a sum), the gain from running in parallel is cancelled out by the time it takes to create or wake up the OpenMP threads.

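Furthermore, your loop uses a reduction: each thread accumulates into its own private copy of sum, and the copies are then combined into the shared variable at the end of the loop, which adds a little synchronization on top of the thread startup cost. I am even surprised that you get a performance gain using OpenMP in this specific case.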
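For reference, here is a minimal sketch of what a reduction clause does, assuming a loop with some actual data to add up. The function names `sequential_sum` and `parallel_sum` are made up for this illustration. As I understand the OpenMP specification, each thread gets a private copy of `sum` initialized to 0 for `+`, and the private copies are combined with the original value when the loop ends, which is why the variable should be initialized beforehand.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Sequential reference: plain accumulation over the vector.
long long sequential_sum(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0LL);
}

// Same sum with an OpenMP reduction (hypothetical helper for illustration).
// Each thread adds into its own private copy of `sum`; the copies are
// combined into the shared variable once, when the loop finishes.
long long parallel_sum(const std::vector<int>& v) {
    long long sum = 0;  // must be initialized before the reduction
    const long long n = static_cast<long long>(v.size());

    // The pragma has to sit directly in front of the for loop.
    #pragma omp parallel for reduction(+:sum)
    for (long long i = 0; i < n; ++i) {
        sum += v[static_cast<std::size_t>(i)];
    }
    return sum;
}
```

Both functions should return the same value for any input. With GCC or Clang the pragma only takes effect when compiling with `-fopenmp`; without that flag it is ignored and `parallel_sum` simply runs sequentially.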

If the work done in each parallel iteration becomes more expensive, you will see a real gain from using OpenMP (or any other parallel processing approach).

You could, for instance, try (as an exercise) to sort independent vectors (without a reduction) both in parallel and sequentially, to start seeing the benefits of OpenMP.
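The exercise could look roughly like the sketch below. All names here (`Batch`, `make_batch`, `sort_sequential`, `sort_parallel`, `elapsed_ms`) are hypothetical helpers invented for this example, and the vector counts and sizes you pick are up to you. Each vector is sorted independently, so the iterations share no data and no reduction is needed.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

using Batch = std::vector<std::vector<int>>;

// Build `count` vectors of `size` pseudo-random ints (fixed seed, so
// the sequential and parallel runs can be given identical input).
Batch make_batch(std::size_t count, std::size_t size, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 1000000);
    Batch batch(count, std::vector<int>(size));
    for (auto& v : batch)
        for (auto& x : v) x = dist(gen);
    return batch;
}

// Sort every vector one after the other.
void sort_sequential(Batch& batch) {
    for (auto& v : batch) std::sort(v.begin(), v.end());
}

// Sort every vector, distributing the vectors across OpenMP threads.
// Each iteration touches only its own vector, so nothing is shared.
void sort_parallel(Batch& batch) {
    const long long n = static_cast<long long>(batch.size());
    #pragma omp parallel for
    for (long long i = 0; i < n; ++i) {
        auto& v = batch[static_cast<std::size_t>(i)];
        std::sort(v.begin(), v.end());
    }
}

// Run `work` once and return the wall-clock time in milliseconds.
// steady_clock is used because it is monotonic.
template <typename F>
double elapsed_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

To compare, create two copies of the same batch, then time `elapsed_ms([&]{ sort_sequential(a); })` against `elapsed_ms([&]{ sort_parallel(b); })` from `main`. With enough sufficiently large vectors the parallel version should come out ahead on a multi-core machine, provided the program is built with OpenMP enabled (for example `-fopenmp` with GCC); otherwise the pragma is ignored and both timings will be about the same.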