How to summing element in array using taskloop on OMP

I was trying to using taskloop to finding sum of element in array.

#include <iostream>
#include <omp.h>
using namespace std;
#define NUMTHREAD 6
#define SIZE 100
int main() {
    int* n = new int [SIZE];
    for (int i = 0; i < SIZE; ++i) {
        n[i] = 1;
    }
    long sum =0;

#pragma omp parallel num_threads(NUMTHREAD)
    {
        #pragma omp taskloop
        for (int i = 0; i < SIZE; i++) {
                 sum += n[i];
        }
    }
    cout << sum<< "\n";
        return 0;
}

but output is

It's should be 100 I don't know why sum is shared.

I was trying to made sum not shared and for each thread.

#pragma omp parallel num_threads(NUMTHREAD)
    {
        #pragma omp taskloop private(sum)
        for (int i = 0; i < SIZE; i++) {
                 sum += n[i];
        }
    }

but Output is

IDK why?

Solution

The taskloop construct is a task-generating construct, not a worksharing-loop construct. You can read the following in OpenMP specification:

The taskloop construct is a task generating construct. When a thread encounters a taskloop construct, the construct partitions the iterations of the associated loops into explicit tasks for parallel execution.

In your case 6 threads encounter the taskloop construct, therefore the loop is executed 6 times, and you got a bigger result than expected. To correct it you have to i) make sure that only a single thread executes the tasks generating construct, or ii) use a worksharing-loop construct.

There is a race condition at line sum += n[i];, as different threads update sum simultaneously. To correct it the best option is to use reduction (reduction(+:sum))
In the second case the sharing attribute of sum is private, so the variable sum will be privatized. It means that a local variable is created, and only the value of the local variable is changed. It does not change the value of sum in the enclosing context, therefore you got 0.

To sum up, here are the 2 alternatives mentioned above to correct your code:

i) using single construct and taskloop:

#pragma omp parallel num_threads(NUMTHREAD)
#pragma omp single nowait
{
    #pragma omp taskloop reduction(+:sum)
    for (int i = 0; i < SIZE; i++) {
                sum += n[i];
    }
}

ii) using a worksharing-loop construct:

#pragma omp parallel num_threads(NUMTHREAD)
{
    #pragma omp for reduction(+:sum)
    for (int i = 0; i < SIZE; i++) {
                sum += n[i];
    }
}

Some minor, off-topic comments:

The array (allocated by new operator) is never freed, you should use delete to deallocate it. As pointed out by @VictorEijkhout it is even better if you do not use new at all. Use vector or array from the std library.
You should prefer const/constexpr int SIZE=100; over #define SIZE 100. More details can be found e.g. here.
You should consider reading Why is "using namespace std;" considered bad practice?