Does OpenMP natively support reduction of a variable that represents an array?
This would work something like the following...
float* a = (float*) calloc(4*sizeof(float));
omp_set_num_threads(13);
#pragma omp parallel reduction(+:a)
for(i=0;i<4;i++){
a[i] += 1; // Thread-local copy of a incremented by something interesting
}
// a now contains [13 13 13 13]
Ideally, there would be something similar for an omp parallel for, and if you have a large enough number of threads for it to make sense, the accumulation would happen via binary tree.
Only in Fortran in OpenMP 3.0, and probably only with certain compilers.
See the last example (Example 3) on:
http://wikis.sun.com/display/openmp/Fortran+Allocatable+Arrays