c multithreading performance parallel-processing openmp

C OMP for loop in parallel region. Not work-shared

I have a function that I want to parallelize. This is the serial version.

void parallelCSC_SpMV(float *x, float *b)
{
    int i, j;
    for(i = 0; i < numcols; i++)
    {
        for(j = colptrs[i] - 1; j < colptrs[i+1] - 1; j++)
        {
            b[irem[j] - 1] += xrem[j]*x[i];
        }
    }
}

I figured a decent way to do this was to have each thread write to a private copy of the b array (which does not need to be protected by a critical section, because it is private to the thread). After a thread is done, it copies its results into the actual b array. Here is my code.

void parallelCSC_SpMV(float *x, float *b)
{
    int i, j, k;
    #pragma omp parallel private(i, j, k)
    {
        float* b_local = (float*)malloc(sizeof(b));       
     
        #pragma omp for nowait
        for(i = 0; i < numcols; i++)
        {
            for(j = colptrs[i] - 1; j < colptrs[i+1] - 1; j++)
            {
                float current_add = xrem[j]*x[i];
                int index = irem[j] - 1;
                b_local[index] += current_add;
            }
        }
        
        for (k = 0; k < sizeof(b) / sizeof(b[0]); k++)
        {
            // Separate question: Is this if statement allowed?
            //if (b_local[k] == 0) { continue; }
            #pragma omp atomic
            b[k] += b_local[k];
        }
    }
}

However, I get a segmentation fault as a result of the second for loop. I do not need a "#pragma omp for" on that loop because I want each thread to execute it fully. If I comment out the body of that for loop, the segmentation fault goes away. I am not sure what the issue is.

Solution

You are probably accessing an out-of-range position in the dynamically allocated array b_local.

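Note that sizeof(b) returns the size in bytes of a float* (the pointer itself, typically 4 or 8 bytes), not the size of the array it points to. So malloc(sizeof(b)) allocates room for only one or two floats, and the bound sizeof(b) / sizeof(b[0]) in your second loop is that same small constant rather than the length of b.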
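To make the sizeof point concrete, here is a minimal sketch. The names size_seen_by_callee and sizeof_demo are hypothetical helpers made up for this illustration; they are not part of the question's code.

```c
#include <assert.h>
#include <stddef.h>

/* Inside a function, a float* parameter is just a pointer: sizeof reports
   the size of the pointer, no matter how large the caller's array is. */
size_t size_seen_by_callee(float *b)
{
    return sizeof(b);
}

/* Hypothetical demo helper comparing the two views of the same array. */
void sizeof_demo(void)
{
    float values[100];

    /* Where the array is declared, sizeof covers the whole array. */
    assert(sizeof(values) == 100 * sizeof(float));

    /* Once passed to a function, only the pointer size is visible. */
    assert(size_seen_by_callee(values) == sizeof(float *));
}
```

This is why the array length has to travel alongside the pointer as a separate argument.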

If you need the size of the array inside the function, I would suggest adding it to the function's parameters.

void parallelCSC_SpMV(float *x, float *b, int b_size){
...
    float* b_local = (float*) malloc(sizeof(float)*b_size); 
...
}

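Also, if colptrs has only numcols elements, be careful with colptrs[i+1]: when i = numcols-1 it reads one element past the end, which is another out-of-range access. A CSC column-pointer array normally holds numcols+1 entries, so check that yours does.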
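Putting these points together, one possible corrected version might look like the sketch below. It rests on several assumptions that are not in the question: the function name csc_spmv_parallel and its signature are hypothetical, the matrix arrays are passed as parameters instead of being globals, numrows (the length of b) is an added parameter, colptrs is assumed to hold numcols+1 entries, and the indices are 1-based as in the question. It uses calloc instead of malloc because b_local is accumulated with += and malloc does not zero the memory. The zero check in the merge loop is only an optional shortcut; the atomic pragma applies to the single statement that follows it, so wrapping it in an if (or skipping with continue) is fine.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical signature. Returns 0 on success, or -1 if a thread could not
   allocate its private buffer (b is then only partially updated). */
int csc_spmv_parallel(int numcols, int numrows,
                      const int *colptrs, const int *irem, const float *xrem,
                      const float *x, float *b)
{
    int failed = 0;

    if (numcols <= 0 || numrows <= 0)
        return 0;

    #pragma omp parallel
    {
        /* One private accumulator per thread, zero-initialized by calloc. */
        float *b_local = calloc((size_t)numrows, sizeof *b_local);
        if (b_local == NULL) {
            #pragma omp atomic write
            failed = 1;
        }

        /* Columns are split across threads. Every thread must reach this
           loop, so a thread whose allocation failed skips its iterations. */
        #pragma omp for nowait
        for (int i = 0; i < numcols; i++) {
            if (b_local == NULL)
                continue;
            for (int j = colptrs[i] - 1; j < colptrs[i + 1] - 1; j++) {
                int row = irem[j] - 1;              /* 1-based to 0-based */
                assert(row >= 0 && row < numrows);  /* catch bad indices */
                b_local[row] += xrem[j] * x[i];
            }
        }

        /* Not work-shared: each thread merges its own buffer into b. */
        if (b_local != NULL) {
            for (int k = 0; k < numrows; k++) {
                if (b_local[k] != 0.0f) {
                    #pragma omp atomic
                    b[k] += b_local[k];
                }
            }
            free(b_local);
        }
    }

    return failed ? -1 : 0;
}
```

Compile with your compiler's OpenMP flag (for example -fopenmp with gcc). Without it the pragmas are ignored and the function simply runs serially.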