c++ parallel-processing openmp

Why doesn't this OpenMP code parallelize?


I've got this code and method here:

#include <cstddef>
#include <cstdint>
#include <omp.h>

const int TS_GLOBAL = 20;

void apply_grayscale(uint8_t* input_buffer,uint8_t* output_buffer,uint16_t image_w,uint16_t image_h)
{
    // Each row holds 3 bytes per pixel, padded up to a multiple of 4 bytes
    size_t pixel_row_size = 3 * image_w;
    if(pixel_row_size % 4)
        pixel_row_size += 4 - (pixel_row_size % 4);

    const int TS = TS_GLOBAL;
    #pragma omp taskloop grainsize(TS) // <------- here
    for(uint16_t i = 0; i < image_h; i++)
    {
        uint8_t* current_input_buffer = input_buffer + (i * pixel_row_size);
        uint8_t* current_output_buffer = output_buffer + (i * pixel_row_size);

        grayscale_row(current_input_buffer,current_output_buffer,image_w);
    }
}

It doesn't seem to scale well. When I replace the pragma with

#pragma omp parallel for

it runs significantly faster. Nevertheless, I need it to work with OpenMP tasks too. Here is the #pragma omp parallel for version, which works just fine:

void apply_grayscale(uint8_t* input_buffer,uint8_t* output_buffer,uint16_t image_w,uint16_t image_h)
{
    size_t pixel_row_size = 3 * image_w;
    if(pixel_row_size % 4)
        pixel_row_size += 4 - (pixel_row_size % 4);
    
        
    #pragma omp parallel for
    for(uint16_t i = 0; i < image_h; i++)
    {
        uint8_t* current_input_buffer = input_buffer + (i * pixel_row_size);
        uint8_t* current_output_buffer = output_buffer + (i * pixel_row_size);
        
        grayscale_row(current_input_buffer,current_output_buffer,image_w);
    }
}

However, I want to use the task paradigm, to gain a better understanding of the technology.

I found the taskloop construct on OpenMP's official tutorial site. It should behave virtually the same as a task-based for loop.

Any idea why I don't get any gains?


Solution

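  • Courtesy of @Homer512, it's now working. The tasks have to be created inside a parallel region (here by a single thread); otherwise only one thread is available to execute them. If anyone encounters this issue again, here is how I solved it: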

    const int per_task = 2; // rows handled by each task
    
    void apply_grayscale(uint8_t* input_buffer,uint8_t* output_buffer,uint16_t image_w,uint16_t image_h){
      size_t pixel_row_size = 3 * image_w;
      if(pixel_row_size % 4)
          pixel_row_size += 4 - (pixel_row_size % 4);
    
      // Start the thread team, then let a single thread create the tasks
      #pragma omp parallel
      #pragma omp single
      {
          // int instead of uint16_t so that i += per_task cannot wrap around
          for(int i = 0; i < image_h; i+=per_task) {
              #pragma omp task
              {
                  for (int j = 0; j < per_task && i + j < image_h; j++) {
                      uint8_t* current_input_buffer = input_buffer + ((i + j) * pixel_row_size);
                      uint8_t* current_output_buffer = output_buffer + ((i + j) * pixel_row_size);
                      grayscale_row(current_input_buffer,current_output_buffer,image_w);
                  }
              }
          }
      }
    }
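
For comparison, the taskloop from the question should also parallelize once it sits inside a parallel region, because a taskloop encountered outside of one only has a single thread to run its tasks. The following is a minimal sketch under assumptions, not the asker's final code: the function name apply_grayscale_taskloop is made up for illustration, and grayscale_row is a hypothetical stand-in (a plain channel average), since the question never shows its body.

```cpp
#include <cstddef>
#include <cstdint>

using std::size_t;
using std::uint8_t;
using std::uint16_t;

// Hypothetical stand-in for the question's grayscale_row (assumption:
// 3 bytes per pixel, gray value = average of the three channels).
void grayscale_row(const uint8_t* in, uint8_t* out, uint16_t image_w)
{
    for (uint16_t x = 0; x < image_w; x++) {
        const uint8_t gray =
            static_cast<uint8_t>((in[3 * x] + in[3 * x + 1] + in[3 * x + 2]) / 3);
        out[3 * x] = out[3 * x + 1] = out[3 * x + 2] = gray;
    }
}

// Illustrative name, not from the original post.
void apply_grayscale_taskloop(uint8_t* input_buffer, uint8_t* output_buffer,
                              uint16_t image_w, uint16_t image_h)
{
    // Rows are padded to a multiple of 4 bytes, as in the question.
    size_t pixel_row_size = 3 * static_cast<size_t>(image_w);
    if (pixel_row_size % 4)
        pixel_row_size += 4 - (pixel_row_size % 4);

    #pragma omp parallel  // create the thread team
    #pragma omp single    // one thread generates the tasks, the rest execute them
    {
        // Each task processes a chunk of roughly 20 consecutive rows.
        #pragma omp taskloop grainsize(20)
        for (int i = 0; i < image_h; i++) {
            grayscale_row(input_buffer + i * pixel_row_size,
                          output_buffer + i * pixel_row_size, image_w);
        }
    }
}
```

This needs a compiler with OpenMP 4.5 or later enabled (for example -fopenmp with GCC or Clang). Without that flag the pragmas are ignored and the loop simply runs serially, which is a quick way to check that the results match.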