I have the following function:
#include <omp.h>
#include <stddef.h>
#include <stdint.h>

const int TS_GLOBAL = 20;

// grayscale_row() is defined elsewhere and converts a single row of pixels.
void apply_grayscale(uint8_t* input_buffer, uint8_t* output_buffer, uint16_t image_w, uint16_t image_h)
{
    // Row size in bytes (3 bytes per pixel), rounded up to a multiple of 4.
    size_t pixel_row_size = 3 * image_w;
    if (pixel_row_size % 4)
        pixel_row_size += 4 - (pixel_row_size % 4);

    const int TS = TS_GLOBAL;
    #pragma omp taskloop grainsize(TS) // <------- here
    for (uint16_t i = 0; i < image_h; i++)
    {
        uint8_t* current_input_buffer = input_buffer + (i * pixel_row_size);
        uint8_t* current_output_buffer = output_buffer + (i * pixel_row_size);
        grayscale_row(current_input_buffer, current_output_buffer, image_w);
    }
}
It doesn't scale well. When I use
#pragma omp parallel for
instead, it runs significantly faster. Nevertheless, I need it to work with OpenMP tasks too. Here is the #pragma omp parallel for version, which works just fine:
void apply_grayscale(uint8_t* input_buffer, uint8_t* output_buffer, uint16_t image_w, uint16_t image_h)
{
    size_t pixel_row_size = 3 * image_w;
    if (pixel_row_size % 4)
        pixel_row_size += 4 - (pixel_row_size % 4);

    #pragma omp parallel for
    for (uint16_t i = 0; i < image_h; i++)
    {
        uint8_t* current_input_buffer = input_buffer + (i * pixel_row_size);
        uint8_t* current_output_buffer = output_buffer + (i * pixel_row_size);
        grayscale_row(current_input_buffer, current_output_buffer, image_w);
    }
}
However, I want to use the task paradigm to gain a better understanding of the technology.
I found the taskloop construct on OpenMP's official tutorial site. It should behave virtually like a task-based version of the parallel for.
Any idea why I don't get any speedup?
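Thanks to @Homer512, it's now working. The tasks have to be created inside a #pragma omp parallel region, with #pragma omp single so that only one thread generates them. If anyone runs into this issue again, here is how I solved it: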
const int per_task = 2; // number of rows handled by each task

void apply_grayscale(uint8_t* input_buffer, uint8_t* output_buffer, uint16_t image_w, uint16_t image_h)
{
    size_t pixel_row_size = 3 * image_w;
    if (pixel_row_size % 4)
        pixel_row_size += 4 - (pixel_row_size % 4);

    #pragma omp parallel
    #pragma omp single
    {
        // A single thread creates the tasks; the whole team executes them.
        for (uint16_t i = 0; i < image_h; i += per_task) {
            #pragma omp task
            {
                for (int j = 0; j < per_task && i + j < image_h; j++) {
                    uint8_t* current_input_buffer = input_buffer + ((i + j) * pixel_row_size);
                    uint8_t* current_output_buffer = output_buffer + ((i + j) * pixel_row_size);
                    grayscale_row(current_input_buffer, current_output_buffer, image_w);
                }
            }
        }
    }
}
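For anyone who would rather keep the taskloop from my first attempt: as far as I understand it, the same parallel/single wrapper should work there too, since a taskloop encountered outside of a parallel region only has one thread to run its tasks on. Below is a minimal sketch of that variant, not something from the answer I was given. The names apply_grayscale_taskloop and padded_row_size are made up for this example, and grayscale_row here is only a hypothetical stand-in (a plain channel average) so that the snippet compiles on its own. The grain size of 20 mirrors TS_GLOBAL above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for grayscale_row: averages the three channels of
   each pixel and writes the result to all three output channels. */
static void grayscale_row(const uint8_t *in, uint8_t *out, uint16_t image_w)
{
    for (uint16_t x = 0; x < image_w; x++) {
        uint8_t gray = (uint8_t)((in[3 * x] + in[3 * x + 1] + in[3 * x + 2]) / 3);
        out[3 * x] = out[3 * x + 1] = out[3 * x + 2] = gray;
    }
}

/* Row size in bytes (3 bytes per pixel), rounded up to a multiple of 4,
   exactly as in the functions above. Helper name is an assumption. */
static size_t padded_row_size(uint16_t image_w)
{
    size_t row = (size_t)3 * image_w;
    if (row % 4)
        row += 4 - (row % 4);
    return row;
}

void apply_grayscale_taskloop(const uint8_t *input_buffer, uint8_t *output_buffer,
                              uint16_t image_w, uint16_t image_h)
{
    assert(input_buffer != NULL && output_buffer != NULL);
    const size_t pixel_row_size = padded_row_size(image_w);

    /* parallel creates the thread team; single lets one thread encounter the
       taskloop, which then splits the rows into tasks for the whole team. */
    #pragma omp parallel
    #pragma omp single
    {
        /* grainsize(20): each task receives at least 20 rows
           (or all rows, if the image has fewer than 20). */
        #pragma omp taskloop grainsize(20)
        for (int i = 0; i < image_h; i++) {
            grayscale_row(input_buffer + i * pixel_row_size,
                          output_buffer + i * pixel_row_size, image_w);
        }
    }
}
```

Compile with OpenMP enabled (for example -fopenmp on GCC or Clang); without that flag the pragmas are ignored and the loop simply runs serially. The taskloop waits for all of its tasks before continuing, so no extra taskwait should be needed before the function returns.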