I'm trying to write a program that multiplies two arrays in parallel, so that each thread multiplies one row by one column. The problem is that if I put the `omp for` on the outer `for`, each thread executes the entire inner `for` instead of just a single task, and if I put the `omp for` on the inner `for`, the outer `for` runs once on every thread because it is inside the scope of `omp parallel`. I want each thread to run only the task, and I do not want the outer `for` to run multiple times.
    for (int line = 0; line < n; ++line) {
        for (int column = 0; column < n; ++column) {
            // only this call needs to run in its own thread
            multiply_line_per_column(line, column);
        }
    }
One option is to use the `collapse` clause: https://stackoverflow.com/a/13357158/2485717
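As a minimal sketch of the `collapse` approach, assuming square `n` x `n` matrices stored row-major in flat `double` arrays: the helper below is a hypothetical stand-in for the question's `multiply_line_per_column` (its extra parameters are not from the question), and `multiply_collapsed` is an illustrative name. With `collapse(2)`, the two loops form a single iteration space of `n * n` (line, column) pairs that is divided among the threads, so the outer loop is not repeated per thread.

```c
/* Hypothetical stand-in for the question's multiply_line_per_column:
   computes one element of c = a * b for n x n row-major matrices. */
static void multiply_line_per_column(const double *a, const double *b,
                                     double *c, int n, int line, int column) {
    double sum = 0.0;
    for (int k = 0; k < n; ++k)
        sum += a[line * n + k] * b[k * n + column];
    c[line * n + column] = sum;
}

/* collapse(2) merges both loops into one iteration space of n * n
   (line, column) pairs, which OpenMP then divides among the threads. */
void multiply_collapsed(const double *a, const double *b, double *c, int n) {
    #pragma omp parallel for collapse(2)
    for (int line = 0; line < n; ++line) {
        for (int column = 0; column < n; ++column) {
            multiply_line_per_column(a, b, c, n, line, column);
        }
    }
}
```

Compile with OpenMP enabled (for example `-fopenmp` with GCC or Clang). Without that flag the pragma is ignored and the loops simply run serially, producing the same result.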
You can also rewrite the two loops as a single loop so that they are no longer nested:
    for (int i = 0; i < n * n; ++i) {
        int line = i % n;
        int column = i / n;
        multiply_line_per_column(line, column);
    }
As pointed out by @Hristo Iliev in the comments, the integer division and modulo operators add considerable cost to every iteration.
The drawback is more pronounced when `n` is not a power of 2.
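For completeness, here is a sketch of how the flattened loop could be parallelized with a single `parallel for`, under the same assumptions as the question (square `n` x `n` matrices). The flat row-major `double` arrays, the helper's extra parameters, and the name `multiply_flattened` are illustrative choices, not part of the original code. Each value of `i` maps to exactly one (line, column) pair, so every thread runs only individual tasks.

```c
/* Hypothetical stand-in for the question's multiply_line_per_column:
   computes one element of c = a * b for n x n row-major matrices. */
static void multiply_line_per_column(const double *a, const double *b,
                                     double *c, int n, int line, int column) {
    double sum = 0.0;
    for (int k = 0; k < n; ++k)
        sum += a[line * n + k] * b[k * n + column];
    c[line * n + column] = sum;
}

/* One flat loop over all n * n elements; OpenMP splits the values of i
   among the threads. Assumes n * n fits in an int. */
void multiply_flattened(const double *a, const double *b, double *c, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n * n; ++i) {
        int line = i % n;    /* the division and modulo run once per element */
        int column = i / n;
        multiply_line_per_column(a, b, c, n, line, column);
    }
}
```

The `collapse` clause lets the OpenMP runtime do the same index mapping itself, so this manual form is mainly useful with compilers that do not support `collapse`.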