I have some code (the Floyd-Warshall algorithm for the shortest paths in an NxN matrix) with three nested for-loops, all with the same number of iterations. In the innermost for-loop I have an assignment via a ternary operation = <bool> ? <val1> : <val2>, i.e. based on whether a comparison is true or not.
I used OpenMP to parallelize the second for-loop with a #pragma omp parallel for.
I can't work out the parallel and serial percentages of the code that I need in order to apply Amdahl's Law and obtain the theoretical speedup.
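For reference, the form of Amdahl's Law I am trying to apply, where p is the fraction of the run time that is parallelizable and n is the number of cores, is

    S(n) = 1 / ((1 - p) + p / n)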
for (k = 0; k < N; k++) {
    #pragma omp parallel for private(i, j) shared(x) num_threads(4)
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            x[i][j] = x[i][j] < (x[i][k] + x[k][j]) ? x[i][j] : (x[i][k] + x[k][j]);
        }
    }
}
I'm using four cores, so I expect a theoretical speedup of 4.
x[i][j] is the matrix, where every element acts as the weight of the edge that connects nodes i and j; it is the macro INF (infinite) if they are not connected.
The inner j-loop and the body of the innermost loop are executed serially on each core. That's because you marked the i-loop to be executed in parallel, so only its iterations are distributed across the threads, and each core runs its share of them one after another.
But:
The body of the innermost loop uses the same matrix on all cores and also modifies it, so the changes to the matrix must be propagated to all other cores. This can lead to problems such as cache-coherence traffic between the cores.
You should check whether it is possible to change your algorithm in a way that non-intersecting partial matrices can be processed. You will get the best speedup if the cores work on separate, non-intersecting data.
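One variant along these lines, as a sketch only (it assumes the same declarations of x, i, j, k and N as in your snippet): the parallel region is opened once around the k-loop, so the thread team is created only once and, with the default static schedule, each thread keeps the same contiguous block of rows for every k. Note that all threads still read row k and column k, so the data are not fully disjoint.

    #pragma omp parallel private(i, j, k) shared(x) num_threads(4)
    for (k = 0; k < N; k++) {
        #pragma omp for
        for (i = 0; i < N; i++) {
            for (j = 0; j < N; j++) {
                x[i][j] = x[i][j] < (x[i][k] + x[k][j]) ? x[i][j] : (x[i][k] + x[k][j]);
            }
        }
        /* implicit barrier of the "omp for" keeps the k iterations in sync */
    }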
Since there is nearly no effort in doing so, you should definitely profile the code and variants of it.
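For example, a minimal self-contained sketch of such a measurement, using omp_get_wtime() to time the loop nest and then solving Amdahl's Law for the parallel fraction p. The matrix size N, the element type double and the init_graph() helper are placeholders, not taken from your code:

    #include <stdio.h>
    #include <omp.h>

    #define N   1000
    #define INF 1000000

    static double x[N][N];

    /* Placeholder initialisation: some finite weights, INF where unconnected. */
    static void init_graph(void) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                x[i][j] = (i == j) ? 0.0 : ((i + j) % 7 ? (i + j) % 7 : INF);
    }

    /* Run Floyd-Warshall with the given number of threads, return wall time. */
    static double run_fw(int threads) {
        int i, j, k;
        double t0 = omp_get_wtime();
        for (k = 0; k < N; k++) {
            #pragma omp parallel for private(i, j) shared(x) num_threads(threads)
            for (i = 0; i < N; i++) {
                for (j = 0; j < N; j++) {
                    x[i][j] = x[i][j] < (x[i][k] + x[k][j]) ? x[i][j] : (x[i][k] + x[k][j]);
                }
            }
        }
        return omp_get_wtime() - t0;
    }

    int main(void) {
        init_graph();
        double t1 = run_fw(1);   /* baseline with one thread, roughly the serial time */
        init_graph();
        double tp = run_fw(4);   /* the same work on 4 threads */
        double s  = t1 / tp;     /* measured speedup */
        /* Solving S = 1 / ((1 - p) + p / n) for p with n = 4: */
        double p  = (1.0 - 1.0 / s) / (1.0 - 1.0 / 4.0);
        printf("T1 = %.3f s, T4 = %.3f s, speedup = %.2f, estimated parallel fraction p = %.2f\n",
               t1, tp, s, p);
        return 0;
    }

The estimated p can then be plugged back into Amdahl's Law to see how far the measured speedup is from the theoretical limit for 4 cores.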