Tags: c++, openmp, eigen, eigen3

# Reduction with OpenMP: Matrix/tensor operations examples

I am trying to learn how to use reduction with OpenMP and was wondering if the following examples can be rewritten with reduction?

Example A: Finding the sum of three matrices/tensors

```cpp
#include <unsupported/Eigen/CXX11/Tensor>
#include <complex>

static const int nx = 128;
static const int ny = 128;
static const int nz = 128;

Eigen::Tensor<std::complex<double>, 3> Tensor1(nx,ny,nz);
Eigen::Tensor<std::complex<double>, 3> Tensor2(nx,ny,nz);
Eigen::Tensor<std::complex<double>, 3> Tensor3(nx,ny,nz);
Eigen::Tensor<std::complex<double>, 3> Sum(nx,ny,nz);

for(int i = 0; i < nx; i++){
    for(int j = 0; j < ny; j++){
        for(int k = 0; k < nz; k++){
            Sum(k,i,j) = Tensor1(k,i,j) + Tensor2(k,i,j) + Tensor3(k,i,j);
        }
    }
}
```

Example B: Finding separate terms of the cross product of (AxB):

```cpp
#include <unsupported/Eigen/CXX11/Tensor>
#include <complex>

static const int nx = 128;
static const int ny = 128;
static const int nz = 128;

double Bx, By, Bz;          // components of B, set elsewhere
double B[3] = {Bx,By,Bz};

Eigen::Tensor<std::complex<double>, 3> Ax(nx,ny,nz);
Eigen::Tensor<std::complex<double>, 3> Ay(nx,ny,nz);
Eigen::Tensor<std::complex<double>, 3> Az(nx,ny,nz);

Eigen::Tensor<std::complex<double>, 3> Termx(nx,ny,nz);
Eigen::Tensor<std::complex<double>, 3> Termy(nx,ny,nz);
Eigen::Tensor<std::complex<double>, 3> Termz(nx,ny,nz);

for(int i = 0; i < nx; i++){
    for(int j = 0; j < ny; j++){
        for(int k = 0; k < nz; k++){
            Termx(k,i,j) = Az(k,i,j) * By - Ay(k,i,j) * Bz;
            Termy(k,i,j) = Ax(k,i,j) * Bz - Az(k,i,j) * Bx;
            Termz(k,i,j) = Ay(k,i,j) * Bx - Ax(k,i,j) * By;
        }
    }
}
```

I have not seen examples of reduction with either nested loops or matrices/tensors so I am not sure if the above examples work with reduction or not. Thanks!

## Solution

The short answer is no.

The longer answer is that your examples don't fit with the basic idea of what reduction is intended to accomplish.

Reduction is typically used for a case where all the threads write to a single output. For a really simplistic example, consider normalizing a large matrix, by simply adding up all the numbers, then dividing each by the sum so they're all in the range 0..1.

```cpp
#pragma omp parallel for
for (int i=0; i<n; i++)
    for (int j=0; j<m; j++)
        for (int k=0; k<o; k++)
            sum += matrix[i][j][k];
```

This has a problem: all the threads write to `sum`, which is a data race -- OpenMP does not serialize that access for you, so the result is undefined. To get meaningful results, we would have to serialize the updates ourselves (for example with `#pragma omp critical` or `#pragma omp atomic`), and then we gain very little (if anything) from all the threads running in parallel. Depending on the implementation, we might easily be locking and unlocking a mutex every iteration, in which case the parallel code might even be quite a lot slower than serial code.
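For illustration, the serialized version can be sketched like this (a minimal sketch using a flattened `std::vector<double>` in place of the 3D matrix; `sum_with_critical` is a made-up name):

```cpp
#include <vector>

// Sum all elements of a flattened array, serializing the shared update
// with a critical section. This is correct, but every iteration takes
// the lock, so it is often no faster than serial code.
double sum_with_critical(const std::vector<double>& data) {
    double sum = 0.0;
    #pragma omp parallel for
    for (long i = 0; i < (long)data.size(); i++) {
        #pragma omp critical
        sum += data[i];
    }
    return sum;
}
```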

We can use `reduction` to fix that:

```cpp
#pragma omp parallel for reduction(+:sum)
for (int i=0; i<n; i++)
    for (int j=0; j<m; j++)
        for (int k=0; k<o; k++)
            sum += matrix[i][j][k];
```

With this, each thread gets a private variable where it stores its local sum. When the threads finish, the runtime performs the reduction (using the operator we specified, `+` in this case) and combines all those separate values into `sum`.
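What `reduction` does can also be written out by hand, which makes the mechanism explicit (again a minimal sketch over a flat `std::vector<double>`; `sum_manual_reduction` is a made-up name):

```cpp
#include <vector>

// Hand-rolled equivalent of reduction(+:sum): each thread accumulates
// into a private local_sum, and the partial results are combined once
// per thread at the end instead of once per iteration.
double sum_manual_reduction(const std::vector<double>& data) {
    double sum = 0.0;
    #pragma omp parallel
    {
        double local_sum = 0.0;   // private per-thread accumulator
        #pragma omp for nowait
        for (long i = 0; i < (long)data.size(); i++)
            local_sum += data[i];
        #pragma omp critical      // one combine per thread, not per element
        sum += local_sum;
    }
    return sum;
}
```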

`reduction` helps when all the threads work together to produce a single value, and we can avoid pressure on that one value by having each thread produce a value independently, and afterwards combine all those independent values.

That doesn't apply in your case: each of your writes already goes to a distinct element that no other thread touches, so there is no shared output to reduce, and a plain `parallel for` is all you need.
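For loops like the ones in the question, that plain `parallel for` can be sketched as follows (flattened `std::vector`s stand in for the `Eigen::Tensor` objects; `tensor_sum` is a made-up name):

```cpp
#include <complex>
#include <vector>

// Element-wise sum of three "tensors" (flattened 3D arrays). Each
// iteration writes a distinct element of out, so a plain parallel for
// is sufficient -- no reduction clause is needed.
void tensor_sum(const std::vector<std::complex<double>>& t1,
                const std::vector<std::complex<double>>& t2,
                const std::vector<std::complex<double>>& t3,
                std::vector<std::complex<double>>& out) {
    #pragma omp parallel for
    for (long i = 0; i < (long)out.size(); i++)
        out[i] = t1[i] + t2[i] + t3[i];
}
```

With the actual `Eigen::Tensor` code, the same pragma can go directly on the outer `i` loop (optionally with `collapse(2)` to parallelize over `i` and `j` together), since no element is ever written by more than one thread.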