c++openmpeigeneigen3# Reduction with OpenMP: Matrix/tensor operations examples

I am trying to learn how to use reduction with OpenMP and was wondering if the following examples can be rewritten with reduction?

Example A: Finding the sum of three matrices/tensors

```
#include <Eigen/Dense>
#include <Eigen/SparseCore>
#include <Eigen/Sparse>
static const int nx = 128;
static const int ny = 128;
static const int nz = 128;
Eigen::Tensor<std::complex<double>, 3> Tensor1(nx,ny,nz);
Eigen::Tensor<std::complex<double>, 3> Tensor2(nx,ny,nz);
Eigen::Tensor<std::complex<double>, 3> Tensor3(nx,ny,nz);
for(int i = 0; i< nx; i++){
for(int j = 0; j< ny; j++){
for(int k = 0; k< nz; k++){
Sum(k,i,j) = (Tensor1(k,i,j) + Tensor2(k,i,j) + Tensor3(k,i,j));
}
}
}
```

Example B: Finding separate terms of the cross product of (AxB):

```
#include <Eigen/Dense>
#include <Eigen/SparseCore>
#include <Eigen/Sparse>
static const int nx = 128;
static const int ny = 128;
static const int nz = 128;
double B[3] = {Bx,By,Bz};
Eigen::Tensor<std::complex<double>, 3> Ax(nx,ny,nz);
Eigen::Tensor<std::complex<double>, 3> Ay(nx,ny,nz);
Eigen::Tensor<std::complex<double>, 3> Az(nx,ny,nz);
Eigen::Tensor<std::complex<double>, 3> Termx(nx,ny,nz);
Eigen::Tensor<std::complex<double>, 3> Termy(nx,ny,nz);
Eigen::Tensor<std::complex<double>, 3> Termz(nx,ny,nz);
for(int i = 0; i< nx; i++){
for(int j = 0; j< ny; j++){
for(int k = 0; k< nz; k++){
Termx(k,i,j) = (Az(k,i,j) * By - Ay(k,i,j) * Bz);
Termy(k,i,j) = (Ax(k,i,j) * Bz - Az(k,i,j) * Bx);
Termz(k,i,j) = (Ay(k,i,j) * Bx - Ax(k,i,j) * By);
}
}
}
```

I have not seen examples of reduction with either nested loops or matrices/tensors so I am not sure if the above examples work with reduction or not. Thanks!

Solution

The short answer is no.

The longer answer is that your examples don't fit with the basic idea of what reduction is intended to accomplish.

Reduction is typically used for a case where all the threads write to a single output. For a really simplistic example, consider normalizing a large matrix, by simply adding up all the numbers, then dividing each by the sum so they're all in the range 0..1.

```
#pragma omp parallel for
for (int i=0; i<n; i++)
for (int j=0; j<m; j++)
for (int k=0; k<o; k++)
sum += matrix[i][j][k];
```

This has a problem: all the threads write to `sum`

. To keep meaningful results, the OpenMP runtime serializes access to `sum`

, and we gain very little (if anything) from all the threads running in parallel. Depending on the implementation, we might easily be locking and unlocking a mutex every iteration, in which case parallel code might even be quite a lot slower than serial code.

We can use `reduction`

to fix that:

```
#pragma omp parallel for reduction(+:sum)
for (int i=0; i<n; i++)
for (int j=0; j<m; j++)
for (int k=0; k<o; k++)
sum += matrix[i][j][k];
```

With this, each thread gets a private variable where it stores its local sum. Then when the threads finish, it does the reduction (using the operator we specified--`+`

in this case) and adds all those separate values together.

`reduction`

helps when all the threads work together to produce a single value, and we can avoid pressure on that one value by having each thread produce a value independently, and afterwards combine all those independent values.

That doesn't really work in your case, because each of your writes is already to a separate variable from that used by any other thread.

- How to get column of a multidimensional array in C/C++?
- In CMake, how can I test if the compiler is Clang?
- Find four,whose sum equals to target
- Can't make MSMPI to work. It always says "mpi.h: No such file or directory gcc"
- Do all C compilers implicitly drop the fractional when converting floating point to integer?
- Message Queue (mqueue.h), Invalid Argument Error, In C
- Undefined Macro in #if directive?
- C/C++: How to use the do-while(0); construct without compiler warnings like C4127?
- Minimum number of phones
- Nesting Structs in C
- Visual Studio Code for some reason changes '\' with '/' in Include Path
- Generating all divisors of a number given its prime factorization
- sizeof single struct member in C
- Memory usage isn't decreasing when using free?
- Why are all the strings the same
- strstr issue in C
- In C macros, should one prefer do { ... } while(0,0) over do { ... } while(0)?
- Union initialization in C++ and C
- Trying to store strings returned by strsep gives seg fault
- C pointer address printing
- strcmp or string::compare?
- What is proper name for the WINAPI tag in a c function?
- Need help in understanding the Capsicum capability mode and its effect on getpass()
- Force a program created using `exec` to perform unbuffered I/O
- How to display characters in a write() funtion C
- Declaring extern structures in header file in C
- match files .c on tasks.json to compile
- Replacing spaces with %20 in C
- What does the comma operator , do?
- How to build multiple targets from one makefile