Search code examples
c++hpx

Parallel reduce (e.g. sum) a vector of hpx::futures<double>


We are currently trying to implement a red-black Gauss-Seidel solver for a numerical simulation.

For that we divided our simulation area into equal sized subgrids. We are able to asynchronously perform the red-black cycles for the pressure equation on each subgrid with the correct dependencies and a hpx::dataflow object.

But now we have the following problem: after every n-th cycle we have to perform a residual calculation to determine if we are converged already.

So the optimal solution would be, that we start every local residual calculation separately/asynchronously and then sum over a vector of hpx::future<double>. With the idea of HPX futures that could lead to an optimal solution, i.e. we sum up all elements as soon as possible.

But so far we were only able to come up with the following code:

#include <hpx/hpx_main.hpp>
#include <hpx/hpx.hpp>
#include <memory>
#include <iostream>

class A {
public:
  double residual() {
    // Calculate actual local residual
    return 1.0;
  }
};

int main() {
  // Create instances
  std::vector<A> vec(3);
  std::vector<hpx::shared_future<double>> res(vec.size());

  // asynchronous launch resdiual calculation
  for (size_t i = 0; i < res.size(); ++i) {
    res[i] = hpx::async( &A::residual, &vec[i] );
  }

  double residual = 0.0;
  for (size_t i = 0; i < res.size(); ++i) {
    residual += res[i].get();
  }

  std::cout << "residual: " << residual << std::endl;

  return 0;
}

That's far from optimal. In worst case it performance like a global barrier followed by a pure sequential sum over all elements.

So our question is how we could implement this "HPX" like in parallel?

Update 02.02.2019:

We already rewrote our code so that we can't start our residual calculation fully asynchronously, but based on data dependencies via a hpx::dataflow object.

  // asynchronous launch resdiual calculation
  for (size_t i = 0; i < res.size(); ++i) {
    res[i] = hpx::dataflow( hpx::util::unwrapping(&A::residual), &vec[i], *DEPENDENCIES* );
  }

Is it possible to call @Mike van Dyke code with dataflow objects too or is there another solution?

(Hint: I didn't got your code working because of a template argument deduction/substitution failed error)


Solution

  • You can use the transform_reduce pattern for what you want to achieve:

    std::vector<A> vec(300);
    double res = hpx::parallel::transform_reduce(hpx::parallel::execution::par,        
                         vec.begin(), vec.end(),                      \\ (1)                         
                         0, [](double a, double b){ return a + b; },  \\ (2)
                         [](const A& a_ref){ return a_ref.residual(); });   \\ (3)
    

    This code will calculate the residual (3) of each A in vec (1), then will sum up all results (2).