Does HPX provide any sort of parallelized iteration function built on top of task-based fork-join parallelism that also lets you control the grain size used, similar to TBB's parallel_for or Cilk's cilk_for?
It does. We implemented some extensions to what the standardization committee is currently considering. HPX has introduced the concept of ExecutorParameters,
which, among other things, let you control the grain size used when parallelizing iterations. For instance:
std::vector<int> v = { ... };
hpx::parallel::static_chunk_size scs;
hpx::parallel::for_each(
hpx::parallel::execution::par.with(scs),
v.begin(), v.end(),
[](int val) { ... }
);
This will split the iterations into tasks of num_iterations / (4 * cores)
loop iterations each. You can also specify the size of the tasks explicitly:
hpx::parallel::static_chunk_size scs(100);
which will combine 100 iterations in each task.
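To make the effect of a fixed chunk size concrete, here is a minimal sketch that uses only the standard library. It is not HPX code and not how HPX implements chunking; `make_static_chunks` is a hypothetical helper that only shows how a range of iterations maps onto tasks when each task receives at most `chunk_size` consecutive iterations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not an HPX function: splits the iteration range
// [0, num_iterations) into consecutive half-open chunks [begin, end) of at
// most chunk_size iterations each. Each chunk stands for one task.
std::vector<std::pair<std::size_t, std::size_t>>
make_static_chunks(std::size_t num_iterations, std::size_t chunk_size)
{
    assert(chunk_size > 0);    // a chunk must contain at least one iteration

    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    for (std::size_t begin = 0; begin < num_iterations; begin += chunk_size)
    {
        // The last chunk may be shorter than chunk_size.
        std::size_t end = std::min(begin + chunk_size, num_iterations);
        chunks.emplace_back(begin, end);
    }
    return chunks;
}
```

With 250 iterations and a chunk size of 100, this yields three chunks: [0, 100), [100, 200) and [200, 250). A smaller chunk size produces more, finer-grained tasks; a larger one produces fewer tasks with less scheduling overhead.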
Other existing executor parameters include dynamic_chunk_size (similar to OpenMP's schedule(dynamic)) and guided_chunk_size (similar to OpenMP's schedule(guided)), among others.
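For readers who don't know the OpenMP schedules: schedule(dynamic) hands out fixed-size chunks to workers as they become free, while schedule(guided) starts with large chunks and shrinks them as the remaining work decreases. The sketch below is a simplified model of the guided idea, written against the standard library only. The function `guided_chunk_sizes` and its parameters are hypothetical, and the exact sizes HPX or a given OpenMP runtime would choose may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not an HPX or OpenMP function: models a guided
// schedule in which each new chunk covers roughly
// (remaining iterations / number of workers), never less than
// min_chunk_size (except for the final chunk, which takes what is left).
std::vector<std::size_t> guided_chunk_sizes(std::size_t num_iterations,
    std::size_t num_workers, std::size_t min_chunk_size)
{
    assert(num_workers > 0 && min_chunk_size > 0);

    std::vector<std::size_t> sizes;
    std::size_t remaining = num_iterations;
    while (remaining > 0)
    {
        // Ceiling division, so the chunk size never rounds down to zero.
        std::size_t size = (remaining + num_workers - 1) / num_workers;
        size = std::max(size, min_chunk_size);
        size = std::min(size, remaining);
        sizes.push_back(size);
        remaining -= size;
    }
    return sizes;
}
```

For 100 iterations, 4 workers and a minimum chunk size of 1, the first chunk in this model covers 25 iterations and later chunks get progressively smaller, which helps balance the load near the end of the loop.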