I want to write a C++17 parallel execution algorithm, but I'm having some trouble. Let's start with the code:
#if __has_include(<execution>)
#include <cstddef>
#include <execution>
#include <future>
#include <iterator>
#include <thread>
#include <vector>
#endif
template<class RandomAccessIterator>
inline auto mean(RandomAccessIterator first, RandomAccessIterator last)
{
    auto it = first;
    auto mu = *first;
    decltype(mu) i = 2;
    while(++it != last)
    {
        mu += (*it - mu)/i;
        i += 1;
    }
    return mu;
}
#if __has_include(<execution>)
template<class ExecutionPolicy, class RandomAccessIterator>
inline auto mean(ExecutionPolicy&& exec_pol, RandomAccessIterator first, RandomAccessIterator last) {
    using Real = typename std::iterator_traits<RandomAccessIterator>::value_type;
    //static_assert(std::is_execution_policy_v<ExecutionPolicy>, "First argument must be an execution policy.");
    if (exec_pol == std::execution::par) {
        size_t elems = std::distance(first, last);
        if (elems*sizeof(Real) < /*guestimate*/ 4096) {
            return mean(first, last);
        }
        unsigned threads = std::thread::hardware_concurrency();
        if (threads == 0) {
            threads = 2;
        }
        std::vector<std::future<Real>> futures;
        size_t elems_per_thread = elems/threads;
        auto it = first;
        for (unsigned i = 0; i < threads -1; ++i) {
            futures.push_back(std::async(std::launch::async, &mean<RandomAccessIterator>, it, it + elems_per_thread));
            it += elems_per_thread;
        }
        futures.push_back(std::async(std::launch::async, &mean<RandomAccessIterator>, it, last));
        Real mu = 0;
        // std::future is move-only, so iterate by reference.
        for (auto& fut : futures) {
            mu += fut.get();
        }
        // Note: averaging the chunk means is exact only when all chunks have equal size.
        mu /= threads;
        return mu;
    }
    else { // should have else-if for various types of execution policies, but let's save that for later.
        return mean(first, last);
    }
}
#endif
Ok, so questions:
I first took the ExecutionPolicy argument by const &. The static_assert passed, but then I got hung up on a compile error at the if (exec_pol == std::execution::par) check, namely:
error: no match for ‘operator==’ (operand types are ‘const __pstl::execution::v1::parallel_policy’ and ‘const __pstl::execution::v1::parallel_policy’)
117 | if (exec_pol == std::execution::par) {
| ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
Then I looked at /usr/include/c++/9/pstl/algorithm_impl.h, and in it they pass the ExecutionPolicy around by move and forward it in various places, so I guess I should too. But that didn't fix anything, so I looked at /usr/include/c++/9/pstl/parallel_backend_tbb.h. In that file, they don't even check what the parallel execution policy is! For example, a few lines from the aforementioned file:
//! Evaluation of brick f[i,j) for each subrange [i,j) of [first,last)
// wrapper over tbb::parallel_for
template <class _ExecutionPolicy, class _Index, class _Fp>
void
__parallel_for(_ExecutionPolicy&&, _Index __first, _Index __last, _Fp __f)
{
    tbb::this_task_arena::isolate([=]() {
        tbb::parallel_for(tbb::blocked_range<_Index>(__first, __last), __parallel_for_body<_Index, _Fp>(__f));
    });
}
So have I fundamentally misunderstood how to write a parallel algorithm using C++17 parallel execution policies? If not, how do I check the execution policy and use it correctly?
Take exec_pol by value: write ExecutionPolicy exec_pol instead of ExecutionPolicy&& exec_pol. It is a tag. Taking it by forwarding reference just confuses things.
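The policy types define no operator==; which policy you were given is encoded in the type, not in a runtime value. So either test for the type, or tag dispatch:
if constexpr (std::is_same_v<ExecutionPolicy, std::execution::parallel_policy>)
as @Davis's answer implies.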
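To make the two options concrete, here is a minimal sketch of both, with the policy taken by value. The names Path, MeanResult, serial_mean, mean_if_constexpr and mean_tagged are hypothetical and exist only for illustration. Both branches just call a serial mean as a stand-in; the returned Path marker only shows which branch was selected at compile time.

```cpp
#include <execution>
#include <iterator>
#include <numeric>
#include <type_traits>
#include <vector>

// Hypothetical marker that records which branch was selected.
enum class Path { serial, parallel };

template <class Real>
struct MeanResult {
    Real value;
    Path path;
};

// Plain serial mean, used as a stand-in for the real work in every branch.
template <class It>
auto serial_mean(It first, It last) {
    using Real = typename std::iterator_traits<It>::value_type;
    return std::accumulate(first, last, Real(0)) /
           static_cast<Real>(std::distance(first, last));
}

// Option 1: test the policy type with if constexpr. Because the policy is
// taken by value, ExecutionPolicy is deduced without const or reference.
template <class ExecutionPolicy, class It>
auto mean_if_constexpr(ExecutionPolicy, It first, It last) {
    static_assert(std::is_execution_policy_v<ExecutionPolicy>,
                  "First argument must be an execution policy.");
    using Real = typename std::iterator_traits<It>::value_type;
    if constexpr (std::is_same_v<ExecutionPolicy, std::execution::parallel_policy>) {
        // A real implementation would split the range across threads here.
        return MeanResult<Real>{serial_mean(first, last), Path::parallel};
    } else {
        return MeanResult<Real>{serial_mean(first, last), Path::serial};
    }
}

// Option 2: tag dispatch. The overload naming parallel_policy is more
// specialized, so it wins when std::execution::par is passed.
template <class It>
auto mean_tagged(std::execution::parallel_policy, It first, It last) {
    using Real = typename std::iterator_traits<It>::value_type;
    return MeanResult<Real>{serial_mean(first, last), Path::parallel};
}

// Fallback for every other policy.
template <class ExecutionPolicy, class It>
auto mean_tagged(ExecutionPolicy, It first, It last) {
    using Real = typename std::iterator_traits<It>::value_type;
    return MeanResult<Real>{serial_mean(first, last), Path::serial};
}

// Usage, assuming std::vector<double> v{1.0, 2.0, 3.0, 4.0}:
//   auto a = mean_if_constexpr(std::execution::par, v.begin(), v.end());
//   auto b = mean_tagged(std::execution::seq, v.begin(), v.end());
```

Either way the choice is made at compile time, which is also why the TBB backend quoted in the question never inspects the policy object at run time.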
If you don't want to take by value (and you should take by value), you can use either std::decay_t<ExecutionPolicy> or std::remove_cv_t<std::remove_reference_t<ExecutionPolicy>> to strip off the cv/ref qualifiers that perfect forwarding stores in the type.
But again, don't do that.
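For completeness, a small sketch of why the forwarding-reference version needs the extra stripping. The helper names is_par_naive and is_par_decayed are hypothetical. std::execution::par is a const lvalue, so with ExecutionPolicy&& the template parameter is deduced as a const lvalue reference to parallel_policy, and a plain std::is_same_v test against parallel_policy is false.

```cpp
#include <execution>
#include <type_traits>

// Hypothetical helper: compares the deduced type as-is. For the lvalue
// std::execution::par, ExecutionPolicy is 'const parallel_policy&', so the
// comparison fails.
template <class ExecutionPolicy>
constexpr bool is_par_naive(ExecutionPolicy&&) {
    return std::is_same_v<ExecutionPolicy, std::execution::parallel_policy>;
}

// Hypothetical helper: strips reference and cv qualifiers first.
template <class ExecutionPolicy>
constexpr bool is_par_decayed(ExecutionPolicy&&) {
    return std::is_same_v<std::decay_t<ExecutionPolicy>,
                          std::execution::parallel_policy>;
}

// Usage:
//   is_par_naive(std::execution::par)    is false
//   is_par_decayed(std::execution::par)  is true
```

Taking the policy by value avoids the whole issue, since the deduced type then carries no qualifiers to strip.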