c++ g++ c++17 simd tbb

Calling for_each with execution policy par_unseq, yet it links without TBB


In a source file called holes5.cpp, I have this code, where cdtt is a lambda with side effects:

for (int depth=0; depth<10; depth++)
{
  auto rng = views::iota(0, (int)decision_tree.size()) |
             views::filter([&](int id){return decision_tree[id].depth==depth;});
  for_each(execution::par_unseq, rng.begin(), rng.end(), cdtt);
}

In CMakeLists.txt, I have:

list(APPEND CMAKE_MODULE_PATH "deps/tbb/cmake/")
find_package(TBB REQUIRED)
set (SOURCES holes5.cpp)
add_executable(holes5 ${SOURCES})
target_link_libraries(holes5 PUBLIC TBB::tbb)

Now if I comment out

target_link_libraries(holes5 PUBLIC TBB::tbb)

it still links without error.

On top of that, my machine has 4 cores, and there is zero performance gain from using par_unseq over seq. The result is still computed correctly, but it seems that this call to for_each() is not actually parallelized.

I compiled this with g++ 12.


Solution

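  • The iterators obtained from this C++20 view are only input iterators in the C++17 classification, and using them prevents calls to TBB: the parallel algorithm appears to fall back to a serial implementation. That would also explain why the program links without TBB. Copying the ids into a vector first avoids the problem: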

    for (int depth=0; depth<10; depth++)
    {
      auto rng = views::iota(0, (int)decision_tree.size()) |
                 views::filter([&](int id){return decision_tree[id].depth==depth;});
      vector<int> input(rng.begin(), rng.end());
      for_each(execution::par_unseq, input.begin(), input.end(), cdtt);
    }
    

    This version is properly parallelized and vectorized and makes calls to TBB, because std::vector provides random-access iterators.