In a source file called holes5.cpp, I have this code, where cdtt is a lambda with side effects:

    for (int depth=0; depth<10; depth++)
    {
        auto rng = views::iota(0, (int)decision_tree.size()) |
            views::filter([&](int id){return decision_tree[id].depth==depth;});
        for_each(execution::par_unseq, rng.begin(), rng.end(), cdtt);
    }

In CMakeLists.txt, I have:

    list(APPEND CMAKE_MODULE_PATH "deps/tbb/cmake/")
    find_package(TBB REQUIRED)
    set (SOURCES holes5.cpp)
    add_executable(holes5 ${SOURCES})
    target_link_libraries(holes5 PUBLIC TBB::tbb)

Now if I comment out

    target_link_libraries(holes5 PUBLIC TBB::tbb)

it still links without error.
On top of that, my machine has 4 cores, and there is zero performance gain from using par_unseq over seq. The result still gets computed normally. It really seems like this call to for_each() does not get parallelized at all.
I compiled this with g++ 12.
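The iterators of a C++20 view like this one are only input iterators as far as the C++17 iterator categories are concerned, and that prevents for_each from making any calls to TBB.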
If the filtered indices are first copied into a vector, as below, the call is properly parallelized and makes calls to TBB:

    for (int depth=0; depth<10; depth++)
    {
        auto rng = views::iota(0, (int)decision_tree.size()) |
            views::filter([&](int id){return decision_tree[id].depth==depth;});
        vector<int> input(rng.begin(), rng.end());
        for_each(execution::par_unseq, input.begin(), input.end(), cdtt);
    }