Tags: c++, multithreading, struct, boost, tuples

applying std algorithms over structs?


We have Boost.PFR and we have the tuple iterator. If we combine the two, we might have a way of applying std algorithms over structs. Does a solution already exist? What I'm looking for is:

S a, b;
auto const ra(to_range(a)); // to_range: the hypothetical adapter I'm asking about
auto rb(to_range(b));       // rb stays mutable: transform writes through it
std::transform(ra.begin(), ra.end(), rb.begin(), [](auto&& a) noexcept { return a; });

This would allow us to use the newer <execution> features to process structs out of sequence or in parallel.
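
To sketch what I mean for the special case of a homogeneous struct (all members of one type), something like the following already works; to_ptrs and P are illustrative names only, and it leans on Boost.PFR's structure_tie:

#include <boost/pfr.hpp>
#include <algorithm>
#include <array>
#include <tuple>

// Hypothetical adapter: collect the addresses of all members into a
// std::array. CTAD on std::array requires the members to share one type.
template <typename S>
auto to_ptrs(S& s) {
    return std::apply([](auto&... f) { return std::array{&f...}; },
                      boost::pfr::structure_tie(s));
}

struct P { double x, y, z; };

int main() {
    P a{1, 2, 3}, b{};
    auto ra = to_ptrs(a), rb = to_ptrs(b);
    // Copy a's fields into b through the pointer "range":
    std::transform(ra.begin(), ra.end(), rb.begin(), rb.begin(),
                   [](double const* src, double* dst) { *dst = *src; return dst; });
}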


Solution

  • To illustrate the points I tried to make in the comments to the other answer, let's write something like your transform:

    A Direct Implementation

    I skipped the notion of iterators and the standard library, since that route is encumbered by the whole "iterator value type must be fixed" requirement and other burdens.
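
    To make that burden concrete (my illustration, not part of the solution below): a conforming iterator exposes a single value_type, so the best a range over a mixed struct could hand out is a sum type, and every algorithm would then have to visit its way back to the real member types:

    #include <variant>
    #include <vector>

    struct S1 { int a; double b; long c; float d; };

    // The one fixed value_type would have to cover all four member types:
    using S1_value = std::variant<int, double, long, float>;

    std::vector<S1_value> as_values(S1 const& s) {
        return {s.a, s.b, s.c, s.d};
    }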

    Instead, let's do it "functionally".

    #include <boost/pfr.hpp>
    
    #include <algorithm>   // std::min
    #include <type_traits> // std::decay_t
    #include <utility>     // std::forward, std::index_sequence
    
    namespace pfr = boost::pfr;
    
    template <typename Op, typename... T>
    void transform(Op f, T&&... operands) {
        // Helper pinning down one field index N per instantiation, so that the
        // inner pack expansion expands only the operands. The dummy return
        // value lets us chain the calls in a fold expression below.
        auto apply = [&]<auto N>() {
            f(pfr::get<N>(std::forward<T>(operands))...);
            return 1;
        };
    
        // Stop at the shortest struct, so mixed-length operands are allowed.
        constexpr auto size = std::min({pfr::tuple_size<std::decay_t<T>>::value...});
        // optionally assert that sizes match:
        //static_assert(size == std::max({pfr::tuple_size<std::decay_t<T>>::value...}));
    
        // Immediately invoked lambda, expanding apply over the indices 0..size-1:
        [=]<auto... N>(std::index_sequence<N...>) {
            return (apply.template operator()<N>() + ...);
        }
        (std::make_index_sequence<size>{});
    }
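
    As an aside, the dummy return value merely feeds the +-fold. With a void apply, a comma fold over the outer expansion reads the same; the last statement of transform could instead be (my variant, behaviorally equivalent):

    // Comma fold: no dummy return value needed. The apply helper is still
    // required, since expanding N and operands in one pattern would try to
    // expand the two packs in lockstep.
    [=]<auto... N>(std::index_sequence<N...>) {
        (apply.template operator()<N>(), ...);
    }
    (std::make_index_sequence<size>{});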
    

    I already generalized a bit by not fixing the arity; it's more like an n-ary zip or visitor now. To get the transform you wanted, you'd pass it an operation like:

    auto binary = [](auto const& a, auto& b) {
        b = a;
    };
    

    Let's demo this, highlighting mixed member types, asymmetric operand combinations, and mixed-length structs:

    struct S1 { int a; double b; long c; float d; };
    struct S2 { double a; double b; double c; double d; };
    struct S3 { double a; double b; };
    

    Test cases:

    int main() {
    
        auto n_ary = [](auto&... fields) {
            puts(__PRETTY_FUNCTION__); // GCC/Clang extension (puts needs <cstdio>)
            return (... = fields);     // left fold over assignment
        };
    
        S1 a;
        S2 b;
        S3 c;
    
        // all directions
        transform(binary, a, b);
        transform(binary, b, a);
    
        // mixed sizes
        transform(binary, b, c);
        transform(binary, c, a);
    
        // why settle for binary?
        transform(n_ary, a, b);
        transform(n_ary, a, b, c);
        transform(n_ary, c, b, a);
    }
    

    See it Live On Compiler Explorer

    Already, the disassembly suggests everything is getting inlined and optimized away. Literally. Only the puts calls remain:

    main:
        sub     rsp, 8
        mov     edi, OFFSET FLAT:.LC0
        call    puts
        mov     edi, OFFSET FLAT:.LC1
        call    puts
        mov     edi, OFFSET FLAT:.LC2
        call    puts
        ...
        ...
        xor     eax, eax
        add     rsp, 8
        ret
    

    giving the output

    main()::<lambda(const auto:12&, auto:13&)> [with auto:12 = int; auto:13 = double]
    main()::<lambda(const auto:12&, auto:13&)> [with auto:12 = double; auto:13 = double]
    main()::<lambda(const auto:12&, auto:13&)> [with auto:12 = long int; auto:13 = double]
    main()::<lambda(const auto:12&, auto:13&)> [with auto:12 = float; auto:13 = double]
    main()::<lambda(const auto:12&, auto:13&)> [with auto:12 = double; auto:13 = int]
    main()::<lambda(const auto:12&, auto:13&)> [with auto:12 = double; auto:13 = double]
    main()::<lambda(const auto:12&, auto:13&)> [with auto:12 = double; auto:13 = long int]
    main()::<lambda(const auto:12&, auto:13&)> [with auto:12 = double; auto:13 = float]
    main()::<lambda(const auto:12&, auto:13&)> [with auto:12 = double; auto:13 = double]
    main()::<lambda(const auto:12&, auto:13&)> [with auto:12 = double; auto:13 = double]
    main()::<lambda(const auto:12&, auto:13&)> [with auto:12 = double; auto:13 = int]
    main()::<lambda(const auto:12&, auto:13&)> [with auto:12 = double; auto:13 = double]
    main()::<lambda(auto:14& ...)> [with auto:14 = {int, double}]
    main()::<lambda(auto:14& ...)> [with auto:14 = {double, double}]
    main()::<lambda(auto:14& ...)> [with auto:14 = {long int, double}]
    main()::<lambda(auto:14& ...)> [with auto:14 = {float, double}]
    main()::<lambda(auto:14& ...)> [with auto:14 = {int, double, double}]
    main()::<lambda(auto:14& ...)> [with auto:14 = {double, double, double}]
    main()::<lambda(auto:14& ...)> [with auto:14 = {double, double, int}]
    main()::<lambda(auto:14& ...)> [with auto:14 = {double, double, double}]
    

    Proof Of Work

    Let's do some "useful" calculations that we can check. Also, let's rename the transform function to nway_visit, reflecting its more generic orientation:

    auto binary = [](auto& a, auto& b) { return a *= b; };
    auto n_ary  = [](auto&... fields) { return (... *= fields); };
    

    So both operations fold multiply-assignment over the fields. Note that (... *= fields) is a unary left fold, accumulating into the leftmost field. Let's pick some distinctly chosen initializers:

    S1 a {1,2,3,4};
    S2 b {2,3,4,5};
    S3 c {3,4};
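
    To see the fold at work: for three operands the n-ary operation expands as follows, which we can verify against the trace further down (a.a ends up at exactly 1728 this way):

    // (... *= fields) is a unary left fold; with three fields it expands to
    //     ((a_field *= b_field) *= c_field)
    // At nway_visit(n_ary, a, b, c) below we have a.a == 24, b.a == 12, c.a == 6:
    static_assert((24 * 12) * 6 == 1728);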
    

    With these we can watch the data flow through the structures. So, let's optionally add some debug tracing of that:

    // Evaluate expr, then (if output_enabled) dump all three structs via
    // pfr::io (requires <boost/pfr/io.hpp>, <iostream> and <iomanip>):
    #define DEMO(expr)                                                             \
        void(expr);                                                                \
        if constexpr (output_enabled) {                                            \
            std::cout << "After " << std::left << std::setw(26) << #expr;          \
            std::cout << " a:" << pfr::io(a) << "\tb:" << pfr::io(b)               \
                      << "\tc:" << pfr::io(c) << "\n";                             \
        }
    
        DEMO("initialization");
    
        // all directions
        DEMO(nway_visit(binary, a, b));
        DEMO(nway_visit(binary, b, a));
    
        // mixed sizes
        DEMO(nway_visit(binary, b, c));
        DEMO(nway_visit(binary, c, a));
    
        // why settle for binary?
        DEMO(nway_visit(n_ary, a, b));
        DEMO(nway_visit(n_ary, a, b, c));
        DEMO(nway_visit(n_ary, c, b, a));
    
        return long(c.a + c.b) % 37; // prevent whole program optimization...
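
    The trace assumes a compile-time switch that is not shown in the excerpt; a one-line definition of output_enabled along these lines would do (the name comes from the macro above, the definition is my sketch):

    static constexpr bool output_enabled = true; // set to false for the optimized build below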
    

    As the cherry on top, let's be absolutely sure that (with output disabled) the compiler cannot optimize the whole program away because there are no observable effects:

    return long(c.a + c.b) % 37; // prevent whole program optimization...
    

    The demo is Live On Compiler Explorer, once with output enabled and once with output disabled, the latter showing the disassembly:

    main:
            mov     eax, 13
            ret
    

    WOW

    Holy smokes. That's optimization. The whole program is statically evaluated and just returns the exit code 13. Let's see whether that's the correct exit code:

    Output Enabled:

    After "initialization"           a:{1, 2, 3, 4} b:{2, 3, 4, 5}  c:{3, 4}
    After nway_visit(binary, a, b)   a:{2, 6, 12, 20}   b:{2, 3, 4, 5}  c:{3, 4}
    After nway_visit(binary, b, a)   a:{2, 6, 12, 20}   b:{4, 18, 48, 100}  c:{3, 4}
    After nway_visit(binary, b, c)   a:{2, 6, 12, 20}   b:{12, 72, 48, 100} c:{3, 4}
    After nway_visit(binary, c, a)   a:{2, 6, 12, 20}   b:{12, 72, 48, 100} c:{6, 24}
    After nway_visit(n_ary, a, b)    a:{24, 432, 576, 2000} b:{12, 72, 48, 100} c:{6, 24}
    After nway_visit(n_ary, a, b, c) a:{1728, 746496, 576, 2000}    b:{12, 72, 48, 100} c:{6, 24}
    After nway_visit(n_ary, c, b, a) a:{1728, 746496, 576, 2000}    b:{12, 72, 48, 100} c:{124416, 1289945088}
    

    So, the return value should be (124416 + 1289945088) modulo 37, i.e. 1290069504 % 37, which a pocket calculator confirms is 13.

    From Here: Parallel tasks etc.

    Your original motivation included getting the parallel execution options from the standard library for free. As you know, I'm skeptical of the usefulness of that.

    However, little stops you from getting this behaviour from this algorithm:

    boost::asio::thread_pool ctx; // or, e.g., system_executor
    
    // Post one task per visited field combination:
    auto run_task = [&](auto&... fields) {
        boost::asio::post(ctx, [=] { long_running_task(fields...); });
    };
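
    To round this off, a hedged usage sketch: assuming the nway_visit from above is in scope, and with long_running_task standing in for whatever per-field work you have, posting and joining looks like this (the join keeps main alive until the pool drains):

    #include <boost/asio.hpp>

    void long_running_task(double, double) { /* hypothetical per-field workload */ }

    struct S3 { double a; double b; }; // same shape as the demo struct above

    int main() {
        boost::asio::thread_pool ctx;

        auto run_task = [&](auto&... fields) {
            boost::asio::post(ctx, [=] { long_running_task(fields...); });
        };

        S3 x{1, 2}, y{3, 4};
        nway_visit(run_task, x, y); // posts one task per field pair
        ctx.join();                 // wait for all posted work to finish
    }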
    

    Hope this is good inspiration. And thanks for making me look at PFR. It's pretty sweet.