concatenating any number of std::arrays, using only the bare minimum of ctor calls

I want to implement this function (a function which can concatenate any number of arrays, and it considers the value category as well):

template <typename ELEMENT, std::size_t ...SIZE>
constexpr auto concat(std::array<ELEMENT, SIZE> &&&...arrays) {
    return std::array<ELEMENT, SUMMED_SIZE>{ CONCATENATED_ARRAY_ELEMENTS };
}

Notes:

By &&&, I mean that each array in arrays can be rvalue/lvalue ref (can be mixed)
By SUMMED_SIZE, I mean the the sum of SIZE (this is trivial)
By CONCATENATED_ARRAY_ELEMENTS, I mean all the elements of each array of arrays concatenated. If an array in arrays is an rvalue/lvalue, then the corresponding elements in CONCATENATED_ARRAY_ELEMENTS should also be rvalues/lvalues.
I don't want to have extra move/copy ctor calls (otherwise it would be trivial to implement this by using recursion), I want to have only one move/copy for each element in arrays.
ELEMENT may not be default-constructible, so elements must be constructed by copy/move ctor.

Is this possible to do?

It is not hard to implement this function for two arrays (godbolt), but I'm not sure how to generalize this to more than two arrays. I could write 3/4/5/etc-argument function manually, but the number of needed functions grows exponentially.

Solution

The ask here is basically a simplified version of std::tuple_cat. It's a lot easier than the general problem since you're accepting just std::array<T, N1>, std::array<T, N2>, ..., std::array<T, Nx> and returning std::array<T, N1 + N2 + ... + Nx>. So computing the return type is trivial (and also unnecessary).

But the actual indexing logic is the same. We need to produce two lists of numbers: a list of outer indices and a list of inner indices. The goal is to ultimately produce the expression:

return std::array{ arrays...[Inner][Outer]... };

Where, e.g., if we're concatenating an array of size 4 and an array of size 2, we'd want something like:

Outer is the pack {0, 1, 2, 3, 0, 1}
Inner is the pack {0, 0, 0, 0, 1, 1}

Which we can do, with Boost.Mp11, like this:

template<typename... Arrays>
constexpr auto concat(Arrays&& ... arrays) {
  // turns (array<int, 4>, array<int, 2>) [4, 2]
  using Sizes = mp_list<mp_int<std::tuple_size_v<std::remove_cvref_t<Arrays>>>...>;

  // the inner transform gives us [[0, 0, 0, 0], [1, 1]]
  // so we flatten it with mp_append
  using Inner = mp_apply<mp_append,
    mp_transform<mp_repeat, mp_transform<mp_list, mp_iota_c<sizeof...(Arrays)>>, Sizes>
    >;

  // the inner lists give us [[0, 1, 2, 3], [0, 1]]
  // so we flatten it with mp_append
  using Outer = mp_apply<mp_append,
    mp_list<mp_iota_c<std::tuple_size_v<std::remove_cvref_t<Arrays>>>...>
    >;

  return [&]<class... Is, class... Js>(mp_list<Is...>, mp_list<Js...>){
    return std::array{
        std::forward_like<decltype(arrays...[Is::value])>(
            arrays...[Is::value][Js::value]
        )...
    };
  }(Inner{}, Outer{});

The inner construction here is pretty awkward because it's not enough to do FWD(arr)[i][j] - because array's index operator doesn't propagate value category. So we need to use std::forward_like, which needs different expressions for the type and operand - which makes it inconvenient to macro out. I suppose you could define a function like std::array{index(FWD(arrays...[Is::value]), Js::value)...} but not sure it's worth it. Maybe if you do this in several places.

But this correctly moves or copies as desired.

Demo.