Search code examples
c++c++20std-ranges

Do C++20 ranges support group by functionality?


Sometimes it is quite useful to group/partition the objects based on value of one of their member functions(either getter or some computation).

Do C++20 ranges enable something like

std::vector<Person> {{.Age=23, .Name = "Alice"}, {.Age=25, .Name = "Bob"}, {.Age=23, .Name = "Chad"}};
// group by .Age and put into std::map
std::map<int/*Age is int*/, std::vector<Person>> AgeToPerson = ...;
// 23 -> Person{23,Alice}, Person{23,Chad}
// 25 -> Person{25,Bob}

Note 1: there is this old question where accepted answer is to just use raw for loop

Note 2: range-v3 has this confusing group_by algorithm that seems useless for my task:

Given a source range and a binary predicate, return a range of ranges where each range contains contiguous elements from the source range such that the following condition holds: for each element in the range apart from the first, when that element and the first element are passed to the binary predicate, the result is true. In essence, views::group_by groups contiguous elements together with a binary predicate.


Solution

  • There are really three different kinds of functionalities that languages offer under the name group by:

    1. Take a binary predicate ((T, T) -> bool) and group consecutive elements for which that predicate evaluates to true (e.g. Haskell, Elixir, D, range-v3 kind of)
    2. Take a unary function (T -> U) and group consecutive elements with the same "key" and yield a range of pairs of U, [T] (e.g. Rust, Python, also D, F#)
    3. Take a unary function (T -> U) and return a dictionary that maps U: [T] (e.g. Clojure, Kotlin, Scala).

    The first two require consecutive elements - meaning you need to sort by key. The last one doesn't, since you're producing a container anyway. You could also produce the 3rd version from the 2nd, even without sorting, although that would still require a loop.

    But as mentioned, range-v3 only offers the 1st, and that one isn't even in C++20. So you need to write your own thing. In this case, a loop is probably best:

    template <range R, indirectly_unary_invocable<iterator_t<R>> F>
        /* other requirements such that you can form a map */
    auto group_by_into_map(R&& range, F&& f)
    {
        unordered_map<
            decay_t<indirect_result_t<F&, iterator_t<R>>>, // result of unary function
            vector<range_value_t<R>>                       // range-as-vector
        > map;
    
        for (auto&& e : range) {
            map[std::invoke(f, e)].push_back(e);
        }
    
        return map;
    }
    

    Something to that effect. This allows:

    group_by_into_map(people, &Person::Age);
    

    Unless you're ok with using std::unordered_multimap. Do people use that? It's kind of a weird container. But assuming you are, then this is much easier. You can write your own adapter:

    template <typename F> // NB: must be unconstrained
    auto group_by_into_map(F&& f) {
        return views::transform([=](auto&& e){ return std::pair(std::invoke(f, e), e); })
             | ranges::to<std::unordered_multimap>();
            
    }
    

    which allows for:

    people | group_by_into_map(&Person::Age);
    

    But this gives you an unordered_multimap<int, Person> rather than an unordered_map<int, vector<Person>>.