Search code examples
c++c++17c++20range-v3c++23

How does ranges::any_view work and how is it behaving in conjunction to cache1 in these cases?


tl;dr

It's not clear to me how ranges::any_view<T, C> works, especially in relation to what T is, i.e. a &, a const&, a value type, or what else.¹

Research

There isn't really much about it here, yet, nor I have found much around, beside this very useful explanation of why ranges::begin can't be called on const ranges::any_view.

A tricky (?) case

Here's one such example which puzzles me a lot, and below are the relavants bit of it. I think that thoroughly understanding why the examples linked below work (or don't) the way they do, should help me understand what any_view actually is.

This function accepts any range of std::strings and prints each of them (the filter(always(true)) is letting everything pass):

void test(any_view<std::string> parts) {
    for_each(parts | filter(always(true)), print);
};

Now, if I pass an lvalue of type any_view<std::string const&>

any_view<std::string const&> view{foos};

into it,

test(view);

all is fine, but if I first pipe that into cache1²,

test(view | cache1);

then the all the std::strings result empty.

By trial and error I realized that changing

test(any_view<std::string> parts) {

to

test(any_view<std::string const&> parts) {

solves the problem, but I don't really understand why.

But if I then was to also change

any_view<std::string const&> view{foos};

to

any_view<std::string> view{foos};

then I'd get a SIGSEGV.

Furthermore, manually inlining the body of test at the call site, also seems to solve the issue.

And to make the thing trickier, when I try to change std::string for a Foo that holds a std::string (and sets it to "" in the destructor, to hopefully make possible UB a bit more "observable"), the issue seems to go away.

Side question

I know there's no any_view in , and I guess there won't be one in either, but is there a reason (beside performance) why it's not been included yet or won't be included in the foreseeable future? I mean, to me it looks like it does serve a ligitimate purpose, so what else in the standard library is or will fill that hole?


(¹) It's a bit clearer what the role of C is, especially because I had an experience of downgrading an any_view<SomeType, sized> to any_view<SomeType, forward> for the purpose of allowing storing a filter_view, which is not sized, into it.

(²) Yes, the cache1 is useless here, but in my real usecase I had view | transform(f) | filter(g), so I inserted cache1 in between to prevent f from being called twice for each entry x in view which passes the test g(f(x)).


Solution

  • This is a combination of a few things:

    1. For any_view<T>, the reference type of the range is T. any_view<string> is a range of prvalue string, while any_view<string const&> is a range of lvalues of string const.
    2. any_view<T> accepts any range who reference is convertible to T, not same as T. So you can construct an any_view<string> from a range whose reference type isn't just string or even char const* but also, relevantly here, string&&.
    3. The way cache1 is implemented in range-v3 is probably not the best way to do this, and looks like:
      range_value_t<Rng>&& read() const {
          if (parent_->dirty_) { /* update cache */ }
          return std::move(*parent_->cache_);
      }
      
    4. filter requires dereferencing iterators more than once (once to see if it passes the filter, and then the second time for when you actually use the value).

    So when you do this:

    vector<string> foos = {...};
    any_view<string> parts = foos | cache1;
    

    foos | cache1 is a range of string&& (which is an xvalue into cache1's internal cache, foos is unaffected - it could've been const too). We're converting that to any_view<string>, which is now a range of string, which means that somewhere in all these types we have an iterator whose operator*() returns a string from the result of dereferencing (foos | cache1)'s iterator. That will move from cache1's cache. So if you dereference twice, you will end up with an empty cache.

    Then when you do parts | views::filter(pred), because you dereference every iterator twice (once to check if it satisfies the predicate, the second to print it), you will end up moving from the cached string right before you print it, so you only get empty strings.


    The problem here is probably that cache1 is a range of range_value_t<V>&&, which can lend itself to this double-dereference issue. If it were a range of range_value_t<V>& instead, then this would just work (at the cost of every dereference of your any_view<string> leading to a string copy). On the other hand, if you won't dereference the iterator more than once, you do want to just move, since if you were doing something like:

    some | pipeline | cache1 | to<std::vector>();
    

    You don't want to have to remember to write:

    some | pipeline | cache1 | as_rvalue | to<std::vector>();
    

    Note that a similar problem exists for views::concat(r1, r2) if r1 is a range of string and r2 is a range of string&&, you will end up with a range whose reference type is string, such that dereferencing that iterator will end up moving from the elements of r2. Which is what you want to happen if you're doing views::concat(r1, r2) | to<vector>(), but is definitely want you don't want to happen if you're doing views::concat(r1, r2) | filter(p) | ...

    It's hard to know what the right answer is.