Search code examples
c++iteratorc++20std-rangesrange-v3

Ranges filter_view::iterator element modification results in UB


The initial question was why with the following code,

std::vector<int> coll{1,4,7,10};

auto iseven = [](auto&& i){return i % 2 == 0; };
auto  colleven = coll | std::views::filter(iseven);

// first view materialization
for(int& i : colleven)
{
    i += 1;
}
for(auto i : coll)
std::cout << i << ' ';
std::cout << std::endl;

// second view materialization
for(int& i : colleven)
{
    i += 1;
}
for(auto i : coll)
std::cout << i << ' ';

and by materializing the view twice, we get two different results. This is definately weird at first sight. The output:

1 5 7 11 
1 6 7 11 

After doing some research and looking into potential duplicates I understand that this is the cause of Undefined Behavior per https://eel.is/c++draft/range.filter#iterator-1.

Basically, std::filter_view::iterator - and other similar views - caches the begin iterator (filter_view is derived from remove_if_view) in order to achieve "laziness" resulting in it maintaining internal state. In the specific example the standard dictates that "even after modification of the view element the user should take care that the predicate remains true." So my question now becomes:

Isn't that a weird requirement? Asking from the user not to do something that otherwise would feel natural, materializing a filter view twice, that is. What are the compromises that we would have to make in order to alleviate this restriction and why didn't we make them?

Note: My question regards standard views and I know the code I linked is from range-v3. I presume the reference implementation corresponds to the standard in this case.


Solution

  • Update: I wrote a much longer version of this answer as a blog post: Mutating through a filter.


    Isn't that a weird requirement? Asking from the user not to do something that otherwise would feel natural [...]

    I don't think so. I think the code in the example is actually exceedingly weird to begin with, and it's not especially surprising that it doesn't work.

    Views are intended to be ephemeral. You construct the view that you want, you use it, and then you throw it away. The view is (probably) going to have its own reference dependencies, and you should not touch them for the lifetime of the view. In Rust's terms, the view is borrowing the containers that it's constructed from.

    With this in mind, it makes no sense to construct a filter, do something with it, then mutate the underlying container, then re-use the original filter. Just construct a new one.

    What are the compromises that we would have to make in order to alleviate this restriction and why didn't we make them?

    Uh, none. This restriction is fairly fundamental to even the iterator model and has nothing really to do with caching or any range-specific design choices.

    The model of forward iterators is that if you copy a forward iterator, and then advance both, that both copies are valid and refer to the same element (assuming that they weren't originally end() so that advancing is actually valid). That remains true of filter as well:

    vector<int> v = {1, 2, 3, 4};
    auto f = v | views::filter(iseven);
    auto it = f.begin(); // this is the 2
    auto it2 = it;       // this is also the 2
    ++it;                // this is the 4
    *it = 5;
    ++it2;               // oops: this is v.end()
    assert(it == it2);   // nope
    

    That assertion holding is an important part of the C++ iterator model, and it cannot possibly hold if you allow arbitrary mutation to occur.

    Now the original iteration in the example:

    for (int& i : colleven) {
        i += 1;
    }
    

    This does mutation, of the kind that breaks guarantees. But this is kind of OK - we're mutating, but we're mutating in a way that happens to not have any ill-effects in this context. Reusing colleven after this is definitely not okay (because of the mutation breaking iterator guarantees). It's very difficult to actually articulate exactly what situations lead to undefined behavior.

    But the fact that looping over colleven twice after internal mutation doesn't work in C++20 ranges isn't just specifically a consequence of caching begin() - it's a consequence of the fact that you just cannot allow doing this sort of thing and maintain any iterator guarantees. It's not a weird requirement - the code itself is problematic. It's just problematic in a way that is impossible to diagnose in C++.

    The short version is: views are not intended to be long-lived, so don't use them that way.