Search code examples
c++iterablestd-ranges

Issue with a custom range when using both std::views::join and std::views::enumerate


I have a simple SequencesRange class that can iterate random genomic sequences of fixed size.

When I try to join the characters of all sequences with std::views::join, it works well, but as soon I try to pipe with std::views::enumerate, I get garbage output. I can check that valgrind is not very happy and produces some dreaded Conditional jump or move depends on uninitialised value(s)

#include <string>
#include <iostream>
#include <ranges>

class SequencesRange
{
public:

    struct iterator 
    {
        using iterator_category = std::forward_iterator_tag;
        using value_type        = std::string;
        using difference_type   = long;
        using pointer           = value_type*;
        using reference         = value_type const&;

        iterator (size_t size, size_t nbItems)  : value_(size,' '), nbItems_(nbItems)
        {
            next();
        }

        iterator()                = default;
        iterator(iterator const&) = default;
        iterator(iterator &&    ) = default;

        iterator& operator= (iterator const&) = default;
        iterator& operator= (iterator     &&) = default;

        bool operator!= (const iterator& other) const { return nbItems_ != other.nbItems_; }
        bool operator== (const iterator& other) const { return nbItems_ == other.nbItems_; }

        iterator& operator++ ()    { next();  ++nbItems_;  return *this; }
        iterator  operator++(int)  { iterator tmp = *this;  ++(*this);  return tmp; }

        reference operator* () const { return value_; }

        value_type value_;
        size_t     nbItems_ = 0;
        
        void next()
        {
            static std::array<char,4> letters = {'A','C','G','T'};
            
            for (auto& c : value_)  { c = letters[rand()%letters.size()]; }
        }
    };

public:

    iterator begin() const { return iterator(size,       0); }
    iterator end  () const { return iterator(size, nbItems); }
    
    size_t nbItems = 0;
    size_t size    = 0;
};

int main (int argc, char** argv)
{
    SequencesRange iterable {1, 10};  // one sequence of length 10
 
    for (auto [i,nt] : iterable | std::views::join | std::views::enumerate)
    {
        std::cout << nt;  // got some garbage output here
    }

    // Note that the following is ok:
    //  for (auto nt : iterable | std::views::join)  { std::cout << nt; }
}

It sounds like bad object life cycle management and I suspect something silly in my iterator structure but I currently can't see where.

Odd thing: if I configure the iterable with sequences of length>=16, I don't have any error. Maybe a side effect of short string optimization ?

DEMO


UPDATE:

Note that I have no issue with a simplified version of the iterator structure using an integer sentinel for the end instance:

DEMO


UPDATE:

If I force the iterator to provide copies (ie. value_type operator* () const { return value_; }), it works.

But I can't see why copies are needed for iterable | std::views::join | std::views::enumerate although only references can be used with iterable | std::views::join.


Solution

  • Your iterator is in fact a stashing iterator, i.e. it returns references to its own members, which means that its reference is tied to the lifetime of the iterator itself.

    According to [iterator.concept.forward]:

    Pointers and references obtained from a forward iterator into a range [i, s) shall remain valid while [i, s) continues to denote a range.

    and [forward.iterators]

    If a and b are both dereferenceable, then a == b if and only if *a and *b are bound to the same object.

    A stashing iterator is never a forward iterator.

    Correctly define your iterator's iterator_category as the input_iterator_tag will make the range adaptor work properly.