c++ iostream istream istream-iterator

Confused about usage of `std::istreambuf_iterator`


I've implemented a deserialization routine for an object using the >> stream operator. The routine itself uses an istreambuf_iterator<char> to extract characters from the stream one by one, in order to construct the object.

Ultimately, my goal is to be able to iterate over a stream using an istream_iterator<MyObject> and insert each object into a vector. Pretty standard, except I'm having trouble getting the istream_iterator to stop iterating when it hits end-of-stream. Right now, it just loops forever, even though calls to istream::tellg() indicate I'm at the end of the file.

Here's code to reproduce the problem:

#include <fstream>
#include <iostream>
#include <iterator>

struct Foo
{
    Foo() { }    
    Foo(char a_, char b_) : a(a_), b(b_) { }

    char a;
    char b;
};

// Output stream operator
std::ostream& operator << (std::ostream& os, const Foo& f)
{
    os << f.a << f.b;
    return os;
}

// Input stream operator
std::istream& operator >> (std::istream& is, Foo& f)
{
    if (is.good()) 
    {
        std::istreambuf_iterator<char> it(is);
        std::istreambuf_iterator<char> end;

        if (it != end) {
            f.a = *it++;
            f.b = *it++;
        }
    }
    return is;
}

int main()
{
    {
        std::ofstream ofs("foo.txt");
        ofs << Foo('a', 'b') << Foo('c', 'd');
    }

    std::ifstream ifs("foo.txt");
    std::istream_iterator<Foo> it(ifs);
    std::istream_iterator<Foo> end;
    for (; it != end; ++it) std::cout << *it << std::endl; // iterates infinitely
}

I know in this trivial example I don't even need istreambuf_iterator, but I'm just trying to simplify the problem so it's more likely people will answer my question.

So the problem here is that even though the istreambuf_iterator reaches the end of the stream buffer, the stream itself doesn't enter an EOF state. Calls to istream::eof() return false, even though istream::tellg() reports the position at the end of the file, and istreambuf_iterator<char>(ifs) compares equal to istreambuf_iterator<char>(), meaning I'm definitely at the end of the stream.

I looked at the IOstreams library code to see exactly how it's determining whether an istream_iterator is at the end position, and basically it checks if istream::operator void*() const evaluates to true. This istream library function simply returns:

return this->fail() ? 0 : const_cast<basic_ios*>(this);

In other words, it returns 0 (false) if the failbit is set. It then compares this value to the same value in a default-constructed instance of istream_iterator to determine if we're at the end.

So I tried manually setting the failbit in my std::istream& operator >> (std::istream& is, Foo& f) routine when the istreambuf_iterator compares equal to the end iterator. This worked perfectly, and properly terminated the loop. But now I'm really confused. It seems that istream_iterator definitely checks for std::ios::failbit in order to signify an "end-of-stream" condition. But isn't that what std::ios::eofbit is for? I thought failbit was for error conditions, for example if the underlying file of an fstream couldn't be opened.
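
A minimal sketch of that workaround, assuming the same Foo as above. It is an illustration rather than the exact code: it reads from a std::istringstream so that it is self-contained, read_all is a made-up helper name, and setting eofbit alongside failbit is an optional extra (failbit alone is what ends the loop).

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

struct Foo
{
    Foo() : a(0), b(0) { }
    Foo(char a_, char b_) : a(a_), b(b_) { }

    char a;
    char b;
};

// Input stream operator with the manual failbit workaround
std::istream& operator >> (std::istream& is, Foo& f)
{
    std::istreambuf_iterator<char> it(is);
    std::istreambuf_iterator<char> end;

    if (it == end) {
        // Buffer exhausted: report it on the stream object itself
        is.setstate(std::ios::eofbit | std::ios::failbit);
        return is;
    }
    f.a = *it++;

    if (it == end) {
        // Only half an object was available
        is.setstate(std::ios::eofbit | std::ios::failbit);
        return is;
    }
    f.b = *it++;
    return is;
}

// Hypothetical helper: collect every Foo that can be read from a string
std::vector<Foo> read_all(const std::string& text)
{
    std::istringstream iss(text);
    std::istream_iterator<Foo> it(iss);
    std::istream_iterator<Foo> end;
    return std::vector<Foo>(it, end);
}
```

With this version, read_all("abcd") stops after two objects instead of looping forever.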

So, why do I need to call istream::setstate(std::ios::failbit) to get the loop to terminate?


Solution

  • When you use istreambuf_iterator, you are manipulating the underlying streambuf object of the istream object. The streambuf object doesn't know anything about its owner (the istream object), so calling functions on the streambuf object does not change the state of the istream object. That's why the flags in the istream object are not set when you reach the end of the file.

    Do something like this:

    std::istream& operator >> (std::istream& is, Foo& f)
    {
        is.read(&f.a, sizeof(f.a));
        is.read(&f.b, sizeof(f.b));
        return is;
    }
    
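    The difference between the two approaches can be seen in isolation with a small sketch. It is not part of the original code: it uses std::istringstream instead of a file, and both function names are made up for the example. The first function drains the stream through istreambuf_iterator and checks that the istream still reports good(). The second asks istream::read for one byte more than is available and checks that eofbit and failbit both end up set.

    ```cpp
    #include <ios>
    #include <istream>
    #include <iterator>
    #include <sstream>
    #include <string>

    // Consume every character through the streambuf, then report whether
    // the istream object itself still considers its state good.
    bool good_after_streambuf_drain(const std::string& text)
    {
        std::istringstream iss(text);
        std::istreambuf_iterator<char> it(iss), end;
        while (it != end) ++it;   // advances the streambuf only
        return iss.good();        // the istream flags were never touched
    }

    // Ask istream::read for one byte more than the stream holds, then
    // report whether both eofbit and failbit were set on the istream.
    bool eof_and_fail_after_short_read(const std::string& text)
    {
        std::istringstream iss(text);
        std::string buf(text.size() + 1, '\0');
        iss.read(&buf[0], static_cast<std::streamsize>(buf.size()));
        return iss.eof() && iss.fail();
    }
    ```

    Both functions should return true for an input such as "abcd". That is why the read-based operator>> lets the loop terminate, while the istreambuf_iterator version does not.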

    Edit

    I was stepping through the code in my debugger, and this is what I found. istream_iterator has two internal data members: a pointer to the associated istream object, and an object of the template type (Foo in this case). When you call ++it, it calls this function:

    void _Getval()
    {    // get a _Ty value if possible
        if (_Myistr != 0 && !(*_Myistr >> _Myval))
            _Myistr = 0;
    }
    

    _Myistr is the istream pointer, and _Myval is the Foo object. If you look here:

    !(*_Myistr >> _Myval)
    

    That's where it calls your operator>> overload, and then it calls operator! on the returned istream object. operator! only returns true if failbit or badbit is set; eofbit alone doesn't do it.

    So if either failbit or badbit is set, the istream pointer gets set to NULL. The next time you compare the iterator to the end iterator, it compares the istream pointers, which are NULL in both.
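
    The rule about which state bits matter can be checked directly. The sketch below is only an illustration, and stops_iteration is a made-up helper name. It puts a stream into exactly one state and applies operator!, the same test that istream_iterator applies to the result of operator>>.

    ```cpp
    #include <ios>
    #include <sstream>

    // Returns true if a stream whose state is exactly `state` would end
    // an istream_iterator loop, i.e. if operator! returns true for it.
    bool stops_iteration(std::ios::iostate state)
    {
        std::istringstream iss;
        iss.clear(state);   // replace the stream state with `state`
        return !iss;        // true only when failbit or badbit is set
    }
    ```

    Here stops_iteration(std::ios::eofbit) is false, while stops_iteration(std::ios::failbit) and stops_iteration(std::ios::badbit) are true.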