Search code examples
c++boost

boost restrict twice after reading some data gave wrong result


The case is quite complicated to describe, I will just show some code snippet

ifstream ifs = xxx; // total length 1200 bytes
// do something that read 200 bytes from ifs;
filtering_istream limitedIn(restrict(ifs, 0/*off*/, 1000/*len*/));
// do something that read 700 bytes from limitedIn; we still got 300 bytes to read
filtering_istream limitedIn2(restrict(limitedIn, 0/*off*/, 300/*len*/)); 
// but now I can only read 100 bytes from limitedIn2 because when restrict limitedIn, it called limitedIn.seek(0, cur) which in turn calls ifs.seek(0, cur) which return 900 and updated limitedIn _pos to 900

Is there any simple way to avoid this problem? If I am able to get a stream wrapper around ifstream that returns 0 for seek even if ifstream return 200 thing will get a lot simpler.

working code example

    ofstream out(R"(C:\Users\nick\AppData\Local\Temp\bugTest)", ios_base::binary);
    char* content = new char[1200];
    out.write(content, 1200);
    out.flush();
    out.close();
    ifstream in(R"(C:\Users\nick\AppData\Local\Temp\bugTest)", ios_base::binary);
    in.read(content, 200);
    filtering_istream limitedIn(restrict(in, 0, 1000));
    limitedIn.read(content, 700);
    filtering_istream limitedIn2(restrict(limitedIn, 0, 300));
    std::cout << limitedIn2.rdbuf()->sgetn(content, 300); // print 100
    delete []content;

Solution

  • I created a more verbose self-contained example that shows better what is happening:

    Live On Godbolt

    #include <boost/iostreams/filtering_stream.hpp>
    #include <boost/iostreams/restrict.hpp>
    #include <fstream>
    #include <fmt/ranges.h>
    #include <span>
    
    int main() {
        constexpr int capacity = 12;
        std::array<char, capacity> content{'a','b','c','d','e','f','g','h','i','j','k','l'};
        {
            std::ofstream out(R"(bugTest)", std::ios::binary);
            out.write(content.data(), content.size());
            out.flush();
        }
        using boost::iostreams::restrict;
        using boost::iostreams::filtering_istream;
    
        auto read_n = [](auto&& is, size_t n) {
            std::array<char, capacity> content{0};
            bool ok(is.read(content.data(), n));
            fmt::print("{} read {} bytes out of {}: {}\n",
                    ok?"successful":"failed",
                    is.gcount(), n,
                    std::span(content.data(), is.gcount()));
        };
    
        std::ifstream in("bugTest", std::ios::binary);
        read_n(in, 2);
    
        filtering_istream limitedIn(restrict(in, 0, 10));
        read_n(limitedIn, 7);
        read_n(filtering_istream(restrict(limitedIn, 0, 3)), 3);
    }
    

    Prints

    successful read 2 bytes out of 2: {'a', 'b'}
    successful read 7 bytes out of 7: {'c', 'd', 'e', 'f', 'g', 'h', 'i'}
    failed read 1 bytes out of 3: {'j'}
    

    Do Devices Behave "Better"?

    The conceptual problem seems to be repeated restriction of a stream. Perhaps you can use a device:

    Live On Godbolt

    #include <boost/iostreams/filtering_stream.hpp>
    #include <boost/iostreams/device/file.hpp>
    #include <boost/iostreams/restrict.hpp>
    #include <fstream>
    #include <fmt/ranges.h>
    #include <span>
    
    int main() {
        constexpr int capacity = 12;
        std::array<char, capacity> content{'a','b','c','d','e','f','g','h','i','j','k','l'};
        {
            std::ofstream out(R"(bugTest)", std::ios::binary);
            out.write(content.data(), content.size());
            out.flush();
        }
        using boost::iostreams::restrict;
        using boost::iostreams::filtering_istream;
        using boost::iostreams::is_device;
    
        auto read_n = [](auto&& str_or_dev, size_t n) {
            std::array<char, capacity> content{0};
            bool ok = true;
            size_t count = 0;
            if constexpr (is_device<std::decay_t<decltype(str_or_dev)>>::value) {
                if (auto actual = str_or_dev.read(content.data(), n); actual != -1) {
                    ok = true;
                    count = actual;
                } else {
                    ok = false;
                    count = 0;
                }
            } else {
                ok = str_or_dev.good();
                count = str_or_dev.gcount();
            }
    
            fmt::print("{} read {} bytes out of {}: {}\n",
                    ok?"successful":"failed", count, n,
                    std::span(content.data(), count));
        };
    
        boost::iostreams::file_source in("bugTest", std::ios::binary);
        read_n(in, 2);
    
        auto limitedIn(restrict(in, 0, 10));
        read_n(limitedIn, 7);
        read_n(restrict(limitedIn, 0, 3), 3);
    }
    

    It does

    • simplify the code
    • probably increase performance
    • make a difference (in that the final read is deemed "succesful")
    • make no difference (in the sense that the window-calculation anomaly is still present)

    What Now

    By now, it's looking like we're in plain-and-simple library bug/undocumented limitation territory.

    I'll have you know that I consider the output surprising anyways, I'd expect the second read to start from 'a' again, especially in the Device formulation. That's because offset = 0 implies to me an absolute seek.

    Let's detect the actual stream position relative to the restriction at each read:

    auto pos = [](auto&& str_or_dev) {
        return str_or_dev.seek(0, std::ios::cur);
    };
    

    And then:

    auto read_n = [pos](auto&& str_or_dev, size_t n) {
        fmt::print("read at #: {}\n", pos(str_or_dev));
        // ...
    

    This leads to more insight! It prints

    read at #: 0    
    successful read 2 bytes out of 2: {'a', 'b'}    
    read at #: 2    
    successful read 7 bytes out of 7: {'c', 'd', 'e', 'f', 'g', 'h', 'i'}    
    read at #: 9    
    failed read 0 bytes out of 3: {}
    

    Wow. We learn that:

    • indeed, we don't start at "9" even relative to the restriction. So my expectation that the second read should logically commence at 'a' seems to hold
    • just observing the position prior to read leads the last read to return -1, which in a way makes more sense than returning 1 character

    Indeed, looking at the relevant constructor, we can deduce that restriction makes the hard assumption that the underlying source is at origin:

    template<typename Device>
    restricted_indirect_device<Device>::restricted_indirect_device
        (param_type dev, stream_offset off, stream_offset len)
        : device_adapter<Device>(dev), beg_(off), pos_(off), 
          end_(len != -1 ? off + len : -1)
    {
        if (len < -1 || off < 0)
            boost::throw_exception(BOOST_IOSTREAMS_FAILURE("bad offset"));
        iostreams::skip(this->component(), off);
    }
    

    The skip is unconditional and doesn't observe the stream-position of dev. Now, in your case you'd think it doesn't matter because off is always 0, but the stream position is not.

    A Counter Example:

    Seeking the underlying device to the start before every read does remove all unexpected effects:

    in.seek(0, std::ios::beg);
    read_n(in, 2);
    
    auto limitedIn(restrict(in, 0, 10));
    
    in.seek(0, std::ios::beg);
    read_n(limitedIn, 7);
    
    in.seek(0, std::ios::beg);
    read_n(restrict(limitedIn, 0, 3), 3);
    

    Prints:

    read at #: 0    
    successful read 2 bytes out of 2: {'a', 'b'}
    read at #: 0    
    successful read 7 bytes out of 7: {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
    read at #: 0
    successful read 3 bytes out of 3: {'a', 'b', 'c'}
    

    Of course that's not what you want but it helps to understand what it does, so you know what to say to get what we do want.

    Now, by instead seeking to the restricted begin we're going to home in on the problem: https://godbolt.org/z/r11zo8. Now the last read_n triggers a bad_seek, but, surprisingly, in the second line only:

    str_or_dev.seek(0, std::ios::beg);
    str_or_dev.seek(0, std::ios::cur);
    

    What's even worse, seeking to start of the restriction twice suddenly gets the same behaviour as seeking the underlying device to 0?!

    str_or_dev.seek(0, std::ios::beg);
    str_or_dev.seek(0, std::ios::beg);
    

    See it Live On Godbolt

    successful read 2 bytes out of 2: {'a', 'b'}
    successful read 7 bytes out of 7: {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
    successful read 3 bytes out of 3: {'a', 'b', 'c'}
    

    Bug Report Time

    Sadly this means it's probably time to file a bug report. Of course you can avoid the use of nested restrictions yourself, but this might not be as easy to do in your actual code base (which is likely to include generic streams).

    There are a number of quick and dirty hacks that can make the restriction "realize" its position from the underlying component (as in the above epiphany where we found that "just observing the position prior to read leads the last read to return -1".

    I'm not going to suggest any of those, because they, too, would break down if you had more than 2 levels of restrictions - perhaps hidden by some more adaptive layers.

    This is a thing that needs to be fixed in the library. At the very least the documentation might state that the underlying component is assumed to be at start position on creation of the restricted device or filter.