The case is quite complicated to describe, I will just show some code snippet
ifstream ifs = xxx; // total length 1200 bytes
// do something that read 200 bytes from ifs;
filtering_istream limitedIn(restrict(ifs, 0/*off*/, 1000/*len*/));
// do something that read 700 bytes from limitedIn; we still got 300 bytes to read
filtering_istream limitedIn2(restrict(limitedIn, 0/*off*/, 300/*len*/));
// but now I can only read 100 bytes from limitedIn2 because when restrict limitedIn, it called limitedIn.seek(0, cur) which in turn calls ifs.seek(0, cur) which return 900 and updated limitedIn _pos to 900
Is there any simple way to avoid this problem? If I am able to get a stream wrapper around ifstream that returns 0 for seek
even if ifstream return 200 thing will get a lot simpler.
ofstream out(R"(C:\Users\nick\AppData\Local\Temp\bugTest)", ios_base::binary);
char* content = new char[1200];
out.write(content, 1200);
out.flush();
out.close();
ifstream in(R"(C:\Users\nick\AppData\Local\Temp\bugTest)", ios_base::binary);
in.read(content, 200);
filtering_istream limitedIn(restrict(in, 0, 1000));
limitedIn.read(content, 700);
filtering_istream limitedIn2(restrict(limitedIn, 0, 300));
std::cout << limitedIn2.rdbuf()->sgetn(content, 300); // print 100
delete []content;
I created a more verbose self-contained example that shows better what is happening:
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/restrict.hpp>
#include <fstream>
#include <fmt/ranges.h>
#include <span>
int main() {
constexpr int capacity = 12;
std::array<char, capacity> content{'a','b','c','d','e','f','g','h','i','j','k','l'};
{
std::ofstream out(R"(bugTest)", std::ios::binary);
out.write(content.data(), content.size());
out.flush();
}
using boost::iostreams::restrict;
using boost::iostreams::filtering_istream;
auto read_n = [](auto&& is, size_t n) {
std::array<char, capacity> content{0};
bool ok(is.read(content.data(), n));
fmt::print("{} read {} bytes out of {}: {}\n",
ok?"successful":"failed",
is.gcount(), n,
std::span(content.data(), is.gcount()));
};
std::ifstream in("bugTest", std::ios::binary);
read_n(in, 2);
filtering_istream limitedIn(restrict(in, 0, 10));
read_n(limitedIn, 7);
read_n(filtering_istream(restrict(limitedIn, 0, 3)), 3);
}
Prints
successful read 2 bytes out of 2: {'a', 'b'}
successful read 7 bytes out of 7: {'c', 'd', 'e', 'f', 'g', 'h', 'i'}
failed read 1 bytes out of 3: {'j'}
The conceptual problem seems to be repeated restriction of a stream. Perhaps you can use a device:
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/device/file.hpp>
#include <boost/iostreams/restrict.hpp>
#include <fstream>
#include <fmt/ranges.h>
#include <span>
int main() {
constexpr int capacity = 12;
std::array<char, capacity> content{'a','b','c','d','e','f','g','h','i','j','k','l'};
{
std::ofstream out(R"(bugTest)", std::ios::binary);
out.write(content.data(), content.size());
out.flush();
}
using boost::iostreams::restrict;
using boost::iostreams::filtering_istream;
using boost::iostreams::is_device;
auto read_n = [](auto&& str_or_dev, size_t n) {
std::array<char, capacity> content{0};
bool ok = true;
size_t count = 0;
if constexpr (is_device<std::decay_t<decltype(str_or_dev)>>::value) {
if (auto actual = str_or_dev.read(content.data(), n); actual != -1) {
ok = true;
count = actual;
} else {
ok = false;
count = 0;
}
} else {
ok = str_or_dev.good();
count = str_or_dev.gcount();
}
fmt::print("{} read {} bytes out of {}: {}\n",
ok?"successful":"failed", count, n,
std::span(content.data(), count));
};
boost::iostreams::file_source in("bugTest", std::ios::binary);
read_n(in, 2);
auto limitedIn(restrict(in, 0, 10));
read_n(limitedIn, 7);
read_n(restrict(limitedIn, 0, 3), 3);
}
It does
By now, it's looking like we're in plain-and-simple library bug/undocumented limitation territory.
I'll have you know that I consider the output surprising anyways, I'd expect the second read to start from 'a'
again, especially in the Device formulation. That's because offset = 0
implies to me an absolute seek.
Let's detect the actual stream position relative to the restriction at each read:
auto pos = [](auto&& str_or_dev) {
return str_or_dev.seek(0, std::ios::cur);
};
And then:
auto read_n = [pos](auto&& str_or_dev, size_t n) {
fmt::print("read at #: {}\n", pos(str_or_dev));
// ...
This leads to more insight! It prints
read at #: 0
successful read 2 bytes out of 2: {'a', 'b'}
read at #: 2
successful read 7 bytes out of 7: {'c', 'd', 'e', 'f', 'g', 'h', 'i'}
read at #: 9
failed read 0 bytes out of 3: {}
Wow. We learn that:
'a'
seems to hold-1
, which in a way makes more sense than returning 1 characterIndeed, looking at the relevant constructor, we can deduce that restriction
makes the hard assumption that the underlying source is at origin:
template<typename Device>
restricted_indirect_device<Device>::restricted_indirect_device
(param_type dev, stream_offset off, stream_offset len)
: device_adapter<Device>(dev), beg_(off), pos_(off),
end_(len != -1 ? off + len : -1)
{
if (len < -1 || off < 0)
boost::throw_exception(BOOST_IOSTREAMS_FAILURE("bad offset"));
iostreams::skip(this->component(), off);
}
The skip
is unconditional and doesn't observe the stream-position of dev
. Now, in your case you'd think it doesn't matter because off
is always 0, but the stream position is not.
Seeking the underlying device to the start before every read does remove all unexpected effects:
in.seek(0, std::ios::beg);
read_n(in, 2);
auto limitedIn(restrict(in, 0, 10));
in.seek(0, std::ios::beg);
read_n(limitedIn, 7);
in.seek(0, std::ios::beg);
read_n(restrict(limitedIn, 0, 3), 3);
read at #: 0
successful read 2 bytes out of 2: {'a', 'b'}
read at #: 0
successful read 7 bytes out of 7: {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
read at #: 0
successful read 3 bytes out of 3: {'a', 'b', 'c'}
Of course that's not what you want but it helps to understand what it does, so you know what to say to get what we do want.
Now, by instead seeking to the restricted begin we're going to home in on the problem: https://godbolt.org/z/r11zo8. Now the last read_n
triggers a bad_seek
, but, surprisingly, in the second line only:
str_or_dev.seek(0, std::ios::beg);
str_or_dev.seek(0, std::ios::cur);
What's even worse, seeking to start of the restriction twice suddenly gets the same behaviour as seeking the underlying device to 0?!
str_or_dev.seek(0, std::ios::beg);
str_or_dev.seek(0, std::ios::beg);
See it Live On Godbolt
successful read 2 bytes out of 2: {'a', 'b'}
successful read 7 bytes out of 7: {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
successful read 3 bytes out of 3: {'a', 'b', 'c'}
Sadly this means it's probably time to file a bug report. Of course you can avoid the use of nested restrictions yourself, but this might not be as easy to do in your actual code base (which is likely to include generic streams).
There are a number of quick and dirty hacks that can make the restriction "realize" its position from the underlying component (as in the above epiphany where we found that "just observing the position prior to read leads the last read to return -1
".
I'm not going to suggest any of those, because they, too, would break down if you had more than 2 levels of restrictions - perhaps hidden by some more adaptive layers.
This is a thing that needs to be fixed in the library. At the very least the documentation might state that the underlying component is assumed to be at start position on creation of the restricted device or filter.