Is it possible to read a decompressed file once again?
Let's imagine I called `archive_read_next_header(a, &entry)` and then read an unknown number of bytes using `archive_read_data(a, ptr_to_buffer, buffer_size)`. Now I want to reset the archive and start reading again from the beginning. I am trying to override `seekoff(std::streamoff off, std::ios_base::seekdir way, std::ios_base::openmode which)`. I understand that it might be impossible to simply seek inside the decompressed data because of how compression algorithms work internally, and because the data is not stored anywhere except for a limited number of bytes in libarchive's internal buffer.
The idea is to reset everything and then read `off` bytes; that way I could implement a backward seek. A forward seek would be easy: just read `off` bytes. It's really inefficient, but let's hope seeking won't be used much.
The whole `archive` structure was initialized this way:
```cpp
archive_read_set_read_callback(a, read_callback);
archive_read_set_callback_data(a, container);
archive_read_set_seek_callback(a, seek_callback);
archive_read_set_skip_callback(a, skip_callback);
int r = archive_read_open1(a);
```
where `container` essentially wraps a `std::istream`, and the callbacks are functions that manipulate that stream.
A sketch of what I would like to achieve:

```cpp
std::streampos seek_beg(std::streamoff off) {
    if (off >= 0) {
        // read/skip 'off' bytes forward from the current position
    } else {
        // reset (a), then read/skip bytes from the beginning
    }
    // return the new position
}
```
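The reset-and-discard idea above can be demonstrated with the standard library alone. This is a minimal sketch, not libarchive code: `ForwardReader` is a hypothetical stand-in for a forward-only decompressed source, and its `reset()` models what re-opening the archive (`archive_read_free` + `archive_read_open1` + `archive_read_next_header`) would do in the real program:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ios>
#include <string>

// Hypothetical stand-in for the decompressed entry: like archive_read_data(),
// it can only move forward, and "seeking back" requires starting over.
struct ForwardReader {
    std::string data;     // models the decompressed contents
    std::size_t pos = 0;  // current read position

    // Read up to n bytes; returns the number of bytes actually read (0 at EOF).
    std::size_t read(char* dst, std::size_t n) {
        std::size_t take = std::min(n, data.size() - pos);
        std::copy(data.begin() + pos, data.begin() + pos + take, dst);
        pos += take;
        return take;
    }

    // Models re-opening the archive: restart reading from byte 0.
    void reset() { pos = 0; }
};

// Absolute seek from the beginning: reset if the target is behind the
// current position, then discard bytes until the target is reached.
std::streampos seek_beg(ForwardReader& r, std::streamoff off) {
    if (off < static_cast<std::streamoff>(r.pos))
        r.reset();  // backward seek: the only option is to start over
    char scratch[64];
    while (static_cast<std::streamoff>(r.pos) < off) {
        std::streamoff remaining = off - static_cast<std::streamoff>(r.pos);
        std::size_t want = static_cast<std::size_t>(
            std::min<std::streamoff>(remaining, sizeof scratch));
        if (r.read(scratch, want) == 0)
            break;  // seeking past EOF: stop at the end
    }
    return static_cast<std::streampos>(r.pos);
}
```

With a real archive, the `reset()` step is the expensive part: the whole entry has to be decompressed again up to the target offset.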
My `underflow()` method is implemented this way:

```cpp
int underflow() override {
    // archive_read_data() returns the number of bytes read,
    // 0 at end of data, or a negative value on error
    la_ssize_t r = archive_read_data(ar, ptr, BUFFER_SIZE);
    if (r < 0) {
        throw std::runtime_error("ERROR");
    } else if (r == 0) {
        return std::streambuf::traits_type::eof();
    }
    setg(ptr, ptr, ptr + r);
    return std::streambuf::traits_type::to_int_type(*ptr);
}
```
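The same `underflow()` pattern can be tested without libarchive by substituting any forward-only byte source for `archive_read_data`. This sketch (the class name `source_buf` and the in-memory source are my own, not from the question) shows the structure: refill a private buffer, publish it with `setg`, and return EOF only when the source is exhausted:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <iterator>
#include <streambuf>
#include <string>

// Minimal streambuf pulling from a hypothetical forward-only source
// (standing in for archive_read_data) inside underflow().
class source_buf : public std::streambuf {
public:
    explicit source_buf(std::string src) : src_(std::move(src)) {}

protected:
    int_type underflow() override {
        if (gptr() < egptr())  // current window not exhausted yet
            return traits_type::to_int_type(*gptr());
        // "read" the next chunk, as archive_read_data would
        std::size_t n = std::min<std::size_t>(sizeof buf_, src_.size() - pos_);
        if (n == 0)
            return traits_type::eof();  // no more data in the source
        std::memcpy(buf_, src_.data() + pos_, n);
        pos_ += n;
        setg(buf_, buf_, buf_ + n);  // expose the freshly filled window
        return traits_type::to_int_type(*gptr());
    }

private:
    std::string src_;       // stand-in for the decompressed entry
    std::size_t pos_ = 0;   // how much of the source was consumed
    char buf_[16];          // deliberately small, to force refills
};
```

Wrapping it in a `std::istream` and reading everything exercises several refills of the 16-byte buffer.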
The libarchive documentation, or more precisely the wishlist in the libarchive wiki on GitHub, says:

> A few people have asked for the ability to efficiently "re-read" particular archive entries. This is a tricky subject. For many formats, the performance gains from this would be very modest. For example, with a little performance work, the seeking Zip reader could support very fast re-reading from the beginning since it only involves re-parsing the central directory. The cases where there would be real gains (e.g., tar.gz) are going to be very difficult to handle. The most likely implementation would be some form of checkpointing so that clients can explicitly ask for a checkpoint object and then restore back to that checkpoint. The checkpoint object could be complex if you have a series of stacked read filters plus state in the format handler itself.
As I see it, seeking inside archives is not currently possible with libarchive, so my solution was to remember all data already read whenever I suspected I might want to re-read it, or alternatively to push it back into the stream.
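The "remember everything you read" fallback can be sketched as a caching streambuf wrapper. This is a hypothetical illustration (the name `caching_buf` and all details are mine): any `std::istream` stands in for the forward-only libarchive source, every byte pulled from it is appended to a cache, and `seekoff()` can then move backwards within the cached region without touching the source again:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>
#include <vector>

// Wrapper that caches everything read from a forward-only source so that
// backward seeks can be served from the cache instead of re-reading.
class caching_buf : public std::streambuf {
public:
    explicit caching_buf(std::istream& src) : src_(src) {}

protected:
    int_type underflow() override {
        if (pos_ < cache_.size())  // replaying already-cached data
            return traits_type::to_int_type(cache_[pos_]);
        int_type c = src_.get();   // pull one new byte from the source
        if (c == traits_type::eof())
            return traits_type::eof();
        cache_.push_back(traits_type::to_char_type(c));
        return c;
    }

    int_type uflow() override {
        int_type c = underflow();
        if (c != traits_type::eof()) ++pos_;
        return c;
    }

    pos_type seekoff(off_type off, std::ios_base::seekdir way,
                     std::ios_base::openmode) override {
        off_type base = (way == std::ios_base::beg) ? 0
                      : (way == std::ios_base::cur) ? static_cast<off_type>(pos_)
                      : static_cast<off_type>(cache_.size());
        off_type target = base + off;
        if (target < 0) return pos_type(off_type(-1));
        // forward seeks beyond the cache consume the source via uflow()
        while (static_cast<off_type>(pos_) < target &&
               uflow() != traits_type::eof()) {}
        if (static_cast<off_type>(pos_) > target)
            pos_ = static_cast<std::size_t>(target);  // backward: just rewind
        return pos_type(static_cast<off_type>(pos_));
    }

private:
    std::istream& src_;        // forward-only underlying source
    std::vector<char> cache_;  // everything read so far
    std::size_t pos_ = 0;      // logical read position within the cache
};
```

The obvious cost is memory: the cache grows to the size of everything ever read, which is exactly the trade-off of remembering data "just in case" it is re-read.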