In the code below, I have a corrupt "hello.bz2" which has stray characters beyond the EOF.
Is there a way to make the boost::iostreams::copy() call throw?
#include <fstream>
#include <iostream>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/bzip2.hpp>
int main()
{
    using namespace std;
    using namespace boost::iostreams;
    ifstream file("hello.bz2", ios_base::in | ios_base::binary);
    filtering_streambuf<input> in;
    in.push(bzip2_decompressor());
    in.push(file);
    boost::iostreams::copy(in, cout);
}
EDIT: Please ignore the line that has attracted the most attention so far: the EOF. Please assume I am working with a corrupted bzip2 file. I used "EOF" because of the error I get when I run bzcat on the file:
bzcat hello.bz2
hello world
bzcat: hello.bz2: trailing garbage after EOF ignored
std::ios_base::failure is "the base class for the types of all objects thrown as exceptions, by functions in the Iostreams library, to report errors detected during stream buffer operations."
Looking at the boost docs:
class bzip2_error : public std::ios_base::failure {
public:
    bzip2_error(int error);
    int error() const;
};
bzip2_error is a specific exception thrown when using the bzip2 filter, which inherits from std::ios_base::failure. As you can see, it is constructed by passing in an integer representing the error code. It also has a method error() which returns the error code it was constructed with.
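As a rough illustration of what that inheritance buys you, here is a minimal sketch that uses only the standard library. toy_bzip2_error is a hypothetical stand-in with the same shape as the class quoted above (it is not the Boost class), and caught_as_failure and error_code_of are names made up for this example. It shows that a handler for std::ios_base::failure also catches the derived type, while a handler for the derived type can read the code back through error().

```cpp
#include <ios>
#include <string>

// Hypothetical stand-in mirroring the documented shape of bzip2_error.
// This is NOT the Boost class; it only imitates its interface.
class toy_bzip2_error : public std::ios_base::failure {
public:
    explicit toy_bzip2_error(int error)
        : std::ios_base::failure(std::string("toy bzip2 error")), error_(error) {}
    int error() const { return error_; }
private:
    int error_;
};

// Returns true if a handler for the base class catches the derived exception.
bool caught_as_failure(int code) {
    try {
        throw toy_bzip2_error(code);
    } catch (const std::ios_base::failure&) {
        return true;
    }
    return false;
}

// Catches the derived type and returns the code it was constructed with.
int error_code_of(int code) {
    try {
        throw toy_bzip2_error(code);
    } catch (const toy_bzip2_error& e) {
        return e.error();
    }
    return 0;  // not reached: the throw above is always caught
}
```

With the real filter you would presumably catch bzip2_error first if you need the code, and std::ios_base::failure after it as a catch-all for other stream errors.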
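The docs list the following bzip2 error codes, all in namespace boost::iostreams::bzip2:
data_error: the compressed data stream is corrupted
data_error_magic: the compressed data stream does not begin with the 'magic' sequence 'B' 'Z' 'h'
config_error: libbzip2 has been improperly configured for the current platform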
EDIT: I also want to clarify that boost::iostreams::copy() will not be the one throwing the exception here; the bzip2 filter will. Only the iostream or the filters throw exceptions. copy just uses the iostream/filter, which may in turn throw.
EDIT 2: It appears the problem is with bzip2_decompressor_impl, as you expected. I have replicated the endless spinning loop when the bz2 file is empty. It took me a little while to figure out how to build boost and link with the bzip2, zlib, and iostreams libraries to see if I could replicate your results.
g++ test.cpp -lz -lbz2 boostinstall/boost/bin.v2/libs/iostreams/build/darwin-4.2.1/release/link-static/threading-multi/libboost_iostreams.a -Lboostinstall/boost/bin.v2/libs/ -Iboost/include/boost-1_42 -g
test.cpp:
#include <fstream>
#include <iostream>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/bzip2.hpp>
int main()
{
    using namespace std;
    using namespace boost::iostreams;
    try {
        ifstream file("hello.bz2", ios_base::in | ios_base::binary);
        filtering_streambuf<input> in;
        in.push(bzip2_decompressor());
        in.push(file);
        boost::iostreams::copy(in, cout);
    }
    catch(const bzip2_error& exception) {
        int error = exception.error();
        if(error == boost::iostreams::bzip2::data_error) {
            // compressed data stream is corrupted
            cout << "compressed data stream is corrupted";
        }
        else if(error == boost::iostreams::bzip2::data_error_magic) {
            // compressed data stream does not begin with the 'magic' sequence 'B' 'Z' 'h'
            cout << "compressed data stream does not begin with the 'magic' sequence 'B' 'Z' 'h'";
        }
        else if(error == boost::iostreams::bzip2::config_error) {
            // libbzip2 has been improperly configured for the current platform
            cout << "libbzip2 has been improperly configured for the current platform";
        }
    }
}
debugging:
gdb a.out
(gdb) b bzip2.hpp:344
There is a loop that drives the bzip2 decompression at symmetric.hpp:109:
while (true)
{
    // Invoke filter if there are unconsumed characters in buffer or if
    // filter must be flushed.
    bool flush = status == f_eof;
    if (buf.ptr() != buf.eptr() || flush) {
        const char_type* next = buf.ptr();
        bool done =
            !filter().filter(next, buf.eptr(), next_s, end_s, flush);
        buf.ptr() = buf.data() + (next - buf.data());
        if (done)
            return detail::check_eof(
                       static_cast<std::streamsize>(next_s - s)
                   );
    }
    // If no more characters are available without blocking, or
    // if read request has been satisfied, return.
    if ( (status == f_would_block && buf.ptr() == buf.eptr()) ||
         next_s == end_s )
    {
        return static_cast<std::streamsize>(next_s - s);
    }
    // Fill buffer.
    if (status == f_good)
        status = fill(src);
}
bzip2_decompressor_impl's filter method (bzip2.hpp:344) gets called at symmetric.hpp:117:
template<typename Alloc>
bool bzip2_decompressor_impl<Alloc>::filter
    ( const char*& src_begin, const char* src_end,
      char*& dest_begin, char* dest_end, bool /* flush */ )
{
    if (!ready())
        init();
    if (eof_)
        return false;
    before(src_begin, src_end, dest_begin, dest_end);
    int result = decompress();
    after(src_begin, dest_begin);
    bzip2_error::check BOOST_PREVENT_MACRO_SUBSTITUTION(result);
    return !(eof_ = result == bzip2::stream_end);
}
I think the problem is simple: bzip2_decompressor_impl's eof_ flag never gets set to true here. It is owned by the bzip2_decompressor_impl class, and in the code above it only becomes true when decompress() returns bzip2::stream_end, which apparently never happens when there is nothing to decompress. So when we do this:
cat /dev/null > hello.bz2
We get a spinning loop that never ends; we don't break out when EOF is hit. This is certainly a bug, because other programs (like vim) have no problem opening a text file created in a similar manner. However, I am able to get the filter to throw when the bz2 file is "corrupted":
echo "other corrupt" > hello.bz2
./a.out
compressed data stream does not begin with the 'magic' sequence 'B' 'Z' 'h'
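To make the difference between the two cases easier to see, here is a toy model of how the read loop and the filter interact. It is only a sketch under simplifying assumptions: ToyDecompressor, Step, and run_read_loop are hypothetical names, nothing here is Boost or libbzip2 code, the source is assumed to be at EOF already, and an iteration cap stands in for "spins forever".

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical result of one decompression step (not the libbzip2 codes).
enum class Step { need_more, stream_end };

// Hypothetical stand-in for the decompressor; it only imitates the control flow.
struct ToyDecompressor {
    bool eof = false;

    // Mirrors the shape of filter(): returns true while the caller should keep calling.
    bool filter(const std::string& input, std::size_t& pos) {
        if (eof)
            return false;
        Step result = decompress(input, pos);
        eof = (result == Step::stream_end);
        return !eof;
    }

    Step decompress(const std::string& input, std::size_t& pos) {
        if (pos == input.size())
            return Step::need_more;  // nothing left to consume: no progress, no error
        if (pos == 0 && input.compare(0, 3, "BZh") != 0)
            throw std::runtime_error("bad magic");  // the case that throws
        pos = input.size();          // pretend the whole stream decoded in one step
        return Step::stream_end;
    }
};

// Simplified read loop with the source already at EOF, so the filter is
// invoked on every pass. Returns the number of iterations it took to finish,
// or -1 if the cap was reached (the model's version of an endless loop).
int run_read_loop(const std::string& input, int max_iterations) {
    ToyDecompressor d;
    std::size_t pos = 0;
    for (int i = 1; i <= max_iterations; ++i) {
        bool done = !d.filter(input, pos);
        if (done)
            return i;
        // In this model no output is produced and the source never blocks,
        // so none of the loop's other exit conditions can fire.
    }
    return -1;
}
```

In this model an empty input hits the cap, an input starting with "BZh" finishes on the first pass, and any other non-empty input throws, which matches the three behaviours described above.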
Sometimes you have to take open source code with a grain of salt. It is more likely that your bz2 files will be corrupted in a way that properly throws. However, the /dev/null case is a serious bug. We should submit it to the Boost developers so they can fix it.