I am trying to reduce the memory size of boost archives in C++.
One problem I have found is that Boost's binary archives default to using 4 bytes for any int, regardless of its magnitude. For this reason, I find that an empty Boost binary archive takes 62 bytes while an empty text archive takes 40 (text representation of an empty text archive: 22 serialization::archive 14 0 0 1 0 0 0 0 0).
Is there any way to change this default behavior for ints?
Else, are there any other ways to optimize the size of a binary archive apart from using make_array for vectors?
Q. I am trying to reduce the memory size of boost archives in C++.
Q. One problem I have found is that Boost's binary archives default to using 4 bytes for any int, regardless of its magnitude.
That's because it's a serialization library, not a compression library.
Q. For this reason, I am getting that an empty boost binary archive takes 62 bytes while an empty text archive takes 40 (text representation of an empty text archive: 22 serialization::archive 14 0 0 1 0 0 0 0 0).
Use the archive flags, e.g. as suggested in "Boost Serialization : How To Predict The Size Of The Serialized Result?":
- Tune things (boost::archive::no_codecvt, boost::archive::no_header, disable tracking, etc.)
Q. Is there any way to change this default behavior for ints?
No. There is BOOST_IS_BITWISE_SERIALIZABLE(T), though (see e.g. "Boost serialization bitwise serializability" for an example and explanations).
Q. Else, are there any other ways to optimize the size of a binary archive apart from using make_array for vectors?
Using make_array doesn't help for vector<int>:
#include <boost/archive/binary_oarchive.hpp>
#include <boost/serialization/vector.hpp>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Drop the archive header and object tracking to keep the output minimal.
static auto const flags = boost::archive::no_header | boost::archive::no_tracking;

// Serialize the container through the library's default vector support.
template <typename T>
std::string direct(T const& v) {
    std::ostringstream oss;
    {
        boost::archive::binary_oarchive oa(oss, flags);
        oa << v;
    }
    return oss.str();
}

// Serialize the element count, then the elements as one contiguous array.
template <typename T>
std::string as_pod_array(T const& v) {
    std::ostringstream oss;
    {
        boost::archive::binary_oarchive oa(oss, flags);
        oa << v.size() << boost::serialization::make_array(v.data(), v.size());
    }
    return oss.str();
}

int main() {
    std::vector<int> i(100);
    std::cout << "direct: " << direct(i).size() << "\n";
    std::cout << "as_pod_array: " << as_pod_array(i).size() << "\n";
}
Prints
direct: 408
as_pod_array: 408
The most straightforward way to optimize is to compress the resulting stream (see also the benchmarks added here).
Barring that, you will have to override the default serialization and apply your own compression (which could be simple run-length encoding, Huffman coding, or something more domain-specific).
Compressing the stream with bzip2 through Boost.Iostreams looks like this:
#include <boost/archive/binary_oarchive.hpp>
#include <boost/serialization/vector.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/iostreams/filter/bzip2.hpp>
#include <boost/iostreams/filtering_stream.hpp>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <vector>

static auto const flags = boost::archive::no_header | boost::archive::no_tracking;

// Serialize v into a binary archive, bzip2 the result, and return the compressed size.
template <typename T>
std::size_t archive_size(T const& v)
{
    std::stringstream ss;
    {
        boost::archive::binary_oarchive oa(ss, flags);
        oa << v;
    }

    std::vector<char> compressed;
    {
        // Filter chain: bzip2 compressor writing into the vector.
        boost::iostreams::filtering_ostream fos;
        fos.push(boost::iostreams::bzip2_compressor());
        fos.push(boost::iostreams::back_inserter(compressed));
        boost::iostreams::copy(ss, fos);
    }
    return compressed.size();
}

int main() {
    std::vector<int> i(100);
    std::cout << "bzip2: " << archive_size(i) << "\n";
}
Prints
bzip2: 47
That's a compression ratio of ~11% (47 of the uncompressed 408 bytes), or ~19% if you drop the archive flags.
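If compressing the whole stream is not an option, the "apply your own compression" route could look roughly like the sketch below. It falls under "something more domain-specific": a variable-length integer encoding (7 payload bits per byte, with zigzag mapping for negative values), so that small ints take 1 byte instead of 4. This is not something Boost Serialization provides; the names zigzag_encode, append_varint, pack_ints and unpack_ints are made up for illustration, and the sketch assumes 32-bit values and fewer than 2^32 elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Zigzag mapping: 0, -1, 1, -2, ... become 0, 1, 2, 3, ... so that small
// magnitudes of either sign turn into small unsigned values.
inline std::uint32_t zigzag_encode(std::int32_t n) {
    return (static_cast<std::uint32_t>(n) << 1) ^ (n < 0 ? 0xFFFFFFFFu : 0u);
}

inline std::int32_t zigzag_decode(std::uint32_t u) {
    return static_cast<std::int32_t>((u >> 1) ^ (0u - (u & 1u)));
}

// Append value using 7 payload bits per byte; the high bit marks "more bytes follow".
inline void append_varint(std::string& out, std::uint32_t value) {
    while (value >= 0x80u) {
        out.push_back(static_cast<char>((value & 0x7Fu) | 0x80u));
        value >>= 7;
    }
    assert(value < 0x80u);  // the last byte never carries the continuation bit
    out.push_back(static_cast<char>(value));
}

// Read one varint starting at pos and advance pos past it.
inline std::uint32_t read_varint(std::string const& in, std::size_t& pos) {
    std::uint32_t value = 0;
    for (int shift = 0; shift < 35; shift += 7) {
        if (pos >= in.size()) throw std::runtime_error("truncated varint");
        auto const byte = static_cast<unsigned char>(in[pos++]);
        value |= static_cast<std::uint32_t>(byte & 0x7Fu) << shift;
        if ((byte & 0x80u) == 0) return value;
    }
    throw std::runtime_error("varint too long");
}

// Element count first, then every element as a zigzag varint.
// Assumption: the count fits in 32 bits.
inline std::string pack_ints(std::vector<std::int32_t> const& v) {
    std::string out;
    append_varint(out, static_cast<std::uint32_t>(v.size()));
    for (std::int32_t n : v) append_varint(out, zigzag_encode(n));
    return out;
}

inline std::vector<std::int32_t> unpack_ints(std::string const& in) {
    std::size_t pos = 0;
    std::uint32_t const count = read_varint(in, pos);
    std::vector<std::int32_t> v;
    for (std::uint32_t i = 0; i < count; ++i)
        v.push_back(zigzag_decode(read_varint(in, pos)));
    return v;
}
```

For the 100 zero-valued ints used above, pack_ints produces 101 bytes (1 for the count, 1 per element) instead of 408. The packed string could then be stored in the archive in place of the vector. The worst case is 5 bytes per value for large magnitudes, so this only pays off when most values are small.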