c++ performance boost boost-serialization

Is it possible to reuse a binary_oarchive instance?

My question is the same as discussed in this thread from five years ago (which has no good answer).

I'm serializing my objects into a byte buffer, like so:

std::string serial_str;
for (int i = 1; i < 10000; i++)
{
    boost::iostreams::back_insert_device<std::string> inserter(serial_str);
    boost::iostreams::stream<boost::iostreams::back_insert_device<std::string> > s(inserter);
    boost::archive::binary_oarchive oa(s);

    oa << obj;

    s.flush();

    // code to send serial_str's content to another process, omitted.

    serial_str.clear(); // clear the buffer so it can be reused to serialize the next object
}

When I do this in a loop, the performance is quite bad: I get ~14,000 objects / sec.

I've narrowed the problem down to the re-creation of the binary_oarchive. If I write into the same string with the same archive instance in a loop, I get ~220,000 objects/sec, but then the objects are serialized one after the other into one growing buffer, which isn't what I want. I want to clear and reuse the same buffer (seek to its beginning) after each object is serialized.

How can I do that?

Solution

Yes, you absolutely can reuse it, in a sense. The oarchive simply wraps a stream and doesn't know what happens to the stream's data, so the trick is to implement your own stream (which isn't fun) that lets you "reset" the actual underlying data buffer while the archive stays alive. I've written something like this before and it works wonderfully.
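As a rough illustration of the "resettable stream" idea, here is a minimal sketch that uses only the standard library. The names resettable_buffer and demo_reuse are made up for this example and are not part of Boost; the assumption is that the archive would be constructed once on a std::ostream that sits on top of this buffer, and that reset() would be called between objects.

```cpp
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <vector>

// Hypothetical helper (not a Boost class): an output streambuf that appends
// into an internal byte vector and can be emptied without being destroyed.
class resettable_buffer : public std::streambuf {
public:
    // Discard everything written so far. The vector keeps its capacity,
    // so later messages normally do not reallocate.
    void reset() { data_.clear(); }

    const char* data() const { return data_.data(); }
    std::size_t size() const { return data_.size(); }

protected:
    // Bulk writes (sputn) land here.
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        data_.insert(data_.end(), s, s + n);
        return n;
    }

    // Single-character writes (sputc) land here because no put area is set.
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            data_.push_back(traits_type::to_char_type(ch));
        }
        return traits_type::not_eof(ch);
    }

private:
    std::vector<char> data_;
};

// Writes three 5-byte messages through one long-lived stream, emptying the
// buffer before each one, and returns the buffer size after the last message.
std::size_t demo_reuse() {
    resettable_buffer buf;
    std::ostream out(&buf);  // in the real setup, the archive is built once on this stream
    std::size_t last = 0;
    for (int i = 0; i < 3; ++i) {
        buf.reset();
        out.write("abcd", 4);
        out.put('!');
        last = buf.size();
        // send buf.data() / buf.size() to the other process here
    }
    return last;
}
```

In the loop from the question, the stream and archive would move outside the loop, and only the reset() call and the send would stay inside it.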

Some gotchas to be aware of though:

The oarchive writes its header only once, at construction, because a persistent archive treats everything it writes as one big stream. Only the first message would carry a header, so you'll want to disable headers altogether:

boost::archive::binary_oarchive oa(s, boost::archive::no_codecvt | boost::archive::no_header);

Also, because you're reusing an oarchive, you have to be extremely careful about managing its internal type table. If all you're serializing are ints, floats, etc, then you'll be fine, but as soon as you start serializing classes, strings, and the like you can't rely on the default type enumeration that the archive uses when reusing the archive like this. The Boost documentation doesn't really get into this, but for anything complex, you need to do the following for every type the archive will come across:

oa.template register_type<std::string>();
oa.template register_type<MyClass>();
oa.template register_type<std::shared_ptr<MyClass> >();

And so on, for all your types, all std::vectors of them, all std::shared_ptrs of them, etc. This is vital. Otherwise you'll only be able to read back your streams if you use a shared iarchive and read them in the exact same order they were serialized out.

The consequence is that your iarchive needs to register all the types in exactly the same way and order as the oarchive (I wrote some handy helpers using mpl to help me with this).
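The mpl helpers themselves aren't shown here, but the idea of one shared type list can be sketched with variadic templates instead. Everything named below (type_list, register_on, recording_archive, MyClass, MyOtherClass, wire_types) is hypothetical. The only assumption about the archive is that it has a register_type<T>() member template callable with no arguments, which is how the calls above use it. A small recording stand-in replaces the Boost archives so the sketch compiles without Boost.

```cpp
#include <string>
#include <typeinfo>
#include <vector>

// Hypothetical helper: a compile-time list of types that can register
// itself on any archive-like object, always in the same order.
template <class... Ts>
struct type_list {
    template <class Archive>
    static void register_on(Archive& ar) {
        // Elements of a braced initializer list are evaluated left to right,
        // so registration follows the order of Ts exactly.
        int expand[] = {0, (static_cast<void>(ar.template register_type<Ts>()), 0)...};
        static_cast<void>(expand);
    }
};

// Stand-in for an archive (NOT a Boost class): it only records which
// types were registered and in what order.
struct recording_archive {
    std::vector<std::string> names;

    template <class T>
    void register_type() { names.push_back(typeid(T).name()); }
};

// Hypothetical user types.
struct MyClass {};
struct MyOtherClass {};

// The single source of truth shared by the writing and the reading side.
using wire_types = type_list<MyClass, MyOtherClass>;

// Registers the list on two independent archives and checks that both
// saw the same types in the same order.
bool same_order() {
    recording_archive out;
    recording_archive in;
    wire_types::register_on(out);
    wire_types::register_on(in);
    return out.names == in.names && out.names.size() == 2;
}
```

With real archives, the same wire_types::register_on call would be made once on the oarchive and once on the iarchive, so the two sides cannot drift apart.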

Deserializing can likewise reuse a single iarchive, but all the same conditions apply:

You need to write your own stream (so it can be redirected/reset)
Disable the archive headers
Register the types, in the same order as on the oarchive
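For the reading side, the first condition could look roughly like the sketch below: a read-only streambuf that can be pointed at each newly received block of bytes while the std::istream on top of it (and, in the real setup, the iarchive built on that stream) stays alive. The names redirectable_source and read_two_messages are invented for this example, and it uses only the standard library.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// Hypothetical helper (not a Boost class): a read-only streambuf whose
// get area can be re-pointed at a new block of bytes.
class redirectable_source : public std::streambuf {
public:
    // Point the get area at [data, data + size). The caller keeps ownership
    // and must keep the bytes alive until the next redirect.
    void redirect(const char* data, std::size_t size) {
        char* p = const_cast<char*>(data);  // setg takes char*; nothing is written through it
        setg(p, p, p + size);
    }
    // The default underflow() returns EOF once the current block is consumed.
};

// Reads two separate "messages" through one long-lived stream.
std::string read_two_messages() {
    redirectable_source src;
    std::istream in(&src);  // in the real setup, the iarchive is built once on this stream

    const char first[] = {'a', 'b'};
    const char second[] = {'c', 'd', 'e'};
    std::string result;
    char tmp[3];

    src.redirect(first, sizeof first);
    in.read(tmp, 2);
    result.append(tmp, 2);

    src.redirect(second, sizeof second);
    in.read(tmp, 3);
    result.append(tmp, 3);

    return result;
}
```

If a read ever runs past the end of a block through the std::istream interface, the stream's error flags stay set, so in.clear() would be needed after the next redirect.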

So yes, reusing an oarchive/iarchive is possible, but it's a bit of a pain. Once you've got it sorted out though, it's pretty awesome.