c++ serialization boost boost-serialization

Common confusions with serializing polymorphic types

I have seen many questions, tutorials, and documentation pages about serializing derived classes, but I haven't been able to find a consensus on several issues, including the following (illustrated in the code below):

boost::serialization::base_object vs BOOST_SERIALIZATION_BASE_OBJECT_NVP
archive & mData; vs archive & BOOST_SERIALIZATION_NVP(mData);
The usefulness of BOOST_SERIALIZATION_ASSUME_ABSTRACT(AbstractPoint);
Requiring serialize() for a class in the hierarchy that doesn't need to serialize anything.

Code:

#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/shared_ptr.hpp>
#include <boost/serialization/base_object.hpp>

#include <fstream>
#include <iostream>
#include <memory>

class AbstractPoint
{
public:
    virtual ~AbstractPoint(){}
    virtual void DoSomething() = 0;

    // Even though the class is abstract, we still need this
    template<class TArchive>
    void serialize(TArchive& archive, const unsigned int version)
    {
        // do nothing
    }
};

// This doesn't seem to do anything
//BOOST_SERIALIZATION_ASSUME_ABSTRACT(AbstractPoint);

class Point : public AbstractPoint
{
public:
    Point() = default;
    Point(const double data) : mData(data) {}

    void DoSomething(){}

    template<class TArchive>
    void serialize(TArchive& archive, const unsigned int version)
    {
        // These two seem equivalent. Without one of them, unregistered void cast
        archive & boost::serialization::base_object<AbstractPoint>(*this);
        //archive & BOOST_SERIALIZATION_BASE_OBJECT_NVP(AbstractPoint);

        // These two seem equivalent
        archive & mData;
        //archive & BOOST_SERIALIZATION_NVP(mData);
    }

    double mData;
};

int main()
{
    std::shared_ptr<AbstractPoint> point(new Point(7.4));

    {
        std::ofstream outputStream("test.txt");
        boost::archive::text_oarchive outputArchive(outputStream);
        outputArchive.register_type<Point>();
        outputArchive << point;
    } // the archive is destroyed first, then the stream, which closes the file

    std::shared_ptr<AbstractPoint> pointRead;
    std::ifstream inputStream("test.txt");
    boost::archive::text_iarchive inputArchive(inputStream);
    inputArchive.register_type<Point>();
    inputArchive >> pointRead;

    std::shared_ptr<Point> castedPoint = std::dynamic_pointer_cast<Point>(pointRead);
    std::cout << castedPoint->mData << std::endl;
    return 0;
}

The other major issue is where to register classes in a "real" environment (when there is linking, etc.), but that seems worth a separate question.

It would be great to have a "gold standard" example of these kinds of things in the documentation, or at least on StackOverflow :)

Solution

boost::serialization::base_object vs BOOST_SERIALIZATION_BASE_OBJECT_NVP

The NVP wrapper is only ever required for archives that have element naming, like XML.

Unless you use such an archive, base_object<> is cleaner and simpler.

archive & mData; vs archive & BOOST_SERIALIZATION_NVP(mData);

Ditto: the NVP form only matters for archives with named elements, such as XML.
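To make the NVP point concrete, here is a toy model in plain standard C++. It is not Boost code: NamedValue, make_named, PlainOut and XmlLikeOut are hypothetical names invented for this sketch. It only illustrates the idea that an archive without element names can ignore the name, while an XML-like archive has nothing to write without it.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Toy stand-in for a name-value pair (hypothetical, NOT Boost's nvp).
template <class T>
struct NamedValue {
    const char* name;
    T& value;
};

template <class T>
NamedValue<T> make_named(const char* name, T& value) {
    return NamedValue<T>{name, value};
}

// Toy archive without element names: a name is accepted but ignored.
class PlainOut {
public:
    template <class T>
    PlainOut& operator&(const T& value) {
        mStream << value << ' ';
        return *this;
    }
    template <class T>
    PlainOut& operator&(const NamedValue<T>& pair) {
        return *this & pair.value;  // drop the name, keep the value
    }
    std::string str() const { return mStream.str(); }
private:
    std::ostringstream mStream;
};

// Toy archive with element names: only named values are accepted,
// so a bare "archive & mData" would not compile here.
class XmlLikeOut {
public:
    template <class T>
    XmlLikeOut& operator&(const NamedValue<T>& pair) {
        mStream << '<' << pair.name << '>' << pair.value
                << "</" << pair.name << '>';
        return *this;
    }
    std::string str() const { return mStream.str(); }
private:
    std::ostringstream mStream;
};

struct Point {
    double mData = 7.5;
    template <class Archive>
    void serialize_plain(Archive& archive) { archive & mData; }
    template <class Archive>
    void serialize_named(Archive& archive) { archive & make_named("mData", mData); }
};

std::string plain_without_names() { Point p; PlainOut a; p.serialize_plain(a); return a.str(); }
std::string plain_with_names()    { Point p; PlainOut a; p.serialize_named(a); return a.str(); }
std::string xml_with_names()      { Point p; XmlLikeOut a; p.serialize_named(a); return a.str(); }
```

In this sketch the two PlainOut results are identical, which matches the observation in the question that both forms "seem equivalent" with a text archive.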

The usefulness of BOOST_SERIALIZATION_ASSUME_ABSTRACT(AbstractPoint);

I assume it is merely an optimization: it suppresses the registered type information for each archive type, since you have told the framework that it will never de-serialize instances of the type.
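One possible reason the macro "doesn't seem to do anything": as far as I recall (treat this as an assumption, I have not re-checked the header), the macro specializes a trait that otherwise falls back to the compiler's own abstract-class detection. The sketch below uses only the standard library to show that a modern compiler already detects this for the classes in the question; it says nothing about what Boost does with that information.

```cpp
#include <cassert>
#include <type_traits>

class AbstractPoint {
public:
    virtual ~AbstractPoint() {}
    virtual void DoSomething() = 0;  // pure virtual makes the class abstract
};

class Point : public AbstractPoint {
public:
    void DoSomething() override {}
};

// The compiler can answer the "is it abstract?" question without any macro.
constexpr bool base_is_abstract()    { return std::is_abstract<AbstractPoint>::value; }
constexpr bool derived_is_abstract() { return std::is_abstract<Point>::value; }

static_assert(base_is_abstract(), "AbstractPoint is detected as abstract");
static_assert(!derived_is_abstract(), "Point is concrete");
```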

Requiring serialize() for a class in the hierarchy that doesn't need to serialize anything.

You don't need it, unless you need the type information about a polymorphic base there. When do you need that? When you need to de-serialize pointers of the base type.

Hence, if you have

struct A{ virtual ~A(); };
struct B:A{};

struct C:B{};
struct D:B{};

you will need serialization for A (but not B) if you (de)serialize A*. You will need serialization for B if you (de)serialize B*.

Similarly, if your type is not polymorphic (virtual) or you don't use it as such, you don't need any base serialization (e.g. if you (de)serialize C or D directly).

Finally, if you have struct A{}; struct B:A{}; there is no need to tell Boost Serialization about the base type at all (you could just do the serialization from within B).
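Why pointers to a polymorphic base need extra type information can be seen in a toy model. This is a sketch in standard C++ only, not how Boost implements it: ToyRegistry, key(), save() and load() are hypothetical names. The stored data has to say which dynamic type to construct, and loading can only succeed for types the registry knows about.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

struct A {
    virtual ~A() {}
    virtual std::string key() const = 0;  // hypothetical: names the dynamic type
};
struct C : A { std::string key() const override { return "C"; } };
struct D : A { std::string key() const override { return "D"; } };

class ToyRegistry {
public:
    // Remember how to construct T when its key shows up in the data.
    template <class T>
    void register_type(const std::string& key) {
        mFactories[key] = [] { return std::unique_ptr<A>(new T()); };
    }
    // Returns nullptr when the key was never registered.
    std::unique_ptr<A> create(const std::string& key) const {
        auto it = mFactories.find(key);
        if (it == mFactories.end()) return nullptr;
        return it->second();
    }
private:
    std::map<std::string, std::function<std::unique_ptr<A>()>> mFactories;
};

// "Saving" through a base reference stores the key of the dynamic type.
std::string save(const A& object) { return object.key(); }

// "Loading" into a base pointer needs the registry to pick the right type.
std::unique_ptr<A> load(const ToyRegistry& registry, const std::string& stored) {
    return registry.create(stored);
}
```

If you (de)serialize C or D directly instead of through an A*, the static type already tells you what to construct, and none of this bookkeeping is needed.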

Update in response to your samples:

case1.cpp looks ok

case2.cpp needs to call base serialization, of course. It does not necessarily have to use base_object, which you would use because you require polymorphic serialization:

template<class TArchive> void serialize(TArchive& archive, unsigned) {
    // Pick exactly one of the following alternatives.
    archive & boost::serialization::base_object<AbstractPoint>(*this)
            & mData;
    // OR:
    // archive & static_cast<AbstractPoint&>(*this)
    //         & mData;
    // OR even just:
    // archive & mParentData
    //         & mData;
}

case3.cpp: indeed, it's exactly like case1, but with dynamic allocation and object tracking

case4.cpp: is exactly like case1, but with dynamic allocation and object tracking; NB!! it requires explicitly serializing for the base!

template<class TArchive> void serialize(TArchive& archive, unsigned) {
    archive & boost::serialization::base_object<AbstractPoint>(*this)
            & mData;
}

case5.cpp: yes, but it's more typical to use the BOOST_CLASS_EXPORT* macros from boost/serialization/export.hpp
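For intuition about the difference between calling register_type on every archive and exporting a class once, here is a toy analogue in standard C++. It is a sketch of the general static-registration pattern and not Boost's implementation: exported_types, ToyExport and TOY_CLASS_EXPORT are hypothetical names. The class is registered under a string key during static initialization, so no per-archive call is needed.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct AbstractPoint { virtual ~AbstractPoint() {} };
struct Point : AbstractPoint {};

using Factory = std::unique_ptr<AbstractPoint> (*)();

// Function-local static sidesteps static-initialization-order problems.
std::map<std::string, Factory>& exported_types() {
    static std::map<std::string, Factory> registry;
    return registry;
}

template <class T>
std::unique_ptr<AbstractPoint> make() {
    return std::unique_ptr<AbstractPoint>(new T());
}

// Constructing one of these at namespace scope registers T under its key.
template <class T>
struct ToyExport {
    explicit ToyExport(const char* key) { exported_types()[key] = &make<T>; }
};

// Hypothetical macro, only loosely analogous in spirit to the export macros.
#define TOY_CLASS_EXPORT(T) static ToyExport<T> toy_export_##T(#T);

TOY_CLASS_EXPORT(Point)

bool is_exported(const std::string& key) {
    return exported_types().count(key) != 0;
}
```

The design trade-off in this toy version is the usual one for static registration: it is convenient, but the registering object must actually end up in the final program, which is where the "real environment" linking questions come from.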
