Tags: c++, serialization, boost, boost-serialization

Common confusions with serializing polymorphic types


I have seen many questions, tutorials, and pieces of documentation about serializing derived classes, and I haven't been able to find a consensus on several issues, including (as illustrated in the following code):

  • boost::serialization::base_object vs BOOST_SERIALIZATION_BASE_OBJECT_NVP
  • archive & mData; vs archive & BOOST_SERIALIZATION_NVP(mData);
  • The usefulness of BOOST_SERIALIZATION_ASSUME_ABSTRACT(AbstractPoint);
  • Requiring serialize() for a class in the hierarchy that doesn't need to serialize anything.

Code:

#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/shared_ptr.hpp>
#include <boost/serialization/base_object.hpp>

#include <fstream>
#include <iostream>  // std::cout
#include <memory>    // std::shared_ptr, std::dynamic_pointer_cast

class AbstractPoint
{
public:
    virtual ~AbstractPoint(){}
    virtual void DoSomething() = 0;

    // Even though the class is abstract, we still need this
    template<class TArchive>
    void serialize(TArchive& archive, const unsigned int version)
    {
        // do nothing
    }
};

// This doesn't seem to do anything
//BOOST_SERIALIZATION_ASSUME_ABSTRACT(AbstractPoint);

class Point : public AbstractPoint
{
public:
    Point() = default;
    Point(const double data) : mData(data) {}

    void DoSomething() override {}

    template<class TArchive>
    void serialize(TArchive& archive, const unsigned int version)
    {
        // These two seem equivalent. Without one of them, unregistered void cast
        archive & boost::serialization::base_object<AbstractPoint>(*this);
        //archive & BOOST_SERIALIZATION_BASE_OBJECT_NVP(AbstractPoint);

        // These two seem equivalent
        archive & mData;
        //archive & BOOST_SERIALIZATION_NVP(mData);
    }

    double mData = 0.0; // initialized so a default-constructed Point is well defined
};

int main()
{
    std::shared_ptr<AbstractPoint> point(new Point(7.4));

    std::ofstream outputStream("test.txt");
    boost::archive::text_oarchive outputArchive(outputStream);
    outputArchive.register_type<Point>();
    outputArchive << point;
    outputStream.close();

    std::shared_ptr<AbstractPoint> pointRead;
    std::ifstream inputStream("test.txt");
    boost::archive::text_iarchive inputArchive(inputStream);
    inputArchive.register_type<Point>();
    inputArchive >> pointRead;

    std::shared_ptr<Point> castedPoint = std::dynamic_pointer_cast<Point>(pointRead);
    std::cout << castedPoint->mData << std::endl;
    return 0;
}

The other major issue is where to register classes in a "real" environment (when there is linking, etc.), but that seems worth a separate question.

It would be great to have a "gold standard" example of these kinds of things in the documentation, or at least on Stack Overflow :)


Solution

    • boost::serialization::base_object vs BOOST_SERIALIZATION_BASE_OBJECT_NVP

    The NVP wrapper is only required for archives that name their elements, such as XML.

    Unless you use such an archive, base_object<> is cleaner and simpler.

    • archive & mData; vs archive & BOOST_SERIALIZATION_NVP(mData);

    Ditto: the NVP form only matters for archives with named elements.

    • The usefulness of BOOST_SERIALIZATION_ASSUME_ABSTRACT(AbstractPoint);

    I assume it is merely an optimization: it suppresses registering type information with each archive type, since you have told the framework it will never de-serialize instances of the type.

    • Requiring serialize() for a class in the hierarchy that doesn't need to serialize anything.

    You don't need it, unless you need the type information about a polymorphic base there. When do you need that? When you need to de-serialize pointers of the base type.

    Hence, if you have

    struct A{ virtual ~A(); };
    struct B:A{};

    struct C:B{};
    struct D:B{};

    you will need serialization for A (but not B) if you (de)serialize A*. You will need serialization for B if you (de)serialize B*.

    Similarly, if your type is not polymorphic (virtual) or you don't use it as such, you don't need any base serialization (e.g. if you (de)serialize C or D directly).

    Finally, if you have struct A{}; struct B:A{}; there is no need to tell Boost Serialization about the base type at all (you could just do the serialization from within B).

    Update in response to your samples:

    1. case1.cpp looks ok
    2. case2.cpp needs to call base serialization, of course; not necessarily using base_object because you require polymorphic serialization:

      template<class TArchive> void serialize(TArchive& archive, unsigned) {
          archive & boost::serialization::base_object<AbstractPoint>(*this)
                  & mData;
          // OR:
          archive & static_cast<AbstractPoint&>(*this) 
                  & mData;
          // OR even just:
          archive & mParentData 
                  & mData;
      }
      
    3. case3.cpp: indeed, it's exactly like case1, but with dynamic allocation and object tracking

    4. case4.cpp: is exactly like case1, but with dynamic allocation and object tracking; NB!! it requires explicitly serializing the base!

      template<class TArchive> void serialize(TArchive& archive, unsigned) {
          archive & boost::serialization::base_object<AbstractPoint>(*this)
                  & mData;
      }
      
    5. case5.cpp: yes, but it's more typical to use the BOOST_CLASS_EXPORT* macros from boost/serialization/export.hpp

    Bitrot insurance: