
C++ boost serialize polymorphism question


I use a base-class pointer to serialize an object of a derived class. The code below seems to work, but I am unclear about the order of execution. The output is:
CC serialize start
BB serialize start
AA serialize start
AA serialize end
BB serialize end
CC serialize end
CC serialize start
BB serialize start
AA serialize start
AA serialize end
BB serialize end
CC serialize end
class_aa
class_bb
class_cc
Since the serialize function is not virtual, I am wondering why the serialize function in class CC executes first, followed by those of BB and AA.

Thanks

#include <boost/serialization/serialization.hpp>
#include <boost/serialization/nvp.hpp>
#include <boost/archive/text_oarchive.hpp> 
#include <boost/archive/text_iarchive.hpp>
#include <iostream>
#include <string>
#include <boost/serialization/export.hpp>
#include <fstream>
using namespace std;

class AA
{
public:
    virtual void foo() = 0;
    AA(string aa) :aa_name(aa) { }
    AA() {}

    template<class Archive>
    void serialize(Archive& ar, unsigned int)
    {
        cout << "AA serialize start" << endl;
        ar& aa_name;
        cout << "AA serialize end" << endl;
    }

    string aa_name;
};
BOOST_SERIALIZATION_ASSUME_ABSTRACT(AA);

class BB : public AA
{
public:
    void foo() {}
    virtual void bar() = 0;
    BB(string aa, string bb) : AA(aa), bb_name(bb) {}
    BB() {}

    template<class Archive>
    void serialize(Archive& ar, unsigned int)
    {
        cout << "BB serialize start" << endl;
        ar& boost::serialization::base_object<AA>(*this);
        ar& bb_name;
        cout << "BB serialize end" << endl;
    }
    string bb_name;
};
BOOST_SERIALIZATION_ASSUME_ABSTRACT(BB);

class CC : public BB
{
public:
    CC(string aa, string bb, string cc) : BB(aa, bb), cc_name(cc) {}
    CC() {}
    void bar() {} 

    template<class Archive>
    void serialize(Archive& ar, unsigned int)
    {
        cout << "CC serialize start" << endl;
        ar& boost::serialization::base_object<BB>(*this);
        ar& cc_name;
        cout << "CC serialize end" << endl;
    }

    string cc_name;
};
BOOST_CLASS_EXPORT(CC)


int main(int, char const**)
{
    AA* obj = new CC("class_aa", "class_bb", "class_cc");
    ofstream outfile("archive_test.txt");
    boost::archive::text_oarchive out_archive(outfile);
    out_archive << obj;
    outfile.close();

    ifstream infile("archive_test.txt");
    boost::archive::text_iarchive ia(infile);
    AA* c;
    ia >> c;
    cout << c->aa_name << endl << dynamic_cast<CC*>(c)->bb_name << endl << dynamic_cast<CC*>(c)->cc_name << endl;
    infile.close();
}

Solution

  • Q. Since the serialize function is not virtual, I am wondering why the serialize function in class CC executes first, followed by those of BB and AA.

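    The reason is that when serializing through polymorphic pointers (references are another matter¹), the library does not call serialize via the base class at all. It determines the object's dynamic type at run time, finds the serializer that BOOST_CLASS_EXPORT registered for that type, and records the type's identifier (the exported name) in the archive, so that on loading it knows which type to construct. CC::serialize is therefore the entry point, and it never needs to be virtual. The nesting follows from the base_object calls: CC::serialize calls base_object<BB>, which runs BB::serialize, which calls base_object<AA>, which runs AA::serialize. Each function prints its "end" line only after its base class has finished, so the order is CC, BB, AA on the way in and AA, BB, CC on the way out.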

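    You can actually see the stored type name in the archive (this one was produced by the reworked version below, in which CC is renamed Derived):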

    22 serialization::archive 18 0 0 1 7 Derived 1 0 0 0 0 0 0 8 class_aa 8 class_bb 8 class_cc

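    You have some control over how the type is exported through the alternative export macros: for example, BOOST_CLASS_EXPORT_GUID(T, "key") stores a name of your choosing instead of the class name, and BOOST_CLASS_EXPORT_KEY together with BOOST_CLASS_EXPORT_IMPLEMENT split the registration between a header and a single source file.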

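    Note that the question's example has two further problems: the hierarchy lacks a virtual destructor, and it leaks memory (neither obj nor c is ever deleted). The latter may be obvious, but the former might not be²: deleting a CC through an AA* when AA has no virtual destructor is undefined behaviour.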
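    A minimal sketch of the destructor point, using a hypothetical reduced AA/CC pair (not the question's full classes) and a hypothetical helper derived_destructor_runs: owning the object through std::unique_ptr<AA> removes the leak, and the virtual ~AA() is what makes that deletion reach ~CC(). Without `virtual`, the same code would have undefined behaviour.

```cpp
// Minimal sketch (hypothetical reduced AA/CC, not the question's full classes):
// owning the object through std::unique_ptr<AA> fixes the leak, and the
// virtual ~AA() is what makes that deletion reach ~CC().
#include <cassert>
#include <memory>

struct AA {
    virtual ~AA() = default;  // without "virtual", deleting a CC via AA* is UB
    virtual void foo() = 0;
};

struct CC : AA {
    explicit CC(bool& destroyed) : destroyed_(destroyed) {}
    ~CC() override { destroyed_ = true; }  // records that the derived part was destroyed
    void foo() override {}

  private:
    bool& destroyed_;
};

// Hypothetical helper: returns true if ~CC ran when the AA-owning pointer died.
bool derived_destructor_runs() {
    bool destroyed = false;
    {
        std::unique_ptr<AA> p = std::make_unique<CC>(destroyed);
        assert(p != nullptr);
    }  // unique_ptr<AA> deletes through an AA*, dispatching to ~CC
    return destroyed;
}
```

    derived_destructor_runs() returns true here; the reworked answer code relies on the same combination of unique_ptr and a virtual base destructor.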

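    Here's a version with both fixed: std::unique_ptr owns the objects, Base has a virtual destructor, and serialize is private with boost::serialization::access as a friend. The classes are renamed Base, Middle and Derived: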


    #include <boost/archive/text_iarchive.hpp>
    #include <boost/archive/text_oarchive.hpp>
    #include <boost/serialization/base_object.hpp>
    #include <boost/serialization/export.hpp>
    #include <boost/serialization/unique_ptr.hpp>
    #include <boost/serialization/serialization.hpp>
    #include <fstream>
    #include <iostream>
    #include <memory>
    #include <string>
    #include <utility>
    
    class Base {
      public:
        Base(std::string aa) : aa_name(std::move(aa)) {}
        Base() = default;
    
        virtual void foo() const = 0;
        virtual ~Base() = default;
    
      private:
        friend class boost::serialization::access;
        template <class Archive> void serialize(Archive& ar, unsigned int) {
            std::cout << "Base serialize start" << std::endl;
            ar& aa_name;
            std::cout << "Base serialize end" << std::endl;
        }
    
        std::string aa_name;
      protected:
        void qux() const { std::cout << aa_name << std::endl; }
    };
    
    class Middle : public Base {
      public:
        Middle(std::string aa, std::string bb) : Base(std::move(aa)), bb_name(std::move(bb)) {}
        Middle() = default;
    
        void foo() const override { qux(); std::cout << bb_name << std::endl; }
        virtual void bar() const = 0;
    
      private:
        friend class boost::serialization::access;
        template <class Archive> void serialize(Archive& ar, unsigned int) {
            std::cout << "Middle serialize start" << std::endl;
            ar& boost::serialization::base_object<Base>(*this);
            ar& bb_name;
            std::cout << "Middle serialize end" << std::endl;
        }
        std::string bb_name;
    };
    
    class Derived : public Middle {
      public:
        Derived(std::string aa, std::string bb, std::string cc) : Middle(std::move(aa), std::move(bb)), cc_name(std::move(cc)) {}
        Derived() = default;
    
        void bar() const override { foo(); std::cout << cc_name << std::endl; }
    
      private:
        friend class boost::serialization::access;
        template <class Archive> void serialize(Archive& ar, unsigned int) {
            std::cout << "Derived serialize start" << std::endl;
            ar& boost::serialization::base_object<Middle>(*this);
            ar& cc_name;
            std::cout << "Derived serialize end" << std::endl;
        }
    
        std::string cc_name;
    };
    
    BOOST_SERIALIZATION_ASSUME_ABSTRACT(Base)
    BOOST_SERIALIZATION_ASSUME_ABSTRACT(Middle)
    BOOST_CLASS_EXPORT(Derived)
    
    int main() {
        using Ptr = std::unique_ptr<Base>;
    
        {
            std::ofstream outfile("archive_test.txt");
            boost::archive::text_oarchive out_archive(outfile);
    
            Ptr obj = std::make_unique<Derived>("class_aa", "class_bb", "class_cc");
            out_archive << obj;
        }
    
        {
            std::ifstream infile("archive_test.txt");
            boost::archive::text_iarchive ia(infile);
            Ptr obj;
            ia >> obj;
    
            if (auto cp = dynamic_cast<Derived*>(obj.get()))
                cp->bar();
        }
    }
    

    Prints

    Derived serialize start
    Middle serialize start
    Base serialize start
    Base serialize end
    Middle serialize end
    Derived serialize end
    Derived serialize start
    Middle serialize start
    Base serialize start
    Base serialize end
    Middle serialize end
    Derived serialize end
    class_aa
    class_bb
    class_cc
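    The nested order in this output comes from two things: the library dispatches on the object's dynamic type, and each serialize calls its direct base first through base_object. The sketch below is a toy model of just that, not Boost's code or API: `Trace`, `savers`, `toy_export` and `save_through_base` are hypothetical names, a vector of strings stands in for the archive, and the classes are the question's AA/BB/CC reduced to the tracing.

```cpp
// Toy model (not Boost's code or API): a non-virtual serialize() can still be
// entered at the most-derived class if the library dispatches on typeid(*p).
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <vector>

using Trace = std::vector<std::string>;  // stands in for the archive

struct AA {
    virtual ~AA() = default;  // polymorphic, so typeid(*p) is the dynamic type
    void serialize(Trace& t) { t.push_back("AA start"); t.push_back("AA end"); }
};

struct BB : AA {
    void serialize(Trace& t) {  // hides AA::serialize; not virtual
        t.push_back("BB start");
        AA::serialize(t);  // role of base_object<AA>(*this)
        t.push_back("BB end");
    }
};

struct CC : BB {
    void serialize(Trace& t) {
        t.push_back("CC start");
        BB::serialize(t);  // role of base_object<BB>(*this)
        t.push_back("CC end");
    }
};

using Saver = std::function<void(AA*, Trace&)>;

// Registry keyed by the dynamic type, filled in by "exporting" a class.
std::map<std::type_index, Saver>& savers() {
    static std::map<std::type_index, Saver> m;
    return m;
}

template <class T>
void toy_export() {
    savers()[std::type_index(typeid(T))] = [](AA* p, Trace& t) {
        static_cast<T*>(p)->serialize(t);  // safe: typeid(*p) == typeid(T)
    };
}

// Saving through a base pointer never calls AA::serialize directly.
Trace save_through_base(AA* p) {
    assert(p != nullptr);
    Trace t;
    savers().at(std::type_index(typeid(*p)))(p, t);  // throws if not exported
    return t;
}
```

    After toy_export<CC>(), passing a CC as an AA* to save_through_base yields "CC start", "BB start", "AA start", "AA end", "BB end", "CC end", the same nesting as above. Had CC not been exported, savers().at would throw std::out_of_range, loosely analogous to the unregistered-class error Boost raises for a derived type that was never exported.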
    

    ¹ Serializing through references leads to an entirely different subject: object tracking.

    ² See "When to use virtual destructors?"

    BONUS

    In response to the comments, some technical analysis:

    The macro BOOST_CLASS_EXPORT(Derived) expands to:

    namespace boost { namespace serialization {
        template <> struct guid_defined<Derived> : boost::mpl::true_ {};
        template <> inline const char* guid<Derived>() { return "Derived"; }
    } }
    
    namespace boost { namespace archive { namespace detail { namespace extra_detail {
        template <> struct init_guid<Derived> {
            static guid_initializer<Derived> const& g;
        };
        guid_initializer<Derived> const& init_guid<Derived>::g =
            ::boost::serialization::singleton<
            guid_initializer<Derived>>::get_mutable_instance()
            .export_guid();
    } } } }
    

    The export_guid() member of guid_initializer<T> reads:

    guid_initializer const & export_guid() const {
        BOOST_STATIC_WARNING(boost::is_polymorphic< T >::value);
        // note: exporting an abstract base class will have no effect
        // and cannot be used to instantitiate serialization code
        // (one might be using this in a DLL to instantiate code)
        //BOOST_STATIC_WARNING(! boost::serialization::is_abstract< T >::value);
        export_guid(boost::serialization::is_abstract< T >());
        return *this;
    }
    
    void export_guid(mpl::false_) const {
        // generates the statically-initialized objects whose constructors
        // register the information allowing serialization of T objects
        // through pointers to their base classes.
        instantiate_ptr_serialization((T*)0, 0, adl_tag());
    }
    

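    Note how the static warning documents that exporting only makes sense for polymorphic classes, and the comment adds that exporting an abstract base class has no effect. instantiate_ptr_serialization is what actually registers the type with all known archive types (those whose headers have been included).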
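    The static-initialization idea can be modelled in a few lines. This is a toy sketch, not Boost's implementation: `toy_registry`, `toy_guid_initializer` and the `TOY_CLASS_EXPORT*` macros are hypothetical stand-ins for the singleton, guid_initializer and the export macros, and a single factory map replaces the per-archive serializer registration that instantiate_ptr_serialization performs.

```cpp
// Toy model (hypothetical names, not Boost's code) of the BOOST_CLASS_EXPORT
// pattern: a namespace-scope object whose constructor registers the class
// under a string key during dynamic initialization (in practice, before main).
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

struct Base {
    virtual ~Base() = default;
    virtual std::string name() const = 0;
};

using Factory = std::function<std::unique_ptr<Base>()>;

// Function-local static ("singleton"): constructed on first use, so it is
// ready even if the first registration happens during static initialization.
std::map<std::string, Factory>& toy_registry() {
    static std::map<std::string, Factory> registry;
    return registry;
}

template <class T>
struct toy_guid_initializer {
    explicit toy_guid_initializer(const char* key) {
        assert(toy_registry().count(key) == 0 && "key exported twice");
        toy_registry()[key] = [] { return std::unique_ptr<Base>(new T()); };
    }
};

// Hypothetical macros mirroring the shape of BOOST_CLASS_EXPORT_GUID / _EXPORT.
#define TOY_CLASS_EXPORT_GUID(T, K) \
    static const toy_guid_initializer<T> toy_init_##T{K};
#define TOY_CLASS_EXPORT(T) TOY_CLASS_EXPORT_GUID(T, #T)

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};

TOY_CLASS_EXPORT(Derived)  // registers the key "Derived"
```

    As with the real macro, the class name is stringized to form the key, and nothing in main has to mention Derived for the registration to exist.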

    There is a lot of template machinery to get everything instantiated, exactly once, regardless of how your code is organized (static or shared linkage, one translation unit or several). In the end, though, it comes down to register_type:

    template<class T>
    const basic_pointer_iserializer *
    register_type(T * = NULL){
        const basic_pointer_iserializer & bpis =
            boost::serialization::singleton<
                pointer_iserializer<Archive, T>
            >::get_const_instance();
        this->This()->register_basic_serializer(bpis.get_basic_serializer());
        return & bpis;
    }
    

    This registers the singleton pointer_iserializer<Archive, T>:

    template<class Archive, class T>
    class pointer_iserializer :
        public basic_pointer_iserializer
    {
    private:
        virtual void * heap_allocation() const {
            detail::heap_allocation<T> h;
            T * t = h.get();
            h.release();
            return t;
        }
        virtual const basic_iserializer & get_basic_serializer() const {
            return boost::serialization::singleton<
                iserializer<Archive, T>
            >::get_const_instance();
        }
        BOOST_DLLEXPORT virtual void load_object_ptr(
            basic_iarchive & ar,
            void * x,
            const unsigned int file_version
        ) const BOOST_USED;
    public:
        // this should alway be a singleton so make the constructor protected
        pointer_iserializer();
        ~pointer_iserializer();
    };
    

    pointer_iserializer wraps the actual new/delete in heap_allocation<T>. For brevity I don't include that here, because I have already analyzed it in more detail in "How does boost::serialization allocate memory when deserializing through a pointer?"
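    The load side can be sketched the same way. This is a simplified toy model, not Boost's code: `loaders`, `toy_register` and `load_through_base` are hypothetical, the "archive" is a whitespace-separated text stream, and `new T()` stands in for heap_allocation. The shape follows the description above: the stored type key selects a per-type loader, which allocates the concrete object and then runs its load, base part first.

```cpp
// Toy model (hypothetical names, not Boost's code) of loading through a base
// pointer: read the stored type key, find the loader registered for it,
// heap-allocate the concrete type, then run its load (base part first).
#include <cassert>
#include <functional>
#include <istream>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

struct Base {
    virtual ~Base() = default;
    std::string aa_name;
    void load(std::istream& in) { in >> aa_name; }
};

struct Derived : Base {
    std::string cc_name;
    void load(std::istream& in) {
        Base::load(in);  // base part first, like base_object<Base>(*this)
        in >> cc_name;
    }
};

// Plays the role of pointer_iserializer<Archive, T>: allocate, then load.
using PtrLoader = std::function<std::unique_ptr<Base>(std::istream&)>;

std::map<std::string, PtrLoader>& loaders() {
    static std::map<std::string, PtrLoader> m;
    return m;
}

template <class T>
void toy_register(const std::string& key) {
    loaders()[key] = [](std::istream& in) {
        std::unique_ptr<T> p(new T());  // simplified stand-in for heap_allocation
        p->load(in);                    // if this throws, p frees the object
        return std::unique_ptr<Base>(std::move(p));
    };
}

std::unique_ptr<Base> load_through_base(std::istream& in) {
    std::string key;
    in >> key;  // the type key written when the pointer was saved
    assert(!key.empty() && "archive has no type key");
    return loaders().at(key)(in);  // throws std::out_of_range if unregistered
}
```

    For example, after toy_register<Derived>("Derived"), loading from the text "Derived class_aa class_cc" returns a Base pointer whose dynamic type is Derived, with aa_name and cc_name filled in.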