c++ serialization boost

Boost serialize for std::basic_string with custom allocator


I am trying to serialize the following string type with boost:

using CustomString = std::basic_string<char, std::char_traits<char>, CustomAllocator<char>>;

1. Since this type is identical to std::string except for the allocator, my first approach was to do what Boost does for std::string.

I added BOOST_CLASS_IMPLEMENTATION(CustomString, boost::serialization::primitive_type) after the using statement.

This seemed to work until I tried to serialize a string containing a space.

Serialization goes into a boost::archive::text_oarchive, which separates tokens with spaces. After deserialization, only the first part of the string was read back from the archive: I wrote "Hello World" but got only "Hello". For std::string, Boost writes a length field before the text; for the custom string it does not.

The length entry comes from boost\archive\impl\text_oarchive_impl.ipp:

template<class Archive>
void text_oarchive_impl<Archive>::save(const std::string &s)
{
    const std::size_t size = s.size();
    *this->This() << size;
    this->This()->newtoken();
    os << s;
}

I found the same problem in the answer to the following question: Can boost::container::strings be serialized using boost serialization?

I extended the example to make the problem visible: https://coliru.stacked-crooked.com/a/84b2cb162d58534a

2. Next, I wrote my own serialization functions:

namespace boost
{
namespace serialization
{

template<class Archive>
inline void serialize(Archive& ar, CustomString & s, const unsigned int file_version)
{
  boost::serialization::split_free(ar, s, file_version);
}

template<typename Archive>
inline void save(
  Archive& ar, const CustomString & s, const unsigned int /* file_version */
)
{
  ar << s.size();
  for (size_t i = 0; i < s.size(); i++)
  {
    ar << s.c_str()[i];
  }  
}

template<typename Archive>
inline void load(
  Archive& ar, CustomString & s, const unsigned int /* file_version */
)
{
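  size_t size;
  ar >> size;
  s.clear();       // start from an empty string in case s already holds data
  s.reserve(size); // avoid repeated reallocation while appending
  char c;
  for (size_t i = 0; i < size; i++)
  {
    ar >> c;
    s.push_back(c);
  }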
}

} // namespace serialization
} // namespace boost

This code works but produces a fairly long archive, because each char is encoded as a decimal value:

std::string Archive: 22 serialization::archive 19 11 Hello World

CustomString Archive: 22 serialization::archive 19 0 0 11 72 101 108 108 111 32 87 111 114 108 100

I would be grateful for a hint on how to improve one of the two approaches. Thank you!


Solution

  • Yeah, I'm still surprised that support for {std,boost::container}::basic_string doesn't work out of the box. Also, pretty surprised nobody noticed the old answer was broken.

    I agree that custom serialization is the safer bet. It can be a little simpler and more lightweight:


    #include <boost/algorithm/string.hpp>
    #include <boost/archive/text_iarchive.hpp>
    #include <boost/archive/text_oarchive.hpp>
    #include <boost/serialization/binary_object.hpp>
    #include <cassert>
    #include <iomanip>
    #include <iostream>
    #include <memory>
    #include <sstream>
    #include <string>
    
    namespace Funky {
        template <typename T> struct FunkyAlloc : std::allocator<T> {
            using std::allocator<T>::allocator;
            using std::allocator<T>::operator=;
        };
    
        using String = std::basic_string<char, std::char_traits<char>, FunkyAlloc<char>>;
    
        template <typename Ar, typename TCh, typename TChT, typename Allocator>
        void serialize(Ar& ar, std::basic_string<TCh, TChT, Allocator>& s, unsigned) {
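            size_t n = s.length();
            ar & n; // length prefix: written on save, read on load
    
            if (Ar::is_loading::value)
                s.resize(n); // make room before the payload is read back in
    
            // Alternative: one decimal number per character
            // ar& boost::serialization::make_array(s.data(), n);
            // Default: the whole buffer as a single binary object
            ar & boost::serialization::make_binary_object(s.data(), n);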
        }
    }
    
    int main() {
        for (auto test_case : {"", " ", "   ", "ford mustang", "hallberg rassy"}) {
            std::stringstream ss;
    
            Funky::String fs = test_case;
            boost::archive::text_oarchive(ss) << fs;
    
            std::cout << "Archive reads: " << quoted(boost::replace_all_copy(ss.str(), "\n", "\\n")) << std::endl;
    
            Funky::String roundtrip;
            boost::archive::text_iarchive(ss) >> roundtrip;
            std::cout << "Roundtripped Funky::String: " << quoted(roundtrip) << "\n";
    
            assert(roundtrip == test_case);
        }
    }
    

    This passes the asserts and prints the expected output:

    g++ -std=c++20 -O2 -Wall -pedantic -pthread main.cpp -lboost_serialization && ./a.out
    Archive reads: "22 serialization::archive 20 0 0 0\\n\\n"
    Roundtripped Funky::String: ""
    Archive reads: "22 serialization::archive 20 0 0 1\\n\\nIA==\\n"
    Roundtripped Funky::String: " "
    Archive reads: "22 serialization::archive 20 0 0 3\\n\\nICAg\\n"
    Roundtripped Funky::String: "   "
    Archive reads: "22 serialization::archive 20 0 0 12\\n\\nZm9yZCBtdXN0YW5n\\n"
    Roundtripped Funky::String: "ford mustang"
    Archive reads: "22 serialization::archive 20 0 0 14\\n\\naGFsbGJlcmcgcmFzc3k=\\n"
    Roundtripped Funky::String: "hallberg rassy"
    

    If you use the make_array line instead of the make_binary_object line, you get close to what you had manually coded:

    Archive reads: "22 serialization::archive 20 0 0 0\\n"
    Roundtripped Funky::String: ""
    Archive reads: "22 serialization::archive 20 0 0 1 32\\n"
    Roundtripped Funky::String: " "
    Archive reads: "22 serialization::archive 20 0 0 3 32 32 32\\n"
    Roundtripped Funky::String: "   "
    Archive reads: "22 serialization::archive 20 0 0 12 102 111 114 100 32 109 117 115 116 97 110 103\\n"
    Roundtripped Funky::String: "ford mustang"
    Archive reads: "22 serialization::archive 20 0 0 14 104 97 108 108 98 101 114 103 32 114 97 115 115 121\\n"
    Roundtripped Funky::String: "hallberg rassy"