Storing a set of serialized protobuf objects


I'm calling SerializeToString on my Protobuf objects and then storing the resulting string in a DB.

However, sometimes I have an array of such objects. I would like to store the whole serialized array, and for that I need some separator string between the serialized strings.

According to the documentation I've seen, the string is just a byte array, so nothing is guaranteed about its content: any separator I pick could also appear inside a serialized object.

What would be the best approach here?

I don't know the length of the array in advance, because objects may be appended to it as we go, and I want it stored in the DB throughout the entire process.


Solution

  • Suppose your protobuf message looks like this:

    message Object
    {
      ... = 1;
      ... = 2;
      ... = 3;
    }
    

    Then, in the same file, introduce one more message that is a collection of these Objects:

    message Objects
    {
      repeated Object array = 1;
    }
    

    Then, when you have many elements, populate a single Objects instance and call SerializeAsString() on it. This saves you from serializing each Object separately and inventing your own delimiter, because the Protobuf library handles both serializing and parsing the whole collection. I use this in my project and it works like a charm.

    Additionally, building the collection directly inside Objects avoids making extra copies of each Object: you can add items to it in place (for example with the generated add_array() method) and access them by index. Repeated fields also provide STL-style iterators, so they work with iterators and range-based for loops.
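Since the question mentions that objects are appended over time, it may help to see why an Objects blob can grow without being re-serialized. On the wire, each element of `repeated Object array = 1;` is a tag byte, a varint length, and the element's bytes, and concatenated serialized messages parse as one merged message. The following is a minimal sketch under that assumption; `encodeVarint`, `frameAsArrayElement`, and `appendObject` are hypothetical helper names, not part of the Protobuf API.

```cpp
#include <cstdint>
#include <string>

// Encode `value` as a protobuf base-128 varint (7 bits per byte, low bits first).
std::string encodeVarint(std::uint64_t value)
{
    std::string out;
    while (value >= 0x80) {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
    return out;
}

// Frame one serialized Object as an element of `repeated Object array = 1;`.
// 0x0A is the tag byte: (field number 1 << 3) | wire type 2 (length-delimited).
std::string frameAsArrayElement(const std::string& serializedObject)
{
    return std::string(1, '\x0A') + encodeVarint(serializedObject.size()) + serializedObject;
}

// Hypothetical helper: append one more element to the blob kept in the DB.
void appendObject(std::string& storedBlob, const std::string& serializedObject)
{
    storedBlob += frameAsArrayElement(serializedObject);
}
```

In practice you probably do not need to hand-roll the framing: serializing an Objects instance that holds only the new element and appending those bytes to the stored blob should produce the same result, and the whole blob can later be read back with a single ParseFromString() on Objects.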


    Note that when you write the output of Objects::SerializeAsString() to a file, you should first write the length of that string, followed by the serialized bytes themselves. When reading, read the length first and then that many bytes. For ease of use, I extended std::fstream and overloaded the methods below:

    #include <cstdlib>
    #include <fstream>
    #include <string>

    // Open the underlying file with std::ios::binary so the bytes are not translated.
    struct fstreamEncoded : std::fstream
    {
        // other methods
        void  // returns `void` to avoid multiple `<<` in a single line
        operator<< (const std::string& content)
        {  // the cast below avoids calling this overload recursively
           // write the exact byte count on its own line, then the raw bytes
          static_cast<std::fstream&>(*this) << content.length() << '\n';
          write(content.data(), static_cast<std::streamsize>(content.length()));
          flush();
        }
    
        std::string
        getline ()
        {
          char length_[32] = {};
          std::istream::getline(length_, sizeof(length_));  // reads the length line
          if(*length_ == 0)
            return "";
    
          const size_t length = std::strtoull(length_, nullptr, 10);  // length of encoded input
          std::string content(length, 0);  // resize the `string`
          read(&content[0], static_cast<std::streamsize>(length));  // member of `class istream`
          return content;  // exactly the bytes that were written, with no trailing newline
        }
    };
    

    The above is just an illustration; adapt it to your project's needs.