
Boost Serialization – Avoid string length output?


I'm serializing some data coming from two different branches of a project. The data is written in text archive form using boost::serialization. Since the two serialized files will be diffed, I'm also inserting debug messages between the serialized parts, to make it easier to see where a difference in the output occurs.

Here is a broad overview of the code I'm currently using:

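#include <fstream>
#include <string>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/string.hpp>

int main()
{
  std::ofstream ofs("./SomeFile.txt"); // for brevity's sake, no actual path used
  {
    boost::archive::text_oarchive oa(ofs);
    std::string debug_str;

    debug_str = "\n\nPart 1\n";
    oa & debug_str;   // operator& on an output archive

    // ... some other serialized parts ...

    debug_str = "\n\nPart 145\n";
    oa << debug_str;  // operator<< on the same archive
  } // the archive is destroyed here, before the stream
}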

As you can see, I first thought that the choice of operator (& vs <<) made a difference in the output, but it doesn't. Either way, I get the following text file:

22 serialization::archive 7 9 [CRLF]
[CRLF]
Part 1 [CRLF]
 11 [CRLF]
[CRLF]
Part 145 [CRLF]

The 22 serialization::archive 7 part is the standard Boost serialization header, which I assume identifies the type of archive. After it comes the part I would like to remove, namely the 9. After some digging I figured out that 9 is the length of the "\n\nPart 1\n" string!
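The arithmetic behind the 9 and the 11 can be checked with the standard library alone. The following is only a small sketch: make_marker and marker_length are hypothetical helper names introduced for illustration, and nothing here calls into Boost.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds a debug marker shaped like the ones in the
// question, e.g. "\n\nPart 1\n".
std::string make_marker(int part_number) {
    return "\n\nPart " + std::to_string(part_number) + "\n";
}

// Hypothetical helper: the character count of a marker, which is the number
// that shows up in front of it in the text file above.
std::size_t marker_length(const std::string& marker) {
    return marker.size();
}
```

Under these assumptions, marker_length(make_marker(1)) is 9 and marker_length(make_marker(145)) is 11, matching the two numbers in the file. Any change to a marker's text changes its count as well, so the diff flags both the number and the text.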

Is this expected behaviour, or is there a way to suppress this output? The other, actual records show no apparent "control codes" of this kind, marking lengths or the like.

The debug output would be useful, but because the lengths of the debug strings may differ (heavy refactoring occurred on one of the branches), the diff yields some false positives.

Any thoughts would be appreciated, thanks!


Solution

  • I doubt this is possible.

    The problem you face here is that the textual output needs to be parsed back in order to be deserialized. There are two main ways of structuring textual output so it can be parsed back easily:

    • length-prefixed strings
    • special characters (with escape codes)

    For example, in an XML archive the tags are delimited by < and >, so you cannot use those characters in your own data and must write the escape codes &lt; and &gt; instead. On the other hand, if you look at the Redis protocol you will see things like $13 followed by a line break and then Hello, World!, where $ introduces the length of the record as a string of digits and the actual record follows on the next line.

    The former way (length-prefixed strings) is much more efficient to parse, but much harder to write or edit by hand.
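The two approaches can be sketched with the standard library alone. This is an illustration only, not Boost's implementation: all four function names are hypothetical, the length-prefixed layout (decimal count, one space, raw bytes) merely mimics the shape seen in the question's output, and the escaping covers just the three characters &, < and >.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Length-prefixed: "<decimal length> <raw bytes>". The payload may contain
// any character, because the reader knows in advance how many bytes to take.
std::string encode_length_prefixed(const std::string& s) {
    return std::to_string(s.size()) + ' ' + s;
}

std::string decode_length_prefixed(const std::string& encoded) {
    std::size_t digits = 0;
    // std::stoul stops at the first non-digit and reports how many
    // characters it consumed through the second argument.
    const std::size_t length = std::stoul(encoded, &digits);
    if (digits >= encoded.size() || encoded[digits] != ' ') {
        throw std::invalid_argument("missing separator after length");
    }
    if (encoded.size() - digits - 1 < length) {
        throw std::invalid_argument("payload shorter than announced length");
    }
    return encoded.substr(digits + 1, length);
}

// Escaped: no length is written; characters that would clash with the
// markup are replaced by escape codes instead.
std::string encode_escaped(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;       break;
        }
    }
    return out;
}

// Wraps the escaped text in a tag pair, the way a markup format delimits a
// record. The tag name is assumed to be a valid element name.
std::string encode_tagged(const std::string& tag, const std::string& s) {
    return "<" + tag + ">" + encode_escaped(s) + "</" + tag + ">";
}
```

With the length-prefixed form, editing the payload also changes the leading number, which is the source of the extra diff noise in the question. The tagged form leaves everything outside the payload untouched.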

    From the Boost.Serialization documentation, I see two different "human-readable" archives:

    • boost::archive::text_[i/o]archive uses length-prefixed strings (it seems)
    • boost::archive::xml_[i/o]archive uses the XML representation

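    You might want to switch to boost::archive::xml_oarchive. Note that XML archives require each serialized value to be wrapped in a name-value pair (for example with BOOST_SERIALIZATION_NVP), so the serialization calls need to be adapted accordingly.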