JsonCpp: Serializing JSON causes loss of data in byte string


I have a simple use case: I want to serialize and transmit vectors of integers in the range 0 to 255. I figured the most space-efficient way to do so would be to encode each vector as a string whose nth character has the byte value of the nth element of the vector. To this end, I wrote the following two functions:

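#include <string>
#include <vector>

std::string SerializeToBytes(const std::vector<int> &frag)
{
    std::vector<unsigned char> res;
    res.reserve(frag.size());
    for(int val : frag) {
        // Store each value (expected to be in 0..255) as a single byte.
        res.push_back(static_cast<unsigned char>(val));
    }
    return std::string(res.begin(), res.end());
}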

std::vector<int> ParseFromBytes(const std::string &serialized_frag)
{
    std::vector<int> res;
    res.reserve(serialized_frag.length());
    for(unsigned char c : serialized_frag) {
        res.push_back(c);
    }
    return res;
}

However, when I send this data using JsonCpp, I run into problems. The minimal reproducible example below shows that the issue does not come from the functions above; it appears only when a Json::Value is serialized and then parsed again, and part of the encoded data in the string is lost.

#include <cassert>
#include <json/json.h>

int main() {
    std::vector<int> frag = { 230 };
    std::string serialized = SerializeToBytes(frag);

    // Will pass, indicating that the SerializeToBytes and ParseFromBytes functions are not the issue.
    assert(frag == ParseFromBytes(serialized));

    Json::Value val;
    val["STR"] = serialized;

    // Will pass, showing that the issue does not appear until JSON is serialized and then parsed.
    assert(frag == ParseFromBytes(val["STR"].asString()));

    Json::StreamWriterBuilder builder;
    builder["indentation"] = "";
    std::string serialized_json = Json::writeString(builder, val);
    // serialized_json is now "{\"STR\":\"\\ufffd\"}": the byte 230 has already been replaced.

    Json::Value reconstructed_json;
    Json::Reader reader;
    reader.parse(serialized_json, reconstructed_json);

    // Will produce { 239, 191, 189 }, rather than { 230 }, as it should.
    std::vector<int> frag_from_json = ParseFromBytes(reconstructed_json["STR"].asString());

    // Will fail, showing that the issue stems from the serialize/parsing process.
    assert(frag == frag_from_json);

    return 0;
}

What is the cause of this issue, and how can I remedy it? Thanks for any help you can offer.


Solution

  • JsonCpp documentation, class Value

    This class is a discriminated union wrapper that can represent a:

    • ...
    • UTF-8 string
    • ...

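    { 230 } is not a valid UTF-8 string: the byte 0xE6 (230) starts a three-byte sequence, but no continuation bytes follow it. You therefore cannot expect Json::writeString(builder, val) to reproduce it correctly. In practice the writer substitutes U+FFFD REPLACEMENT CHARACTER (escaped as \ufffd), and the parser turns that escape back into its UTF-8 encoding, the bytes { 239, 191, 189 } you observed. To transmit arbitrary bytes, first encode them with an ASCII-only scheme such as Base64 or hex, so the string stored in the Json::Value is valid UTF-8.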