c++ json validation jsoncpp

Can I use JsonCpp to partially-validate JSON input?


I'm using JsonCpp to parse JSON in C++.

e.g.

Json::Reader r;
std::stringstream ss;
ss << "{\"name\": \"sample\"}";

Json::Value v;
assert(r.parse(ss, v));         // OK
assert(v["name"] == "sample");  // OK

But my actual input is a whole stream of JSON messages that may arrive in chunks of any size. All I can do is get JsonCpp to try to parse my input, character by character, eating up full JSON messages as we discover them:

Json::Reader r;
std::string input = "{\"name\": \"sample\"}{\"name\": \"aardvark\"}";

// Try ever-longer prefixes; once one parses, consume it and start over.
for (size_t cursor = 1; cursor <= input.size(); cursor++) {
    std::stringstream ss;
    ss << input.substr(0, cursor);

    Json::Value v;
    if (r.parse(ss, v)) {
        std::cout << v["name"].asString() << " ";
        input.erase(0, cursor);
        cursor = 0;  // the loop increment moves this back to 1
    }
} // Output: sample aardvark

This is already a bit nasty, but it does get worse. I also need to be able to resync when part of an input is missing (for any reason).

Now it doesn't have to be lossless, but I want to prevent an input such as the following from potentially breaking the parser forever:

{"name": "samp{"name": "aardvark"}

Passing this input to JsonCpp will fail, but that problem won't go away as we receive more characters into the buffer; that second name is simply invalid directly after the " that precedes it; the buffer can never be completed to present valid JSON.

However, if I could be told that the fragment certainly becomes invalid as of the second n character, I could drop everything in the buffer up to that point, and then simply wait for the next { to consider the start of a new object, as a best-effort resync.
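
To make that best-effort resync concrete, here is a minimal sketch. It assumes some validator could report the offset at which the buffer is known to be invalid; `resyncAfter` and `invalidPos` are hypothetical names, not part of JsonCpp. A `{` inside a string value would fool it, which is why it is only best-effort.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a JsonCpp API): discard everything before the
// first '{' found at or after `invalidPos`, the offset where the buffer is
// known to have become invalid. Returns true if a candidate object start
// was found; otherwise the buffer is emptied and we wait for more input.
bool resyncAfter(std::string& buffer, std::size_t invalidPos) {
    const std::size_t next = buffer.find('{', invalidPos);
    if (next == std::string::npos) {
        buffer.clear();
        return false;
    }
    buffer.erase(0, next);
    return true;
}
```

With the broken input above followed by one more message, an `invalidPos` of 16 (the second n) drops both the truncated object and the aardvark object, and parsing resumes at the next message.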


So, is there a way that I can ask JsonCpp to tell me whether an incomplete fragment of JSON has already guaranteed that the complete "object" will be syntactically invalid?

That is:

{"name": "sample"}   Valid        (Json::Reader::parse == true)
{"name": "sam        Incomplete   (Json::Reader::parse == false)
{"name": "sam"LOL    Invalid      (Json::Reader::parse == false)

I'd like to distinguish between the two fail states.

Can I use JsonCpp to achieve this, or am I going to have to write my own JSON "partial validator" by constructing a state machine that considers which characters are "valid" at each step through the input string? I'd rather not re-invent the wheel...


Solution

  • It depends on whether you actually control the packets (and thus the producer). If you do, the simplest way is to indicate the boundaries in a header:

    +---+---+---+---+-----------------------
    | 3 | 16|132|243|endofprevious"}{"name":...
    +---+---+---+---+-----------------------
    

    The header is simple:

    • 3 indicates the number of boundaries
    • 16, 132 and 243 indicate the position of each boundary, each of which corresponds to the opening brace of a new object (or the opening bracket of a list)

    and then comes the buffer itself.

    Upon receiving such a packet, the following entries can be parsed:

    • previous + current[0:16]
    • current[16:132]
    • current[132:243]

    And current[243:] is saved for the next packet (though you can always attempt to parse it in case it's complete).

    This way, the packets are auto-synchronizing, and there is no fuzzy detection, with all the failure cases it entails.

    Note that there could be 0 boundaries in the packet. It simply implies that one object is big enough to span several packets, and you just need to accumulate for the moment.
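
    The slicing described above can be sketched as follows. This is only an illustration under stated assumptions: `splitPacket` and `carry` are hypothetical names, the header has already been decoded into a sorted list of offsets, and every offset lies within the packet body.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of JsonCpp: cut one packet body into complete
// messages using the boundary offsets from its header.
// `carry` holds the unfinished tail of earlier packets and is updated in place.
// Assumes `boundaries` is sorted ascending and every offset is <= body.size().
std::vector<std::string> splitPacket(std::string& carry,
                                     const std::string& body,
                                     const std::vector<std::size_t>& boundaries) {
    std::vector<std::string> messages;

    if (boundaries.empty()) {
        // No new message starts in this packet: keep accumulating.
        carry += body;
        return messages;
    }
    assert(boundaries.back() <= body.size());

    // Everything before the first boundary finishes the carried-over message.
    std::string first = carry + body.substr(0, boundaries.front());
    if (!first.empty()) {
        messages.push_back(first);
    }

    // Each pair of adjacent boundaries delimits one complete message.
    for (std::size_t i = 0; i + 1 < boundaries.size(); ++i) {
        messages.push_back(body.substr(boundaries[i],
                                       boundaries[i + 1] - boundaries[i]));
    }

    // The tail after the last boundary may be incomplete; save it.
    carry = body.substr(boundaries.back());
    return messages;
}
```

    Each returned string can then be handed to the JSON parser as a whole message. A parse failure now affects only that one message, because the next boundary restores synchronization.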

    I would recommend giving the numbers a fixed-width representation (for example, 4 bytes each) and settling on a byte order (that of your machine) so that they convert to and from binary easily. I believe the overhead is fairly small: 4 bytes for the count plus 4 bytes per entry, given that even {"name":""} is already 11 bytes.
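
    A fixed-width header along those lines might be encoded and decoded as below. This is a sketch, not a standard format: the function names are hypothetical, and it writes the bytes explicitly in little-endian order rather than copying the machine's native representation, so that both ends agree regardless of platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical wire format (an assumption, not a standard): a 4-byte count
// followed by one 4-byte offset per boundary, all little-endian.

void appendU32(std::string& out, std::uint32_t value) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<char>((value >> shift) & 0xFF));
    }
}

std::uint32_t readU32(const std::string& in, std::size_t pos) {
    assert(pos + 4 <= in.size());
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        const unsigned char byte = static_cast<unsigned char>(in[pos + i]);
        value |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
    return value;
}

std::string encodeHeader(const std::vector<std::uint32_t>& boundaries) {
    std::string header;
    appendU32(header, static_cast<std::uint32_t>(boundaries.size()));
    for (std::uint32_t b : boundaries) {
        appendU32(header, b);
    }
    return header;
}

// Returns false if the packet is too short to hold the header it announces.
// On success, `bodyStart` is the index of the first payload byte.
bool decodeHeader(const std::string& packet,
                  std::vector<std::uint32_t>& boundaries,
                  std::size_t& bodyStart) {
    if (packet.size() < 4) return false;
    const std::uint32_t count = readU32(packet, 0);
    if (count > (packet.size() - 4) / 4) return false;

    boundaries.clear();
    for (std::uint32_t i = 0; i < count; ++i) {
        boundaries.push_back(readU32(packet, 4 + 4 * static_cast<std::size_t>(i)));
    }
    bodyStart = 4 + 4 * static_cast<std::size_t>(count);
    return true;
}
```

    For the example packet above, encodeHeader({16, 132, 243}) yields a 16-byte header: 4 bytes for the count and 12 for the three offsets.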