c++ c json rapidjson

How to prevent a JSON parser from crashing when there are illegal characters in the JSON?


Due to some communication errors, I sometimes receive JSON strings with illegal trailing characters: "{\"messageType\" : \"Test1\", \"from\" : \"F2D0B5C6-9875-46B5-8D4F\"}����1"

These illegal characters cause my JSON parser to fail. I am using the RapidJSON parser (C/C++). Is there a way to filter these unwanted characters out of the string and also verify the integrity of the JSON string?


Solution

  • This is not a bug in the parser. The parser checks that any characters between the end of the JSON value and the null terminator are whitespace, and it returns an error code if they are not. However, if the string has no null terminator, the parser may read past the end of the buffer and cause a segmentation fault, much as strlen() would.
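
    As a minimal sketch of the null-terminator point above, assuming the receiving code knows how many bytes arrived, the bytes can be copied into a std::string before parsing, because c_str() always yields a null-terminated buffer. The helper name MakeTerminatedCopy is hypothetical and is not part of RapidJSON.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a RapidJSON API): copies exactly `len` bytes from a
// receive buffer into a std::string. std::string::c_str() is guaranteed to be
// null-terminated, so the result is safe to pass to a parser that expects a
// C string, even if the original buffer had no terminator.
std::string MakeTerminatedCopy(const char* buffer, std::size_t len) {
    assert(buffer != nullptr || len == 0);
    return len == 0 ? std::string() : std::string(buffer, len);
}
```

    The copy costs one allocation per message, and it only guarantees termination. Garbage bytes inside the received length are still present and still have to be handled by the parser.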

    Newer versions of RapidJSON provide the kParseStopWhenDoneFlag parse flag. When it is enabled, the parser stops as soon as it has read one complete JSON value and does not look at the trailing characters. For example:

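    // Needs "rapidjson/document.h" and <cassert>; uses namespace rapidjson.
    Document d;
    // A valid JSON object followed by garbage bytes.
    const char* s =
        "{\"messageType\" : \"Test1\", \"from\" : \"F2D0B5C6-9875-46B5-8D4F\"}����1";
    d.Parse<kParseStopWhenDoneFlag>(s);  // stops after the closing }
    assert(!d.HasParseError());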
    

    With this flag, the parser stops after reading the closing } and does not report an error for the trailing characters.

    The flag is not yet documented in the guide. Please refer to the discussion at https://github.com/miloyip/rapidjson/pull/83
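
    For RapidJSON versions without that flag, one possible workaround is to cut the string at the end of the first complete object or array before parsing it. The following is only a sketch under that assumption. FirstJsonValueLength is a hypothetical, hand-written helper, not a RapidJSON function. It tracks bracket nesting, string literals and backslash escapes, and does not validate the JSON, so the trimmed string should still be passed to the parser and checked with HasParseError().

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of RapidJSON): returns the length of the first
// balanced {...} or [...] value at the start of `text`, or std::string::npos
// if no complete value is found. Brackets inside string literals are ignored.
// Nesting is only counted, not matched, so "{]" is NOT rejected here.
std::size_t FirstJsonValueLength(const std::string& text) {
    int depth = 0;          // current nesting level of objects/arrays
    bool inString = false;  // true while inside a "..." literal
    bool escaped = false;   // true right after a backslash inside a string
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (inString) {
            if (escaped) escaped = false;
            else if (c == '\\') escaped = true;
            else if (c == '"') inString = false;
            continue;
        }
        if (depth == 0 && c != '{' && c != '[') {
            // Before the value starts, only JSON whitespace is allowed.
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') continue;
            return std::string::npos;
        }
        if (c == '"') {
            inString = true;
        } else if (c == '{' || c == '[') {
            ++depth;
        } else if (c == '}' || c == ']') {
            if (--depth == 0) return i + 1;  // end of the top-level value
        }
    }
    return std::string::npos;  // input ended before the value was closed
}
```

    A caller would take text.substr(0, n) when n is not std::string::npos and treat the message as corrupt otherwise. This removes only trailing garbage. Corruption inside the object is left for the parser to report.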