c++ rapidjson

Rapidjson regex validation giving unexpected result


I'm using the same example from examples/schemavalidator.cpp.

My schema:

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "title": "SampleMessage",
  "properties": {
    "id": {
      "type": "string"
    },
    "regexfield": {
      "type": "string",
      "pattern": "^[\\x00-\\x7F]*$"
    },
    "status": {
      "type": "string",
      "enum": [
        "VALID",
        "INVALID"
      ]
    }
  },
  "required": [
    "regexfield",
    "status"
  ]
}

My message:

{
  "id": "w10ooe",
  "regexfield": "w10ooe¼",
  "status": "VALID"
}

Expected result: validation fails on the pattern.

Actual result: Input JSON is valid.

As you can see, the pattern is meant to check that regexfield contains only ASCII characters. However, validation succeeds even when the field contains a non-ASCII character. The same regex gives the expected result when tested on regex101.com. Can you please explain why this happens and how to work around it?


Solution

  • RapidJSON's default (internal) regex engine does not support character escapes of the form \xXX: the switch statement that handles escape sequences has no case for x.

    You can switch to std::regex (which does support this) by setting both of the following preprocessor symbols:

    RAPIDJSON_SCHEMA_USE_INTERNALREGEX=0
    RAPIDJSON_SCHEMA_USE_STDREGEX=1
    

    You can either #define these macros at the top of your file, before any RapidJSON header is included, or pass them as preprocessor flags in your build system (-D...).