
Can I use the unicode flag within a JSON schema pattern (regular expression)?


Is there a way to set the u flag and thus enable Unicode regex patterns?

I need to match names like Straßer, Müller, Adèle, Yiğit.

/\p{L}+/u or new RegExp('\\p{L}+', 'u') would work in my case if I could use plain JavaScript in a JSON Schema.
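To make the difference concrete, here is a small sketch in plain JavaScript (outside any JSON Schema validator) that tests the four names with and without the u flag. The variable names are illustrative only.

```javascript
// Names from the question that a plain [A-Za-z] class would reject.
const names = ["Straßer", "Müller", "Adèle", "Yiğit"];

// With the u flag, \p{L} is a Unicode property escape: any letter in any script.
const unicodeLetters = /^\p{L}+$/u;

// Without the u flag, \p is just an escaped "p" (legacy Annex B syntax),
// so this pattern looks for the literal text "p{L" followed by one or more "}".
const legacy = new RegExp("^\\p{L}+$");

for (const name of names) {
  // Each line prints the name, then true (u flag) and false (no flag).
  console.log(name, unicodeLetters.test(name), legacy.test(name));
}
```

This is why the flag matters: the same pattern source silently means something else when the engine does not set u.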

The specification says

6.3.3. pattern
The value of this keyword MUST be a string. This string SHOULD be a valid regular expression, according to the ECMA-262 regular expression dialect.
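Because the keyword's value is just a string, a schema for this use case can only carry the pattern source; there is no field for flags. The sketch below (a hypothetical schema, parsed with JSON.parse) also shows that the backslash must be doubled in the JSON text.

```javascript
// A hypothetical schema for the question's use case, written as JSON text.
// In JSON the backslash must itself be escaped, so \p{L} is spelled \\p{L}.
// JSON Schema does not anchor patterns implicitly, hence the explicit ^ and $.
const schemaText = String.raw`{
  "type": "string",
  "pattern": "^\\p{L}+$"
}`;

const schema = JSON.parse(schemaText);

// After parsing, "pattern" is an ordinary string holding the regex source.
// Whether it is compiled with the u flag is up to the validator.
console.log(schema.pattern); // ^\p{L}+$
```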

I found this question: How to match a Unicode letter with a JSON Schema pattern (regular expression). The resulting pattern is too obfuscated. JavaScript/ECMAScript handles \p{L} as expected if the u flag is set.


Solution

  • The 2020-12 version of JSON Schema (which you reference) has a separate, more detailed changelog (informative), which points out the following, something that may not be obvious from the specification itself:

    Regular expressions are now expected (but not strictly required) to support unicode characters. Previously, this was unspecified and implementations may or may not support this unicode in regular expressions. - https://json-schema.org/draft/2020-12/release-notes.html

    If you are using an implementation that supports JSON Schema draft 2020-12, you should be able to use Unicode in regular expressions, because the implementation is expected to enable that flag for you.
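As a rough illustration of "the implementation enables the flag", here is how a JavaScript validator might evaluate the pattern keyword. The helper matchesPattern is hypothetical, not any real library's API; real implementations differ, and this sketch simply assumes one that always compiles with u.

```javascript
// Hypothetical helper: one way a JavaScript validator *might* evaluate "pattern".
// Assumption: the implementation always compiles with the "u" flag.
function matchesPattern(value, pattern) {
  // "u" enables \p{...} property escapes; the schema author cannot request it.
  const re = new RegExp(pattern, "u");
  // "pattern" is not implicitly anchored, so test() searches anywhere in the value.
  return re.test(value);
}

const pattern = "^\\p{L}+$";
for (const name of ["Straßer", "Müller", "Adèle", "Yiğit", "R2-D2"]) {
  // true for the four names, false for "R2-D2" (digits and a hyphen).
  console.log(name, matchesPattern(name, pattern));
}
```

The point of the sketch is where the decision lives: the flag is chosen in the validator's code, not in the schema.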

    You cannot specify flags with the regular expression: the value of pattern is just a string, with no place for flags. And because the requirements for regular expression support are only SHOULD and not MUST, in specification terms you cannot rely on Unicode support being interoperable. If you only plan to use the schemas internally, and you test them and they work (they should, given that you seem to be working with JS/Node), you'll probably be OK, but schemas shared with others may not behave as expected.
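If interoperability matters more than brevity, one possible fallback (closer to the explicit-range approach the question found too obfuscated) is to list letter ranges instead of using \p{L}. This sketch assumes the names are written in Latin script; it is narrower than \p{L} and rejects letters from other scripts.

```javascript
// Explicit ranges instead of \p{L}: ASCII letters, Latin-1 Supplement letters
// (skipping × U+00D7 and ÷ U+00F7), and Latin Extended-A/B (up to U+024F).
// Single backslashes: the string literal (like a JSON parser) turns \u00C0 etc.
// into literal characters, so the regex engine never sees \u escapes.
const latinLetters = "^[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u024F]+$";

// No flags are needed, so engines that ignore or lack "u" behave the same.
const re = new RegExp(latinLetters);

for (const name of ["Straßer", "Müller", "Adèle", "Yiğit", "Ωmega"]) {
  // true for the four Latin-script names, false for "Ωmega" (Greek capital omega).
  console.log(name, re.test(name));
}
```

The trade-off is coverage: every script you want to accept has to be added by hand.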

    Some implementations in other languages use a port of an ECMA-262 regular expression engine, but not all do, and sometimes no port is available.