Definition of whitespace in JSON


JSON specifies that "Whitespace can be inserted between any pair of tokens." What it does not specify is exactly what whitespace is. Should I read this as "old-fashioned ASCII whitespace" or "the entire spectrum of Unicode whitespace"?

In other words, when parsing JSON, are U+2000, U+2001, U+FEFF etc. valid whitespace characters between tokens?


Solution

  • Insignificant whitespace is defined in RFC 4627, the JSON specification:

    Insignificant whitespace is allowed before or after any of the six
    structural characters.

      ws = *(
                %x20 /              ; Space
                %x09 /              ; Horizontal tab
                %x0A /              ; Line feed or New line
                %x0D                ; Carriage return
            )
    

    By the way, the default encoding is UTF-8:

    JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

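    That being said, the encoding and the whitespace grammar are separate matters. Encoding JSON text in Unicode does not widen the ws rule: only the four characters listed above are insignificant whitespace. U+2000, U+2001, U+FEFF and the other Unicode spaces are therefore not valid between tokens, and a strict parser will reject them.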
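A minimal Python sketch of the ws rule follows. It assumes a hand-written tokenizer. skip_ws is a hypothetical helper, not something defined by the RFC, and it skips exactly the four ws characters. accepts is another hypothetical helper that checks the same behaviour against Python's standard json module. In CPython, that module treats only space, tab, LF and CR as whitespace and rejects a str that starts with U+FEFF.

```python
import json

# The four code points allowed by the RFC 4627 `ws` rule:
# space, horizontal tab, line feed, carriage return.
JSON_WS = frozenset("\x20\x09\x0a\x0d")


def skip_ws(text: str, pos: int) -> int:
    """Return the index of the first non-ws character at or after pos.

    Hypothetical tokenizer helper: only the four `ws` characters are
    skipped. Any other Unicode space (U+2000, U+2001, U+FEFF, ...) is left
    in place, so the caller sees it as an unexpected character.
    """
    while pos < len(text) and text[pos] in JSON_WS:
        pos += 1
    return pos


def accepts(doc: str) -> bool:
    """Hypothetical helper: True if Python's json module parses doc."""
    try:
        json.loads(doc)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    print(skip_ws(" \t\r\n[1]", 0))   # 4: all four ws characters skipped
    print(skip_ws("\u2000[1]", 0))    # 0: U+2000 is not JSON whitespace
    print(accepts("[1,\t2]"))         # True
    print(accepts("[1,\u2000 2]"))    # False
    print(accepts("\ufeff[1]"))       # False
```

Using a fixed set instead of str.isspace() is the point of the sketch. str.isspace() returns True for U+2000 and U+2001, so it would silently accept input that the JSON grammar does not allow.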