How do I discover the encoding of a JSON message?


JSON's official specification says:

JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default encoding is UTF-8, and...

So, essentially, the JSON message can arrive in any of those three encodings. But how do I work out which one it is when I receive it?

The message can come from multiple sources, such as a queue, from the browser, from the database, the file system, etc.

It also says that parsers may ignore byte order marks (BOMs):

...implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error.

I remember XML docs had a "prolog" that specified the encoding, but I can't find anything similar for JSON messages.

Any ideas?


Solution

  • rsp and CouchDeveloper have covered this pretty well with their answers (I can't take credit for those).

    Both answers look at the byte patterns to determine what encoding has been used. Apologies this doesn't directly answer your question, but it may help you to write an implementation of your own.
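
To illustrate the byte-pattern idea, here is a minimal Python sketch. It is not taken from either answer. It assumes the first two characters of the JSON text are ASCII, which holds for any object or array and is the assumption RFC 4627 (section 3) relies on, so the position of the null bytes in the first four octets reveals the encoding. The names `detect_json_encoding` and `loads_bytes` are hypothetical, chosen only for this example.

```python
import codecs
import json
from typing import Tuple

# BOMs to check first. UTF-32 comes before UTF-16 because the
# UTF-32LE BOM (FF FE 00 00) starts with the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def detect_json_encoding(data: bytes) -> Tuple[str, int]:
    """Guess the encoding of a JSON byte string.

    Returns (encoding_name, number_of_BOM_bytes_to_skip).
    Assumes the first two characters of the text are ASCII.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)

    head = data[:4]
    if len(head) == 4:
        if head[:3] == b"\x00\x00\x00" and head[3] != 0:
            return "utf-32-be", 0   # 00 00 00 xx
        if head[0] != 0 and head[1:] == b"\x00\x00\x00":
            return "utf-32-le", 0   # xx 00 00 00
    if len(head) >= 2:
        if head[0] == 0 and head[1] != 0:
            return "utf-16-be", 0   # 00 xx ...
        if head[0] != 0 and head[1] == 0:
            return "utf-16-le", 0   # xx 00 ...
    return "utf-8", 0               # no nulls: treat as UTF-8


def loads_bytes(data: bytes):
    """Decode raw bytes using the guessed encoding, then parse as JSON."""
    encoding, skip = detect_json_encoding(data)
    return json.loads(data[skip:].decode(encoding))
```

For example, `loads_bytes('{"k": "é"}'.encode("utf-16-le"))` should parse the same way as the UTF-8 version of that text. If you are using Python 3.6 or later, note that `json.loads` itself accepts `bytes` encoded as UTF-8, UTF-16 or UTF-32, so a hand-written detector like this is mainly useful as a reference when porting the idea to another language.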