My application stores configuration data (including strings for the UI) in a text file containing JSON. For example, config.json might contain the following:
{
"CustomerName" : "Omni Consumer Products",
"SubmitButtonText": "Click here to submit",
// etc etc etc..
}
This file goes to our translation vendor, who produces a copy of it for each supported language. They might be processing it with their own app, or they might be editing it in a text editor. I don't know.
Since we're going to be using all manner of non-ASCII characters in some of our languages, I'd like to ensure everybody is clear on what character encoding we're using.
So if this were an XML file, I would stick the following declaration at the top of the file:
<?xml version="1.0" encoding="UTF-8"?>
Any reasonable text editor or XML parser will see this and know that the file is encoded in UTF-8.
Is there any similar standard declaration I can put at the top of a JSON file, and be reasonably assured that consumers will play nicely with it?
JSON's default encoding is UTF-8, as specified in RFC 4627:
http://www.ietf.org/rfc/rfc4627.txt
From section 3:
JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.
Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.
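Because this determination is unambiguous, the format itself has no place to declare an encoding, so there is nothing equivalent to the XML declaration that you can put at the top of the file. Note that RFC 4627 has since been superseded by RFC 8259, which goes further and requires UTF-8 for JSON text exchanged between systems that are not part of a closed ecosystem, so asking your vendor to save everything as UTF-8 is the safe choice.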
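The null-pattern check described in the RFC quote can be sketched in Python roughly as follows. This is only an illustration, not a definitive implementation: the function names detect_json_encoding and load_json_bytes are made up for this example, and the sketch assumes the file has no byte order mark and starts with two ASCII characters, as the RFC 4627 text presumes.

```python
import json


def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON text from the nulls in its first four octets.

    Hypothetical helper following the patterns in RFC 4627, section 3.
    Assumes no byte order mark is present.
    """
    if len(data) < 4:
        # Assumption: too short to show a pattern, so fall back to the default.
        return "utf-8"
    b0, b1, b2, b3 = data[:4]
    if b0 == 0 and b1 == 0 and b2 == 0:
        return "utf-32-be"  # 00 00 00 xx
    if b0 == 0 and b2 == 0:
        return "utf-16-be"  # 00 xx 00 xx
    if b1 == 0 and b2 == 0 and b3 == 0:
        return "utf-32-le"  # xx 00 00 00
    if b1 == 0 and b3 == 0:
        return "utf-16-le"  # xx 00 xx 00
    return "utf-8"          # xx xx xx xx


def load_json_bytes(data: bytes):
    """Decode raw bytes with the detected encoding, then parse them as JSON."""
    return json.loads(data.decode(detect_json_encoding(data)))
```

For example, load_json_bytes(open("config.json", "rb").read()) would read the file as raw bytes and parse it whichever of the five encodings the vendor happened to save it in. In practice you may not need a helper like this, since Python's own json.loads (3.6 and later) accepts bytes in UTF-8, UTF-16 or UTF-32 directly.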