
Defining text encoding in a file containing JSON


My application stores configuration data (including strings for the UI) in a text file containing JSON. For example, config.json might contain the following:

{
   "CustomerName" : "Omni Consumer Products",
   "SubmitButtonText": "Click here to submit",
   // etc etc etc..
}

This file goes to our translation vendor, who produces a copy of it for each supported language. They might be building their own app to do this, or they might be editing it in a text editor; I don't know.

Since we're going to be using all manner of non-ASCII characters in some of our languages, I'd like to ensure everybody is clear on what character encoding we're using.

So if this were an XML file, I would stick the following declaration at the top of the file:

<?xml version="1.0" encoding="UTF-8"?>

Any reasonable text editor or XML parser will see this and know that the file is encoded in UTF-8.

Is there any similar standard I can put at the top of a JSON file, and be reasonably assured that consumers will play nicely with it?


Solution

  • JSON's default encoding is UTF-8, as specified in RFC 4627:

    http://www.ietf.org/rfc/rfc4627.txt

    From section 3:

    JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

    Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.

    Because this determination is unambiguous, the format itself has no dedicated place for declaring an encoding.
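
As a rough illustration of the null-pattern check quoted above, here is a minimal Python sketch. The function names `detect_json_encoding` and `load_json_bytes` are hypothetical, and the sketch assumes the input carries no byte order mark and begins with two ASCII characters, as RFC 4627 expects of a JSON text.

```python
import json


def detect_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of a JSON octet stream.

    Looks at the pattern of null octets in the first four bytes,
    as described in RFC 4627, section 3:

        00 00 00 xx  UTF-32BE
        00 xx 00 xx  UTF-16BE
        xx 00 00 00  UTF-32LE
        xx 00 xx 00  UTF-16LE
        xx xx xx xx  UTF-8

    Assumption: no byte order mark is present.
    """
    head = data[:4]
    if len(head) == 4:
        # True where the octet is a null byte.
        nulls = tuple(b == 0 for b in head)
        if nulls == (True, True, True, False):
            return "utf-32-be"
        if nulls == (True, False, True, False):
            return "utf-16-be"
        if nulls == (False, True, True, True):
            return "utf-32-le"
        if nulls == (False, True, False, True):
            return "utf-16-le"
    # Anything else, including very short input, is treated as UTF-8.
    return "utf-8"


def load_json_bytes(data: bytes):
    """Decode raw bytes with the detected encoding, then parse them as JSON."""
    return json.loads(data.decode(detect_json_encoding(data)))
```

A caller would read the file in binary mode and pass the raw bytes, for example `load_json_bytes(open("config.json", "rb").read())`. Note that the later RFC 8259 requires UTF-8 for JSON exchanged between systems that are not part of a closed ecosystem, so asking the vendor to save every translated file as UTF-8 is the safest convention.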