python json cjk

Use python -m json.tool to output Chinese characters as UTF-8


The original text file "chinese.txt" looks like this:

{"type":"FeatureCollection","text":"你好"}

On a Mac, I run the following command in Terminal:

$ cat chinese.txt | python -m json.tool

The output is

{
    "text": "\u4f60\u597d",
    "type": "FeatureCollection"
}

How can I add a parameter to avoid "\u4f60\u597d" and get "你好" instead?

What I would like to do is use python -m json.tool from the shell without modifying the code of json.tool. A common use case is reformatting a UTF-8 encoded JSON file while keeping the Chinese characters as they are, rather than as \uxxxx escapes.


Solution

  • As of Python 3.9, json.tool has a new --no-ensure-ascii option for this.

    Before 3.9, you'll need to use something else. This is (most of) the source for the 3.8 version of json.tool:

    prog = 'python -m json.tool'
    description = ('A simple command line interface for json module '
                   'to validate and pretty-print JSON objects.')
    parser = argparse.ArgumentParser(prog=prog, description=description)
    parser.add_argument('infile', nargs='?',
                        type=argparse.FileType(encoding="utf-8"),
                        help='a JSON file to be validated or pretty-printed',
                        default=sys.stdin)
    parser.add_argument('outfile', nargs='?',
                        type=argparse.FileType('w', encoding="utf-8"),
                        help='write the output of infile to outfile',
                        default=sys.stdout)
    parser.add_argument('--sort-keys', action='store_true', default=False,
                        help='sort the output of dictionaries alphabetically by key')
    parser.add_argument('--json-lines', action='store_true', default=False,
                        help='parse input using the jsonlines format')
    options = parser.parse_args()
    
    infile = options.infile
    outfile = options.outfile
    sort_keys = options.sort_keys
    json_lines = options.json_lines
    with infile, outfile:
        try:
            if json_lines:
                objs = (json.loads(line) for line in infile)
            else:
                objs = (json.load(infile), )
            for obj in objs:
                json.dump(obj, outfile, sort_keys=sort_keys, indent=4)
                outfile.write('\n')
        except ValueError as e:
            raise SystemExit(e)
    

    The problem is that you have no way to pass extra parameters to the json.dump call. You would want it to do this instead:

    json.dump(obj, outfile, sort_keys=sort_keys, indent=4, ensure_ascii=False)
    

    But before 3.9 you'll have to write your own script for that; json.tool won't help you here.
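
A minimal sketch of such a script, for Python versions before 3.9, might look like the following. It is not json.tool's code: the function names `reformat_json` and `main`, and the choice to take the file name as a command-line argument, are assumptions made for illustration. The only substantive difference from the json.tool call quoted above is `ensure_ascii=False`.

```python
import json
import sys


def reformat_json(text, sort_keys=False):
    """Return text re-serialized with 4-space indentation, non-ASCII kept as-is."""
    obj = json.loads(text)
    # ensure_ascii=False keeps characters such as 你好 instead of \uXXXX escapes.
    return json.dumps(obj, sort_keys=sort_keys, indent=4, ensure_ascii=False)


def main(path):
    # Read and write UTF-8 explicitly so the result does not depend on the locale.
    with open(path, encoding="utf-8") as infile:
        text = infile.read()
    try:
        result = reformat_json(text)
    except ValueError as e:
        # Invalid JSON: exit with the parser's message, as json.tool does.
        raise SystemExit(e)
    sys.stdout.buffer.write((result + "\n").encode("utf-8"))


# Only act when a file name is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Saved under a hypothetical name such as pretty_json.py, it would be run from the shell as `python pretty_json.py chinese.txt`, and the "text" value should then come out as "你好" rather than as escape sequences.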