python, xlrd

How do I write Unicode characters into a file as-is instead of as escapes?


When I try to write "සිවු අවුරුදු පාටමාලාව" verbatim into a JSON file using Python 3.6, the file instead contains \u0dc3\u0dd2\u0dc3\u0dd4\u0db1\u0dca\u0da7 \u0dc3\u0dd2\u0dc0\u0dd4.

I read an Excel file with xlrd and write the JSON file with open():

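import xlrd
import json

# encoding_override only matters for older .xls files with missing or bad codepage info
wb = xlrd.open_workbook('data.xlsx', encoding_override='utf-8')
sheet = wb.sheet_by_index(0)

# ... outerdata is built from sheet here (not shown) ...

with open('data.json', 'w') as outfile:
    # ensure_ascii=True escapes every non-ASCII character as \uXXXX
    json.dump(outerdata, outfile, ensure_ascii=True)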
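The symptom can be reproduced without xlrd or the spreadsheet. This is a minimal sketch, assuming outerdata is a plain dict holding the cell text; the dict and its "text" key are hypothetical stand-ins for whatever the question builds from the sheet.

```python
import json

# Hypothetical stand-in for outerdata; the escapes spell "සිසුන්ට සිවු".
outerdata = {"text": "\u0dc3\u0dd2\u0dc3\u0dd4\u0db1\u0dca\u0da7 \u0dc3\u0dd2\u0dc0\u0dd4"}

# ensure_ascii=True escapes each non-ASCII character as \uXXXX.
escaped = json.dumps(outerdata, ensure_ascii=True)
print(escaped)

# Omitting ensure_ascii gives the same result, because True is the default.
default = json.dumps(outerdata)

# Decoding the escaped JSON gives back the original characters.
decoded = json.loads(escaped)
```

The printed line contains the same \u0dc3\u0dd2... sequence seen in data.json, while decoded compares equal to outerdata, so the escaping changes only the representation, not the data.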

Solution

  • If I print the escape string you report in Python:

    >>> print ("\u0dc3\u0dd2\u0dc3\u0dd4\u0db1\u0dca\u0da7 \u0dc3\u0dd2\u0dc0\u0dd4")
    සිසුන්ට සිවු
    

    you will see that the escapes render as exactly the characters you want. The escaped form and the literal form are two representations of the same string, and JSON accepts both. You get the escaped form because you call json.dump() with ensure_ascii=True, which asks for ASCII-only output: every character outside the printable ASCII range chr(32) to chr(126) is written as a backslash escape such as \u0dc3. Change it to ensure_ascii=False. Note that True is also the default, so simply deleting the argument is not enough.

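    Because data.json will no longer contain pure ASCII, you also need to specify an encoding when you open it. Without one, open() uses the locale's preferred encoding, which may not be able to represent Sinhala at all (for example cp1252 on many Windows systems):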

    with open("data.json", "w", encoding="utf-8") as outfile:
        # json.dump() returns None, so there is nothing to assign
        json.dump(outerdata, outfile, ensure_ascii=False)
    

    This will make your JSON file look the way you want it to look.
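The fix can be checked end to end without the spreadsheet. This is a minimal sketch, assuming outerdata is a plain dict; the dict, its "text" key, and the temporary path are hypothetical stand-ins for the question's data and data.json. It also shows why the encoding argument matters when the default encoding is a legacy code page such as cp1252.

```python
import json
import os
import tempfile

# Hypothetical stand-in for outerdata; the escapes spell "සිසුන්ට සිවු".
outerdata = {"text": "\u0dc3\u0dd2\u0dc3\u0dd4\u0db1\u0dca\u0da7 \u0dc3\u0dd2\u0dc0\u0dd4"}

with tempfile.TemporaryDirectory() as tmpdir:
    # A temporary path stands in for data.json so the example leaves nothing behind.
    path = os.path.join(tmpdir, "data.json")

    # ensure_ascii=False writes the characters themselves; encoding="utf-8" encodes them.
    with open(path, "w", encoding="utf-8") as outfile:
        json.dump(outerdata, outfile, ensure_ascii=False)

    # Read the raw bytes to confirm the file holds UTF-8 text rather than \u escapes.
    with open(path, "rb") as infile:
        raw = infile.read()

    # Loading the file returns exactly the original data.
    with open(path, encoding="utf-8") as infile:
        roundtrip = json.load(infile)

print(raw.decode("utf-8"))

# Why encoding= matters: cp1252 has no Sinhala characters, so encoding fails.
try:
    outerdata["text"].encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False
```

Here raw is simply the UTF-8 encoding of the JSON text, so any UTF-8-aware editor shows the Sinhala characters directly.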