Writing pandas DataFrame to JSON in unicode


I'm trying to write a pandas DataFrame containing Unicode text to JSON, but the built-in .to_json method escapes the non-ASCII characters. How do I fix this?

Example:

import pandas as pd

df = pd.DataFrame([["τ", "a", 1], ["π", "b", 2]])
df.to_json("df.json")

This gives:

{"0":{"0":"\u03c4","1":"\u03c0"},"1":{"0":"a","1":"b"},"2":{"0":1,"1":2}}

This differs from the desired result:

{"0":{"0":"τ","1":"π"},"1":{"0":"a","1":"b"},"2":{"0":1,"1":2}}

I have tried adding the force_ascii=False argument:

import pandas as pd

df = pd.DataFrame([["τ", "a", 1], ["π", "b", 2]])
df.to_json("df.json", force_ascii=False)

But this gives the following error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u03c4' in position 11: character maps to <undefined>

This occurs on pandas versions 0.18 to 2.2+, on Python 3.4 to 3.12+.
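
The 'charmap' in the error message suggests the file is being written with a legacy single-byte code page rather than UTF-8. A minimal sketch of how to check this, assuming cp1252 as an illustrative code page (the actual default depends on the platform and locale):

```python
import locale

# Encoding that open() falls back to when none is given (platform dependent).
print(locale.getpreferredencoding(False))

# cp1252 is an assumed example of a code page that has no Greek letters.
try:
    "τ".encode("cp1252")
except UnicodeEncodeError as error:
    message = str(error)
    print(message)
```

If the first line printed is something other than a UTF-8 variant, any file opened without an explicit encoding is likely to hit the same error for characters outside that code page.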


Solution

  • Opening the file with the encoding set to utf-8 and passing the open file object to .to_json, instead of a path, fixes the problem:

    with open('df.json', 'w', encoding='utf-8') as file:
        df.to_json(file, force_ascii=False)
    

    This gives the desired output:

    {"0":{"0":"τ","1":"π"},"1":{"0":"a","1":"b"},"2":{"0":1,"1":2}}
    

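    Note: the force_ascii=False argument is still required. Without it, .to_json escapes non-ASCII characters as \uXXXX sequences even when the file is opened as UTF-8.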
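
An equivalent approach is to have .to_json return the JSON as a string (which it does when no path is given) and write that string yourself with an explicit encoding. The following is only a sketch: the temporary directory and the names json_text, path and loaded are illustrative and not part of the original answer.

```python
import json
import os
import tempfile

import pandas as pd

df = pd.DataFrame([["τ", "a", 1], ["π", "b", 2]])

# With no path argument, to_json returns the JSON text as a str.
json_text = df.to_json(force_ascii=False)

# Write the text with an explicit encoding so the platform default is not used.
# A temporary directory is used here purely for illustration.
path = os.path.join(tempfile.mkdtemp(), "df.json")
with open(path, "w", encoding="utf-8") as file:
    file.write(json_text)

# Read the file back to confirm the characters were stored unescaped.
with open(path, encoding="utf-8") as file:
    loaded = json.load(file)

print(json_text)
```

Reading the file back with the same encoding is a quick way to confirm that the characters survived the round trip.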