
How to convert or decode the Unicode characters in pandas DataFrame?


I read some data from an Excel file using pandas, iterated over the rows to build a Python dictionary, and then wrote that dictionary to a JSON file.

The problem is that Unicode escape sequences end up in the JSON file itself:

"C V M College of Fine Arts,\u00a0 Vallabh Vidyanagar"

As shown above, I don't want the escape sequence '\u00a0'; I want the actual character it represents to appear in my JSON file.

Is there any way to do this conversion (or filtering, or whatever it's called) while reading the Excel file with pandas? Or is there a way to do it while writing the JSON file with json.dump()?


Solution

  • Use json.dumps(..., ensure_ascii=False):

    import json

    foo = "C V M College of Fine Arts,\u00a0 Vallabh Vidyanagar"

    # ensure_ascii=False writes non-ASCII characters as-is instead of \uXXXX escapes
    print(json.dumps({'foo': foo}, ensure_ascii=False))
    

    prints (the character after the comma is a literal non-breaking space, U+00A0):

    {"foo": "C V M College of Fine Arts,  Vallabh Vidyanagar"}
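The question also asks about writing with json.dump(), which accepts the same ensure_ascii argument. The sketch below writes the dictionary to a file and, as an optional extra step, swaps the non-breaking space for a regular space. The file name colleges.json and the variable names are assumptions made for illustration, not part of the original question. Note that the escaped form \u00a0 is valid JSON as well and decodes to the same character when the file is loaded, so this only changes how the file looks on disk.

```python
import json
import os
import tempfile

record = {"foo": "C V M College of Fine Arts,\u00a0 Vallabh Vidyanagar"}

# Optional cleanup: replace the non-breaking space (U+00A0) with a plain space.
cleaned = {key: value.replace("\u00a0", " ") for key, value in record.items()}

# Hypothetical output location, used only for this example.
path = os.path.join(tempfile.gettempdir(), "colleges.json")

# Open the file as UTF-8 so the literal non-ASCII character can be written.
with open(path, "w", encoding="utf-8") as fh:
    json.dump(record, fh, ensure_ascii=False)

# Read the raw text back to confirm no \u00a0 escape sequence was written.
with open(path, encoding="utf-8") as fh:
    raw_text = fh.read()
```

If the non-breaking space is unwanted altogether, write cleaned instead of record. Passing encoding="utf-8" to open() matters here: without it, the platform default encoding is used, which may not be able to represent every character.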