python excel pandas pycharm

Invalid character causes an error when saving to Excel


I'm adding a new column to data read from an Excel file and want to save the result as a new file. However, one column in the Excel file contains an invalid character, and the save fails. How can I skip that row during the save, or replace the invalid character with a valid one?

import pandas as pd

path = '/My Documents/Python/'
fileName = "test.xlsx"

# open the Excel file
ef = pd.ExcelFile(path+fileName)

# read the contents
df = pd.read_excel(path+fileName, sheet_name=ef.sheet_names[0])
print(df['Content'])
print(df['Engine'])

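# print each value with its 1-based position; stop at the first value that fails
for i, test in enumerate(df['Content'], start=1):
    try:
        print(i)
        print(test)
    except Exception:
        print("An exception occurred")
        break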

df['Test'] = 'value'
# write the result next to the input file, reusing `path` so both locations match
df.to_excel(path + 'Test_NEW.xlsx')

Error message

    data, consumed = self.encode(object, self.errors)
UnicodeEncodeError: 'utf-8' codec can't encode character '\ude7c' in position 470: surrogates not allowed
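
The message says that a string contains a lone surrogate code point (here '\ude7c'), which UTF-8 cannot encode. As a rough sketch for locating the affected rows, the helper below reports whether a value fails strict UTF-8 encoding. The name has_lone_surrogate is hypothetical and not part of pandas.

```python
def has_lone_surrogate(value):
    """Return True if value is a str that cannot be encoded as strict UTF-8.

    Hypothetical helper for illustration. Non-string values (numbers, None,
    NaN) are reported as False because they carry no text to encode.
    """
    if not isinstance(value, str):
        return False
    try:
        value.encode("utf-8")
    except UnicodeEncodeError:
        # Strict UTF-8 rejects only surrogate code points (U+D800..U+DFFF).
        return True
    return False


samples = ["plain text", "broken \ude7c text", 3.14, None]
flags = [has_lone_surrogate(s) for s in samples]
print(flags)  # [False, True, False, False]
```

With the DataFrame from the question, something like df[df['Content'].map(has_lone_surrogate)] should show the rows that trigger the error, assuming the problem is in the 'Content' column.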

Solution

Cast the column to strings before saving:

  • df['Content'] = df['Content'].astype(str)
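
If the error persists after the cast, one option is to drop the surrogate code points explicitly before writing the file. This is a minimal sketch, not the answer's method: the helper name strip_surrogates is hypothetical, and it assumes that silently removing the offending characters is acceptable for your data.

```python
def strip_surrogates(value):
    """Return value with lone surrogate code points removed.

    Hypothetical helper for illustration. Non-string values are returned
    unchanged so numbers and missing values survive the cleanup.
    """
    if not isinstance(value, str):
        return value
    # errors="ignore" drops any character UTF-8 cannot encode (the surrogates);
    # decoding the result gives back a clean str.
    return value.encode("utf-8", errors="ignore").decode("utf-8")


cleaned = [strip_surrogates(v) for v in ["ok", "bad\ude7cchar", 42, None]]
print(cleaned)  # ['ok', 'badchar', 42, None]
```

Applied to the question's data, this would look like df['Content'] = df['Content'].map(strip_surrogates) before calling to_excel. Passing errors="replace" instead of errors="ignore" would substitute a '?' for each removed character, which keeps a visible trace of where the data was altered.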