I want to write data from a Pandas dataframe to an Excel file. I am using pyExcelerate rather than the built in to_excel() method to get better performance.
While Pandas handles utf-8 encoded special characters, the encoding seems to be lost when converting to python list. I read through https://docs.python.org/2/howto/unicode.html, however I could not find out how to preserve the encoding. Or is there some way to restore encoding at the time of writing?
# -*- coding: utf-8 -*-
import pandas as pd
from pyexcelerate import Workbook
df = pd.DataFrame({'val': ['ä']})
print df
# val
# 0 ä <- non ascii character a with dots
value = df.values.tolist()
print value
# [['\xc3\xa4']]
wb = Workbook()
ws = wb.new_sheet("sheet name", data=value)
wb.save("output.xlsx")
# UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 31: ordinal not in range(128)
Simple, feed your Dataframe with Unicodes by using Unicode literals:
df = pd.DataFrame({u'val': [u'ä']})
note the u
prefixes
Ensure your file is encoded with UTF-8 (to match the # -*- coding: utf-8 -*-
header) (It does looks UTF-8 encoded as 0xc3
is the first byte of ä
)
The error is caused by Python not knowing the encoding of the byte string, when the string is marshalled into a Unicode object later.