I'm trying to scrape an xls file from a page and save it to csv. I've created a function that handles the file, but when I try and write the file out I get an encoding error, saying that it can't handle one of the characters in the file.
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 9: ordinal not in range(128)
I'm trying to enforce utf-8 across every item in the list using the unicode function, which seems to work on most items, but somehow I'm still getting the same error.
def csv_from_excel( fileName ):
# open the xls file and save as a csv
wb = xlrd.open_workbook(fileName)
worksheets = wb.sheet_names()
for worksheet_name in worksheets:
worksheet = wb.sheet_by_name(worksheet_name)
print worksheet
csvfile = open(fileName, 'wb')
wr = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
for rownum in xrange(worksheet.nrows):
cleanrow = [unicode(i) for i in worksheet.row_values(rownum)]
print cleanrow
wr.writerow(cleanrow)
csvfile.close()
In Python 2, you need to pass encoded strings to the csv
module, not Unicode values.
for rownum in xrange(worksheet.nrows):
cleanrow = [unicode(i).encode('utf8') for i in worksheet.row_values(rownum)]
print cleanrow
wr.writerow(cleanrow)
This encodes all values to UTF-8 before passing them on to the writer object.
Calling unicode()
on objects converts non-unicode values (byte strings, numbers, etc.) to a unicode
object, not to UTF-8 encoded bytes. Only an explicit .encode()
method call does that.