Search code examples
pythoncsvunicodeutf-8xlrd

unicode error in list coming from xls going to a csv file


I'm trying to scrape an xls file from a page and save it to csv. I've created a function that handles the file, but when I try and write the file out I get an encoding error, saying that it can't handle one of the characters in the file.

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 9: ordinal not in range(128)

I'm trying to enforce utf-8 across every item in the list using the unicode function, which seems to work on most items, but somehow I'm still getting the same error.

def csv_from_excel( fileName ):
# open the xls file and save as a csv

wb = xlrd.open_workbook(fileName)
worksheets = wb.sheet_names()
for worksheet_name in worksheets:
    worksheet = wb.sheet_by_name(worksheet_name)
    print worksheet
    csvfile = open(fileName, 'wb')
    wr = csv.writer(csvfile, quoting=csv.QUOTE_ALL)

    for rownum in xrange(worksheet.nrows):
        cleanrow = [unicode(i) for i in worksheet.row_values(rownum)]
        print cleanrow
        wr.writerow(cleanrow)

    csvfile.close()

Solution

  • In Python 2, you need to pass encoded strings to the csv module, not Unicode values.

    for rownum in xrange(worksheet.nrows):
        cleanrow = [unicode(i).encode('utf8') for i in worksheet.row_values(rownum)]
        print cleanrow
        wr.writerow(cleanrow)
    

    This encodes all values to UTF-8 before passing them on to the writer object.

    Calling unicode() on objects converts non-unicode values (byte strings, numbers, etc.) to a unicode object, not to UTF-8 encoded bytes. Only an explicit .encode() method call does that.