Search code examples
pythonxlrd

python xlrd convert xlsx to csv


I am trying to convert excel files to csv files using xlrd library.

But I got this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u0142' in position 2: ordinal not in range(128)

Can it be because the excel file is too large? Cause everything works fine with excel files that have small number of rows. But when I tried to convert an excel file that has nearly 2000 rows, I got this error.

[UPDATE]

This is the code:

filepath = './attachments'
wb = xlrd.open_workbook(os.path.join(filepath, 'result.xls'))
sheet = wb.sheet_by_index(0)
fp = open(os.path.join(filepath, 'result.csv'), 'wb')
wr = csv.writer(fp, quoting=csv.QUOTE_ALL)
for rownum in xrange(sheet.nrows):
  wr.writerow(sheet.row_values(rownum))
fp.close()

And this is the traceback:

Traceback (most recent call last):
  File "MethodTest.py", line 11, in <module>
    wr.writerow(sheet.row_values(rownum))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0142' in position 2: ordinal not in range(128)

Solution

  • You are reading a Excel sheet with data outside of the ASCII range.

    When writing unicode values to a CSV file, automatic encoding takes place, but for values outside the ASCII range of characters that fails. Encode explicitly:

    for rownum in xrange(sheet.nrows):
          wr.writerow([unicode(val).encode('utf8') for val in sheet.row_values(rownum)])
    

    On Python 3, the CSV module supports Unicode text, provided you provided the open() command with a suitable encoding and opened the file in text mode. While the default can be UTF-8, it is better to be explicit. The documentation recommends you use newline='' when opening a file for CSV writing:

    fp = open(os.path.join(filepath, 'result.csv'), 'w', encoding='utf-8', newline='')
    wr = csv.writer(fp, quoting=csv.QUOTE_ALL)
    for rownum in range(sheet.nrows):
          wr.writerow(sheet.row_values(rownum))
    

    You may need to pick a different encoding, depending on your needs.

    It's also a good idea to use the file object as a context manager to make sure it is closed after the code exits (with or without an exception):

    with open(os.path.join(filepath, 'result.csv'), 'w', encoding='utf-8', newline='') as fp:
        wr = csv.writer(fp, quoting=csv.QUOTE_ALL)
        for rownum in range(sheet.nrows):
              wr.writerow(sheet.row_values(rownum))