Search code examples
pythonexcelunicodeasciixlwt

UnicodeDecodeError when trying to save an Excel File with Python xlwt


I'm running a Python script that writes HTML code found using BeautifulSoup into multiple rows of an Excel spreadsheet column.

[...]

Col_HTML = 19
w_sheet.write(row_index, Col_HTML, str(HTML_Code))
wb.save(output)

When trying to save the file, I get the following error message:

Traceback (most recent call last):
  File "C:\Users\[..]\src\MYCODE.py", line 201, in <module>
    wb.save(output)
  File "C:\Python27\lib\site-packages\xlwt-0.7.5-py2.7.egg\xlwt\Workbook.py", line 662, in save
    doc.save(filename, self.get_biff_data())
  File "C:\Python27\lib\site-packages\xlwt-0.7.5-py2.7.egg\xlwt\Workbook.py", line 637, in get_biff_data
    shared_str_table   = self.__sst_rec()
  File "C:\Python27\lib\site-packages\xlwt-0.7.5-py2.7.egg\xlwt\Workbook.py", line 599, in __sst_rec
    return self.__sst.get_biff_record()
  File "C:\Python27\lib\site-packages\xlwt-0.7.5-py2.7.egg\xlwt\BIFFRecords.py", line 76, in get_biff_record
    self._add_to_sst(s)
  File "C:\Python27\lib\site-packages\xlwt-0.7.5-py2.7.egg\xlwt\BIFFRecords.py", line 91, in _add_to_sst
    u_str = upack2(s, self.encoding)
  File "C:\Python27\lib\site-packages\xlwt-0.7.5-py2.7.egg\xlwt\UnicodeUtils.py", line 50, in upack2
    us = unicode(s, encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 5181: ordinal not in range(128)

I've successfully written Python script in the past to write into worksheets. It's the first time I try to write a string of HTML into cells and I'm wondering what is causing the error and how I could fix it.


Solution

  • Use this line before passing HTML_Code to w_sheet.write

    HTML_Code = HTML_Code.decode('utf-8')
    

    Because, in the error line UnicodeDecodeError: 'ascii' codec can't decode, Python is trying to decode unicode into ascii, so you need to decode unicode using the proper encoding format, that is, utf-8.

    So, you have:

    Col_HTML = 19
    HTML_Code = HTML_Code.decode('utf-8')
    w_sheet.write(row_index, Col_HTML, str(HTML_Code))