Search code examples
python-3.xopenpyxlxlrd

UnicodeDecodeError while converting .xls file to .xlsx with Python xlrd


I'm working to automate our monthly reporting process from a webpage. The download is in .xls format, and I am trying to convert it to .xlsx so I can manipulate it with openpyxl. The code downloads the Excel file to my computer, but I have not been able to get the file to open successfully using openpyxl or xlrd because I get a UnicodeDecodeError.

After reading a thread at Github I tried opening the file manually and re-running the code, and the file was able to open successfully. However as he says in the thread, having to open the file manually defeats the purpose of automating the process. Does anyone know how I can get past this?

This is the code that throws the error:

import xlrd, openpyxl

filePath = r'C:\Users\Daly_Llama'
downloadName = filePath + "All Endpoints and MCUs   " + today.strftime("%Y%m%d") + '.xls'

# open_xls_as_xlsx function adaptation, original code by Ray at https://stackoverflow.com/questions/9918646/how-to-convert-xls-to-xlsx 
def open_xls_as_xlsx(filename):
    # open xls file using xlrd
    xlsBook = xlrd.open_workbook(filename)
    index = 0
    nrows, ncols = 0, 0
    while nrows * ncols == 0:
        xlsSheet = xlsBook.sheet_by_index(index)
        nrows = xlsSheet.nrows
        ncols = xlsSheet.ncols
        index += 1
    # prepare a xlsx sheet
    xlsxBook = Workbook()
    xlsxSheet = xlsxBook.get_active_sheet()
    for row in xrange(0, nrows):
        for col in xrange(0, ncols):
            xlsxSheet.cell(row=row, column=col).value = xlsSheet.cell_value(row, col)
    return xlsxBook

workbook = open_xls_as_xlsx(downloadName)

This is the error I receive:

Traceback (most recent call last):
  File "C:\Users\Me\MonthlyReport.py", line 100, in <module>
    workbook = open_xls_as_xlsx(downloadName)
  File "C:\Users\Me\MonthlyReport.py", line 81, in open_xls_as_xlsx
    xlsBook = xlrd.open_workbook(filename)
  File "C:\Program Files\Python37\lib\site-packages\xlrd\__init__.py", line 157, in open_workbook
    ragged_rows=ragged_rows,
  File "C:\Program Files\Python37\lib\site-packages\xlrd\book.py", line 117, in open_workbook_xls
    bk.parse_globals()
  File "C:\Program Files\Python37\lib\site-packages\xlrd\book.py", line 1227, in parse_globals
    self.handle_writeaccess(data)
  File "C:\Program Files\Python37\lib\site-packages\xlrd\book.py", line 1192, in handle_writeaccess
    strg = unpack_unicode(data, 0, lenlen=2)
  File "C:\Program Files\Python37\lib\site-packages\xlrd\biffh.py", line 284, in unpack_unicode
    strg = unicode(rawstrg, 'utf_16_le')
  File "C:\Program Files\Python37\lib\site-packages\xlrd\timemachine.py", line 31, in <lambda>
    unicode = lambda b, enc: b.decode(enc)
  File "C:\Program Files\Python37\lib\encodings\utf_16_le.py", line 16, in decode
    return codecs.utf_16_le_decode(input, errors, True)
UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x20 in position 108: truncated data

Solution

  • The workaround at this link remains as the only workable solution I have found. I raised an input command that pauses execution until the file has been manually opened, after which the script can continue.