I'm working to automate our monthly reporting process from a webpage. The download is in .xls format, and I am trying to convert it to .xlsx so I can manipulate it with openpyxl. The code downloads the Excel file to my computer, but I have not been able to get the file to open successfully using openpyxl or xlrd because I get a UnicodeDecodeError.
After reading a thread at Github I tried opening the file manually and re-running the code, and the file was able to open successfully. However as he says in the thread, having to open the file manually defeats the purpose of automating the process. Does anyone know how I can get past this?
This is the code that throws the error:
import xlrd, openpyxl
filePath = r'C:\Users\Daly_Llama'
downloadName = filePath + "All Endpoints and MCUs " + today.strftime("%Y%m%d") + '.xls'
# open_xls_as_xlsx function adaptation, original code by Ray at https://stackoverflow.com/questions/9918646/how-to-convert-xls-to-xlsx
def open_xls_as_xlsx(filename):
# open xls file using xlrd
xlsBook = xlrd.open_workbook(filename)
index = 0
nrows, ncols = 0, 0
while nrows * ncols == 0:
xlsSheet = xlsBook.sheet_by_index(index)
nrows = xlsSheet.nrows
ncols = xlsSheet.ncols
index += 1
# prepare a xlsx sheet
xlsxBook = Workbook()
xlsxSheet = xlsxBook.get_active_sheet()
for row in xrange(0, nrows):
for col in xrange(0, ncols):
xlsxSheet.cell(row=row, column=col).value = xlsSheet.cell_value(row, col)
return xlsxBook
workbook = open_xls_as_xlsx(downloadName)
This is the error I receive:
Traceback (most recent call last):
File "C:\Users\Me\MonthlyReport.py", line 100, in <module>
workbook = open_xls_as_xlsx(downloadName)
File "C:\Users\Me\MonthlyReport.py", line 81, in open_xls_as_xlsx
xlsBook = xlrd.open_workbook(filename)
File "C:\Program Files\Python37\lib\site-packages\xlrd\__init__.py", line 157, in open_workbook
ragged_rows=ragged_rows,
File "C:\Program Files\Python37\lib\site-packages\xlrd\book.py", line 117, in open_workbook_xls
bk.parse_globals()
File "C:\Program Files\Python37\lib\site-packages\xlrd\book.py", line 1227, in parse_globals
self.handle_writeaccess(data)
File "C:\Program Files\Python37\lib\site-packages\xlrd\book.py", line 1192, in handle_writeaccess
strg = unpack_unicode(data, 0, lenlen=2)
File "C:\Program Files\Python37\lib\site-packages\xlrd\biffh.py", line 284, in unpack_unicode
strg = unicode(rawstrg, 'utf_16_le')
File "C:\Program Files\Python37\lib\site-packages\xlrd\timemachine.py", line 31, in <lambda>
unicode = lambda b, enc: b.decode(enc)
File "C:\Program Files\Python37\lib\encodings\utf_16_le.py", line 16, in decode
return codecs.utf_16_le_decode(input, errors, True)
UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x20 in position 108: truncated data
The workaround at this link remains as the only workable solution I have found. I raised an input command that pauses execution until the file has been manually opened, after which the script can continue.