I am using Python 2.7
& openpyxl==2.5.11
.
For files with valid workbook I am reading the content with this code:
wb = openpyxl.load_workbook('my_file.xlsx', read_only=True)
ws = wb.worksheets[1]
rows = ws.rows
Now, a tester provided an old file to me, which doesn't have valid workbook part and raises this error:
IOError: File contains no valid workbook part
in this line ws = wb.worksheets[1]
I couldn't find an example how to read data for such files in the openpyxl documentation.
I managed to read the file using pandas
:
data = pandas.read_excel('my_file.xlxs')
Is there a way to read the file with openpyxl
, because I already use openpyxl
in my app and wouldn't like to migrate completely to pandas
, nor use both in different places in my app. Ideally I would like to use single lib for Excel file manipulation across all application services and if possible I would like to avoid transitioning form openpyxl
to pandas
, because I cannot estimate how big effort that transition would require, nor whether all my use cases would work out of the box with pandas
.
Inspired by The Pineapple's answer and Jon Clements's comment, I came up with a solution which reads data into xlrd
workbook and then creates openpyxl
Workbook as a result of the function. This helps my use case, because I already use the openpyxl
Workbook in multiple places in my application. Here is the code:
import xlrd
from openpyxl import Workbook
def open_excel_without_workbook_part(filename):
xlrd_workbook = xlrd.open_workbook(filename=filename)
xlrd_worksheet = xlrd_workbook.sheet_by_index(0)
nrows = xlrd_worksheet.nrows
ncols = xlrd_worksheet.ncols
# create a xlsx file using openpyxl
openpyxl_book = Workbook()
openpyxl_worksheet = openpyxl_book.active
for row in xrange(0, nrows):
for col in xrange(0, ncols):
openpyxl_worksheet.cell(row=row + 1, column=col + 1).value = xlrd_worksheet.cell_value(row, col)
return openpyxl_book
if __name__ == '__main__':
workbook = open_excel_without_workbook_part('file-without-valid-workbook.xlsx')
sheet = workbook.worksheets[0]
for row in sheet.rows:
print '\t'.join([cell.value for cell in row])