Search code examples
pythonexcelpandasopenpyxlxlsx

Is there a way to read Excel file without workbook with openpyxl?


I am using Python 2.7 & openpyxl==2.5.11.

For files with valid workbook I am reading the content with this code:

wb = openpyxl.load_workbook('my_file.xlsx', read_only=True)
ws = wb.worksheets[1]
rows = ws.rows

Now, a tester provided an old file to me, which doesn't have valid workbook part and raises this error:

IOError: File contains no valid workbook part

in this line ws = wb.worksheets[1]

I couldn't find an example how to read data for such files in the openpyxl documentation.

I managed to read the file using pandas: data = pandas.read_excel('my_file.xlxs')

Is there a way to read the file with openpyxl, because I already use openpyxl in my app and wouldn't like to migrate completely to pandas, nor use both in different places in my app. Ideally I would like to use single lib for Excel file manipulation across all application services and if possible I would like to avoid transitioning form openpyxl to pandas, because I cannot estimate how big effort that transition would require, nor whether all my use cases would work out of the box with pandas.


Solution

  • Inspired by The Pineapple's answer and Jon Clements's comment, I came up with a solution which reads data into xlrd workbook and then creates openpyxl Workbook as a result of the function. This helps my use case, because I already use the openpyxl Workbook in multiple places in my application. Here is the code:

    import xlrd
    from openpyxl import Workbook
    
    
    def open_excel_without_workbook_part(filename):
        xlrd_workbook = xlrd.open_workbook(filename=filename)
        xlrd_worksheet = xlrd_workbook.sheet_by_index(0)
        nrows = xlrd_worksheet.nrows
        ncols = xlrd_worksheet.ncols
    
        # create a xlsx file using openpyxl
        openpyxl_book = Workbook()
        openpyxl_worksheet = openpyxl_book.active
    
        for row in xrange(0, nrows):
            for col in xrange(0, ncols):
                openpyxl_worksheet.cell(row=row + 1, column=col + 1).value = xlrd_worksheet.cell_value(row, col)
    
        return openpyxl_book
    
    
    if __name__ == '__main__':
        workbook = open_excel_without_workbook_part('file-without-valid-workbook.xlsx')
    
        sheet = workbook.worksheets[0]
        for row in sheet.rows:
            print '\t'.join([cell.value for cell in row])