Search code examples
pythonexcelxlrd

Why is xlrd reading the wrong number of columns?


When I read an .xls file, xlrd acts as if there was an empty column at position 0, and the last column (number 58, if counting from 0) is never read; the first column is skipped and the last one disappears.

The issue goes away when I save the file (either as an .xls or .xlsx) and do the whole process again. When saving, Excel tells me that there may be compatibility issues. I need to automate this process, which is why I'd like to avoid manually having to save the file.

# Loading the 'converted' file
wb = xlrd.open_workbook(converted_file_path) 
sheet = wb.sheet_by_index(0)

print('Row',len(sheet.row(3)),sheet.row(3))
# Prints: Row 59 [text:'Client Number', ...


# Loading the original file
wb = xlrd.open_workbook(original_file_path) 
print('Row',len(sheet.row(3)),sheet.row(3))
# Prints: Row 58 [empty:'', text:'Client Number', ...

print(sheet.cell_value(0,5))
# Prints nothing, while there is a number at cell (0,5)


Solution

  • Thanks to Martin for suggesting the win32com fix, but I found the actual problem. It has something to do with how xlrd has Python load up sheets in memory. Perhaps messing around with the on_demand parameter while loading the workbook does something, but anyhow, here's what was happening:

    I needed the sheet named "Report X" where X is some number, so I looked for it like this:

    for sheet in wb.sheets():
        if 'Report' in sheet.name:
            report_sheet = sheet
            break
    

    Then, I changed it to

    for sheet in wb.sheets():
        if 'Report' in sheet.name:
            report_sheet = sheet
            # removed the break
    

    And now it works.