python excel xlsx openpyxl

Trouble loading large Excel file (Python)


I am trying to load a large .xlsx file using openpyxl. Loading an 80 MB .xlsx file fills my entire 8 GB of memory.

The Excel file has 4 sheets with 800,000 rows.

from openpyxl import load_workbook
wb = load_workbook('Record.xlsx')

def process(ws):
    '''
    Read all rows of a worksheet
    '''
    data = []
    for a, c, f, k in zip(ws['A'], ws['C'], ws['F'], ws['K']):
        data.append([a, c, f, k])

    return data

ws1 = wb.worksheets[0] # Sheet 1
data1 = process(ws1)
ws2 = wb.worksheets[1] # Sheet 2
data2 = process(ws2)
ws3 = wb.worksheets[2] # Sheet 3
data3 = process(ws3)
ws4 = wb.worksheets[3] # Sheet 4
data4 = process(ws4)

Why is 8 GB of memory not enough to load an 80 MB Excel file?


Solution

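  • Try passing read_only=True to load_workbook to open the workbook in read-only mode, as described here. In the default mode openpyxl keeps an object for every cell in memory, and its documentation puts memory use at roughly 50 times the file size, so an 80 MB file can need several GB on its own.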
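
A rough sketch of how the code in the question might look in read-only mode. Column access such as ws['A'] is not available on a read-only worksheet, so this version streams rows with iter_rows instead. It assumes openpyxl 2.6 or later (for values_only) and stores plain cell values rather than cell objects, which is a change from the original code. The names pick_columns and load_all are made up for illustration and are not part of openpyxl.

```python
# Zero-based positions of columns A, C, F and K within a row tuple.
WANTED_COLUMNS = (0, 2, 5, 10)


def pick_columns(row, indexes=WANTED_COLUMNS):
    """Return the wanted values from one row, using None for missing cells."""
    return [row[i] if i < len(row) else None for i in indexes]


def process(ws):
    """Read columns A, C, F and K of a worksheet, one row at a time."""
    # values_only=True yields tuples of plain values instead of cell objects;
    # max_col=11 stops each row at column K.
    rows = ws.iter_rows(max_col=11, values_only=True)
    return [pick_columns(row) for row in rows]


def load_all(path="Record.xlsx"):
    """Open the workbook in read-only mode and process every sheet."""
    # Imported here so the helpers above can be tried without openpyxl installed.
    from openpyxl import load_workbook

    wb = load_workbook(path, read_only=True)
    try:
        return [process(ws) for ws in wb.worksheets]
    finally:
        # A read-only workbook keeps the file open until it is closed.
        wb.close()
```

Calling load_all('Record.xlsx') would then return one list per sheet. The resulting lists still hold every selected value in memory; if that is too much as well, process could be turned into a generator so rows are handled one at a time.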