Search code examples
python-3.xoptimizationioopenpyxl

openpyxl, handle large Excel files


I use openpyxl for data processing.

First time, I write data copy code file to file (xlsx -> xlsx) for test.

# open input file
wb = load_workbook(filename='input.xlsx', read_only=True)
ws = wb['sheet name']

# create output file
out = Workbook(write_only=True)
out_sh = out.create_sheet('sheet name')

# copy data
for rows in ws:
   out_sh.append(cells.value for cells in rows)

# save file
out.save('output.xlsx')
wb.close()
out.close()

input file size is 394 MB and there is 6 sheets.
5 sheets size is 1048576 rows by 17 (A~Q) columns.
My memory size is 8 GB and normally 66% in use (idle state, about 2.5 GB left).
last one sheet size is small. i tested for one large sheets.

In openpyxl I use the read_only and write_only options for optimization but when I run the data copy code for only 1 sheet (not whole 394MB file), I get a MemoryError.

If data analysis code is added, it will only be able to process smaller files.

Are there any optimization techniques to large file processing I haven't tried yet?


Solution

  • There are other threads with same error issue:

    Memory Error Thread 1 Memory Error Thread 2

    Does it help with your problem?