I use openpyxl for data processing.
First time, I write data copy code file to file (xlsx -> xlsx) for test.
# open input file
wb = load_workbook(filename='input.xlsx', read_only=True)
ws = wb['sheet name']
# create output file
out = Workbook(write_only=True)
out_sh = out.create_sheet('sheet name')
# copy data
for rows in ws:
out_sh.append(cells.value for cells in rows)
# save file
out.save('output.xlsx')
wb.close()
out.close()
input file size is 394 MB and there is 6 sheets.
5 sheets size is 1048576 rows by 17 (A~Q) columns.
My memory size is 8 GB and normally 66% in use (idle state, about 2.5 GB left).
last one sheet size is small. i tested for one large sheets.
In openpyxl I use the read_only and write_only options for optimization but when I run the data copy code for only 1 sheet (not whole 394MB file), I get a MemoryError.
If data analysis code is added, it will only be able to process smaller files.
Are there any optimization techniques to large file processing I haven't tried yet?
There are other threads with same error issue:
Memory Error Thread 1 Memory Error Thread 2
Does it help with your problem?