MemoryError and ValueError when pandas saves to a new file


Here is a short script that removes some columns from an Excel file and saves the result to a new file.

import pandas as pd
import numpy as np

work_file = "C:\\My Documents\\the_file.xlsx"
df = pd.read_excel(work_file, sheet_name = "Sheet1", index_col = 0)

column_list_to_remove = ["Name","Gender","Register"]

results1 = df.drop(column_list_to_remove, axis=1)

writer = pd.ExcelWriter("C:\\My Documents\\new-file.xlsx")
results1.to_excel(writer,'Sheet1')

writer.save()

It worked well on my old computer, on both small and large (thousands of rows) Excel files.

I have now upgraded to a new computer with more RAM (16 GB). When I run the script there, it works on a small file (a few thousand rows). But on a bigger file (on the order of a hundred thousand rows), it gives the error message below.

How can I correct this?

Error message:

Traceback (most recent call last):
  File "C:\Python38\lib\xml\etree\ElementTree.py", line 832, in _get_writer
    yield file.write
  File "C:\Python38\lib\xml\etree\ElementTree.py", line 772, in write
    serialize(write, self._root, qnames, namespaces,
  File "C:\Python38\lib\xml\etree\ElementTree.py", line 937, in _serialize_xml
    _serialize_xml(write, e, qnames, None,
  File "C:\Python38\lib\xml\etree\ElementTree.py", line 937, in _serialize_xml
    _serialize_xml(write, e, qnames, None,
  File "C:\Python38\lib\xml\etree\ElementTree.py", line 937, in _serialize_xml
    _serialize_xml(write, e, qnames, None,
  File "C:\Python38\lib\xml\etree\ElementTree.py", line 931, in _serialize_xml
    write(" %s=\"%s\"" % (qnames[k], v))
MemoryError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\My Documents\my_script.py", line 9, in <module>
    writer.save()
  File "C:\Python38\lib\site-packages\pandas\io\excel\_openpyxl.py", line 43, in save
    return self.book.save(self.path)
  File "C:\Python38\lib\site-packages\openpyxl\workbook\workbook.py", line 392, in save
    save_workbook(self, filename)
  File "C:\Python38\lib\site-packages\openpyxl\writer\excel.py", line 293, in save_workbook
    writer.save()
  File "C:\Python38\lib\site-packages\openpyxl\writer\excel.py", line 275, in save
    self.write_data()
  File "C:\Python38\lib\site-packages\openpyxl\writer\excel.py", line 75, in write_data
    self._write_worksheets()
  File "C:\Python38\lib\site-packages\openpyxl\writer\excel.py", line 215, in _write_worksheets
    self.write_worksheet(ws)
  File "C:\Python38\lib\site-packages\openpyxl\writer\excel.py", line 200, in write_worksheet
    writer.write()
  File "C:\Python38\lib\site-packages\openpyxl\worksheet\_writer.py", line 360, in write
    self.close()
  File "C:\Python38\lib\site-packages\openpyxl\worksheet\_writer.py", line 368, in close
    self.xf.close()
  File "C:\Python38\lib\site-packages\openpyxl\worksheet\_writer.py", line 299, in get_stream
    pass
  File "C:\Python38\lib\contextlib.py", line 120, in __exit__
    next(self.gen)
  File "C:\Python38\lib\site-packages\et_xmlfile\xmlfile.py", line 50, in element
    self._write_element(el)
  File "C:\Python38\lib\site-packages\et_xmlfile\xmlfile.py", line 77, in _write_element
    xml = tostring(element)
  File "C:\Python38\lib\xml\etree\ElementTree.py", line 1133, in tostring
    ElementTree(element).write(stream, encoding,
  File "C:\Python38\lib\xml\etree\ElementTree.py", line 772, in write
    serialize(write, self._root, qnames, namespaces,
  File "C:\Python38\lib\contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "C:\Python38\lib\xml\etree\ElementTree.py", line 832, in _get_writer
    yield file.write
  File "C:\Python38\lib\contextlib.py", line 525, in __exit__
    raise exc_details[1]
  File "C:\Python38\lib\contextlib.py", line 510, in __exit__
    if cb(*exc_details):
  File "C:\Python38\lib\contextlib.py", line 382, in _exit_wrapper
    callback(*args, **kwds)
ValueError: I/O operation on closed file.

Solution

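  • Replace the last three lines of your code (creating the writer, calling to_excel, and calling writer.save()) with the following with block, which saves and closes the workbook when the block exits: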

    with pd.ExcelWriter("C:\\My Documents\\new-file.xlsx") as writer:
        results1.to_excel(writer)
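
For reference, here is a minimal sketch of the whole script with that change applied. The helper names drop_columns and save_to_excel are hypothetical, introduced only to keep the example tidy; the paths, sheet name, and column names are the ones from the question. It assumes an Excel engine such as openpyxl is installed, as the traceback suggests.

```python
import pandas as pd


def drop_columns(df, columns):
    """Return a copy of df without the listed columns (hypothetical helper)."""
    return df.drop(columns=columns)


def save_to_excel(df, path, sheet_name="Sheet1"):
    """Write df to an Excel file at path (hypothetical helper).

    The with block saves and closes the workbook when it exits,
    so no explicit writer.save() call is needed.
    """
    with pd.ExcelWriter(path) as writer:
        df.to_excel(writer, sheet_name=sheet_name)


if __name__ == "__main__":
    # Paths and column names taken from the question.
    work_file = "C:\\My Documents\\the_file.xlsx"
    df = pd.read_excel(work_file, sheet_name="Sheet1", index_col=0)

    results1 = drop_columns(df, ["Name", "Gender", "Register"])
    save_to_excel(results1, "C:\\My Documents\\new-file.xlsx")
```

Because the writer is only open inside the with block, the file is closed even if to_excel raises an exception partway through.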