The following code is executed to load several large (~200 MB each) JSON files:
def work():
    jsons = get_data()
    # do something with the jsons

def get_data():
    json_files = []
    for json_path in file_paths_list:
        json_files.append(load_json(json_path))
    return json_files

def load_json(json_path):
    import json
    with open(json_path) as f:
        return json.load(f)
This is how PyCharm's custom VM options look (up to 30 GB heap size; the machine has 32 GB of RAM):
# custom PyCharm VM options
-Xms25000m
-Xmx30000m
...
...
...
The popular recommendation to "Invalidate Caches/Restart" has already been applied.
After 2 files are loaded (~400 MB in total), a "MemoryError" exception is thrown while loading the 3rd.
I cannot understand why, with up to 30 GB of heap size, the memory error is thrown after only 400 MB.
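PyCharm is the Python IDE, not the Python interpreter. The VM options above only set the memory available to the IDE itself (editing, indexing); they have no effect on the memory available to the Python process that runs your script.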
400 MB of JSON files may well expand to several gigabytes of in-memory data (maybe not 30, but 3 or 4) because of the per-object overhead of Python objects. For example:
>>> s = "hello"
>>> import sys
>>> sys.getsizeof(s)
54
In other words, the size of the object in RAM is much larger than the 5 characters of the string itself.
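To get a feel for how much a parsed JSON document grows in memory, the overhead can be estimated with a small recursive helper. This is only a rough sketch: `deep_sizeof` is a hypothetical helper (not a standard library function), the sample records are made up to stand in for a real file, and `sys.getsizeof` reports only each object's own footprint, so the result is an approximation.

```python
import json
import sys


def deep_sizeof(obj, seen=None):
    """Roughly estimate the memory held by obj and everything it contains."""
    if seen is None:
        seen = set()
    if id(obj) in seen:  # count shared objects only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    return size


# Made-up sample data: 1000 small records standing in for a real JSON file.
records = [{"id": i, "name": "item%d" % i, "value": i * 0.5} for i in range(1000)]
text = json.dumps(records)
parsed = json.loads(text)

text_bytes = len(text.encode("utf-8"))
memory_bytes = deep_sizeof(parsed)
print("JSON text:      %d bytes" % text_bytes)
print("Parsed objects: ~%d bytes (%.1fx)" % (memory_bytes, memory_bytes / text_bytes))
```

The exact ratio depends on the shape of the data and on the interpreter build, but files made of many small dicts, strings and numbers typically grow several times over once parsed.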
So if your Python interpreter is a 32-bit build, its process is limited to roughly 2 GB or 3 GB of memory, which would explain the error. PyCharm itself is a 64-bit application, but that does not help the interpreter.
Upgrade to a 64-bit interpreter, which can make use of all your RAM.
You can check the version and the 32/64-bit information with this (from PyCharm):
>>> import sys
>>> sys.version
For instance I get:
('3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit '
'(AMD64)]')
If it shows "32 bit", my guess is correct. In that case, uninstall the 32-bit version, install the same version of Python in 64-bit, and select it as the current interpreter in PyCharm.
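If reading the version string feels error-prone, the bitness can also be checked programmatically. The sketch below uses two standard-library indicators, the pointer size reported by `struct` and the value of `sys.maxsize`. Run it with the same interpreter that PyCharm uses for the project; the variable names are only illustrative.

```python
import struct
import sys

# Size of a C pointer in bytes, times 8, gives the interpreter's word size.
bits = struct.calcsize("P") * 8
print("Interpreter: ", sys.executable)
print("Pointer size: %d-bit" % bits)

# sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
is_64bit = sys.maxsize > 2**32
print("64-bit build:", is_64bit)
```

Printing `sys.executable` also confirms which installation is actually being run, which helps when several Python versions are installed side by side.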
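You may need to reinstall additional modules in the new installation, so it is better to dump the requirements to a text file (pip freeze > requirements.txt) before uninstalling. You can then restore everything with a single pip install (pip install -r requirements.txt) on the new 64-bit version.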
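The requirements dump can also be done from Python, which guarantees that the pip being queried belongs to the interpreter you are about to uninstall. This is a minimal sketch: `dump_requirements` and the default file name are hypothetical, and it assumes pip is installed for that interpreter.

```python
import subprocess
import sys


def dump_requirements(path="requirements.txt"):
    """Write the output of 'pip freeze' for the running interpreter to path."""
    # Using sys.executable ensures we query this interpreter's own pip.
    output = subprocess.check_output(
        [sys.executable, "-m", "pip", "freeze"],
        universal_newlines=True,  # return text rather than bytes
    )
    with open(path, "w") as f:
        f.write(output)
    return path
```

Call `dump_requirements()` from the old 32-bit interpreter before uninstalling it, then run pip install -r requirements.txt with the new 64-bit one. Packages that were installed from local files may need to be reinstalled by hand.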