I am trying to read images into a numpy array in the following way:
import os
import psutil
import numpy as np
from glob import glob
from skimage.transform import resize
from skimage.io import imread

img_names = sorted(glob(my_dir + '/' + '*.jpg'))
images = np.empty((len(img_names),) + basic_shape, dtype='float32')

process = psutil.Process(os.getpid())
print('MEM:', process.memory_info().rss)

for i in range(len(img_names)):
    images[i] = resize(imread(img_names[i]), basic_shape)

process = psutil.Process(os.getpid())
print('MEM:', process.memory_info().rss)
Before I start reading, process.memory_info().rss says the process is using 926846976 bytes. After reading, it is using 2438307840 bytes, which is approximately 2.5 times more than before.
Why does the process memory increase that much during reading, and is there any way to reduce the amount of memory allocated while reading the images?
CPython uses reference counting plus a cyclic garbage collector, which means some objects can stay in memory for a while after they are no longer needed and are eventually removed in batches.
You can influence garbage collection with the gc module; in particular, calling gc.collect() in your loop might help. See this other question for more: How can I explicitly free memory in Python?
Note the many comments under the top answer, which I would summarise as "your mileage may vary".
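To see what gc.collect() actually cleans up, here is a minimal stand-alone illustration (not part of the question's script): reference counting frees most objects immediately, but reference cycles can only be freed by the collector.

```python
import gc

class Node:
    """Tiny object used only to build a reference cycle."""
    def __init__(self):
        self.other = None

# Build a cycle: a references b, b references a.
a, b = Node(), Node()
a.other, b.other = b, a

# Dropping our names leaves the cycle unreachable, but reference
# counting alone cannot free it -- each object still has refcount 1.
del a, b

# The cyclic collector finds and frees the cycle; it returns the
# number of unreachable objects it collected.
print(gc.collect() >= 2)  # True
```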
Anyway, here's how I would modify your script first for diagnosis, and then to keep the memory use as low as possible:
import gc
import os
import psutil
import numpy as np
from glob import glob
from skimage.transform import resize
from skimage.io import imread

img_names = sorted(glob(my_dir + '/' + '*.jpg'))
images = np.empty((len(img_names),) + basic_shape, dtype='float32')

process = psutil.Process(os.getpid())
print('MEM:', process.memory_info().rss)

for i in range(len(img_names)):
    tmp = imread(img_names[i])
    images[i] = resize(tmp, basic_shape)
    # del tmp
    # gc.collect()  # try uncommenting these two lines
    process = psutil.Process(os.getpid())
    print('MEM:', process.memory_info().rss)

gc.collect()  # force garbage collection
process = psutil.Process(os.getpid())
print('MEM:', process.memory_info().rss)
I also recommend the memory_profiler package to monitor your script's memory usage over time as well as per line.
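If you cannot install extra packages, the standard-library tracemalloc module gives a rough equivalent. This sketch just shows reading the current/peak counters around one deliberately large allocation:

```python
import tracemalloc

tracemalloc.start()

# A ~10 MB allocation to make the counters obviously move.
data = bytearray(10_000_000)

# current = bytes traced right now, peak = high-water mark so far;
# both are at least 10_000_000 here.
current, peak = tracemalloc.get_traced_memory()
print('current:', current, 'peak:', peak)

del data
tracemalloc.stop()
```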