python, numpy, memory-management, scikit-image

numpy memory management when copying arrays


I am trying to read images into a numpy array images in the following way:

 import os
 import psutil
 import numpy as np
 from glob import glob
 from skimage.transform import resize
 from skimage.io import imread

 img_names = sorted(glob(my_dir + '/' + '*.jpg'))
 images = np.empty((len(img_names),) + basic_shape, dtype='float32')

 process = psutil.Process(os.getpid())
 print('MEM:', process.memory_info().rss)
    
 for i in range(0, len(img_names)):
    images[i] = resize(imread(img_names[i]), basic_shape)

 process = psutil.Process(os.getpid())
 print('MEM:', process.memory_info().rss)

Before I start reading, process.memory_info().rss says that the process is using 926846976 bytes. After reading, the process is using 2438307840 bytes, which is roughly 2.6 times as much as before.

Why does the process memory increase that much after reading, and is there any way to reduce the amount of memory allocated when reading the images?


Solution

  • CPython manages memory with reference counting plus a cyclic garbage collector, which means that some objects can stay in memory for a while after they are no longer needed and are only removed in batches when the collector runs.

    You can influence garbage collection through the gc module; in particular, calling gc.collect() inside your loop might help. See this other question for more:

    How can I explicitly free memory in Python?

    Note the many comments under the top answer, which I would summarise as "your mileage may vary".

    Anyway, here's how I would modify your script first for diagnosis, and then to keep the memory use as low as possible:

    import gc
    import os
    import psutil
    import numpy as np
    from glob import glob
    from skimage.transform import resize
    from skimage.io import imread
    
    img_names = sorted(glob(my_dir + '/' + '*.jpg'))
    images = np.empty((len(img_names),) + basic_shape, dtype='float32')
    
    process = psutil.Process(os.getpid())
    print('MEM:', process.memory_info().rss)
        
    for i in range(0, len(img_names)):
        tmp = imread(img_names[i])             # full-resolution image: a large temporary
        images[i] = resize(tmp, basic_shape)
        # del tmp       # drop the reference to the temporary...
        # gc.collect()  # ...and force a collection; try uncommenting these two lines
    
    process = psutil.Process(os.getpid())
    print('MEM:', process.memory_info().rss)
    
    gc.collect()  # force garbage collection
    
    process = psutil.Process(os.getpid())
    print('MEM:', process.memory_info().rss)
    

    I also recommend the memory_profiler package to monitor your script's memory usage over time as well as line by line.
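
    For example, here is a minimal sketch of how you could wrap the loading loop for line-by-line profiling with memory_profiler; the load_images function, the directory name and the target shape are placeholders for your own values:

    from glob import glob

    import numpy as np
    from memory_profiler import profile
    from skimage.io import imread
    from skimage.transform import resize


    @profile  # prints per-line memory usage when the function runs
    def load_images(my_dir, basic_shape):
        img_names = sorted(glob(my_dir + '/*.jpg'))
        images = np.empty((len(img_names),) + basic_shape, dtype='float32')
        for i, name in enumerate(img_names):
            images[i] = resize(imread(name), basic_shape)
        return images


    if __name__ == '__main__':
        # placeholder directory and shape -- substitute your own
        load_images('my_dir', (224, 224, 3))

    Running the script as usual prints a per-line memory report for load_images; running it with mprof run script.py followed by mprof plot shows memory usage over time instead.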