
Strange Python memory allocation


While trying to figure out how Python's garbage collection system works, I stumbled across this oddity. Running this simple code:

import numpy as np
from memory_profiler import profile

@profile
def my_func():
    a = np.random.rand(1000000)
    a = np.append(a, [1])
    a = np.append(a, [2])
    a = np.append(a, [3])
    a = np.append(a, [4])
    a = np.append(a, [5])
    b = np.append(a, [6])
    c = np.append(a, [7])
    d = np.append(a, a)

    return a

if __name__ == '__main__':
    my_func()

using memory_profiler version 0.52 and Python 3.7.6 on my MacBook, I got the following output:

Line #    Mem usage    Increment   Line Contents
================================================
     4     54.2 MiB     54.2 MiB   @profile
     5                             def my_func():
     6     61.8 MiB      7.7 MiB       a = np.random.rand(1000000)
     7     69.4 MiB      7.6 MiB       a = np.append(a, [1])
     8     69.4 MiB      0.0 MiB       a = np.append(a, [2])
     9     69.4 MiB      0.0 MiB       a = np.append(a, [3])
    10     69.4 MiB      0.0 MiB       a = np.append(a, [4])
    11     69.4 MiB      0.0 MiB       a = np.append(a, [5])
    12     69.4 MiB      0.0 MiB       b = np.append(a, [6])
    13     77.1 MiB      7.6 MiB       c = np.append(a, [7])
    14     92.3 MiB     15.3 MiB       d = np.append(a, a)
    15                             
    16     92.3 MiB      0.0 MiB       return a

Two things are odd. First, why does line 7 give a noticeable increase in memory while lines 8-11 don't? Second, why doesn't line 12 give the same increase in memory as line 13?

Note that if I delete lines 12-14, I still get the increase in memory in line 7. So it's not a bug where the memory is actually being increased in line 12 but memory_profiler is incorrectly showing that increase in line 7.


Solution

  • creating a makes an array with 8e6 bytes (check `a.nbytes`)

     6     61.8 MiB      7.7 MiB       a = np.random.rand(1000000)
    
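A quick way to confirm that size claim yourself (a standalone sketch, not part of the original profiling run):

```python
import numpy as np

# A million float64 values at 8 bytes each: ~8e6 bytes, i.e. ~7.6 MiB,
# matching the profiler's increment for this line.
a = np.random.rand(1000000)
print(a.nbytes)          # 8000000
print(a.nbytes / 2**20)  # ~7.63 (MiB)
```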

np.append makes a new array (under the hood it is np.concatenate, not an in-place list-style append), so we get another 8MB increase.

     7     69.4 MiB      7.6 MiB       a = np.append(a, [1])
    
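You can verify the copy behaviour directly; this small sketch shows that the result of np.append is an independent buffer:

```python
import numpy as np

a = np.random.rand(10)
b = np.append(a, [1])

# np.append always allocates a fresh buffer rather than growing `a` in place.
print(b is a)                  # False
print(np.shares_memory(a, b))  # False

a[0] = 99.0
print(b[0])  # unchanged: b is an independent copy of a's old contents
```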

    My guess is that in the following steps it cycles back and forth between those two 8MB blocks. numpy doesn't return every freed block to the OS, so the reused allocations show up as 0.0 MiB increments.

    Then you assign the new array to c. a still exists, along with b. (I missed b the first time I looked at this.)

    13     77.1 MiB      7.6 MiB       c = np.append(a, [7])
    

    d is twice the size of a, so that accounts for the 15MB jump. a, b, and c still exist.

    14     92.3 MiB     15.3 MiB       d = np.append(a, a)
    

    b and c are each just one element bigger than a, so each takes up about 8MB. That seems to account for everything!
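To make that accounting concrete, here is a sketch that rebuilds the four arrays at their final sizes and sums them up (the element count 1,000,005 is a's length after the five single-element appends):

```python
import numpy as np

n = 1_000_005            # length of a after five single-element appends
a = np.random.rand(n)
b = np.append(a, [6])    # n + 1 elements, ~7.6 MiB
c = np.append(a, [7])    # n + 1 elements, ~7.6 MiB
d = np.append(a, a)      # 2n elements,   ~15.3 MiB

mib = 2**20
for name, arr in [("a", a), ("b", b), ("c", c), ("d", d)]:
    print(name, round(arr.nbytes / mib, 1))

total = (a.nbytes + b.nbytes + c.nbytes + d.nbytes) / mib
print(round(total, 1))   # ~38.1, close to the profiler's 92.3 - 54.2 MiB span
```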

    When tracking memory use, keep in mind that numpy, Python, and the OS all play a role. Most of us don't know all the details, so we can only make rough guesses about what's happening.
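One way to get a Python-side view, independent of the OS-level RSS numbers that memory_profiler reports, is the standard-library tracemalloc module; numpy has registered its array buffers with tracemalloc since version 1.13. A minimal sketch:

```python
import tracemalloc
import numpy as np

# tracemalloc counts allocations registered with Python's tracing hooks,
# so the ~8 MB array buffer shows up here. OS-level numbers can still
# differ, since freed blocks aren't always returned to the OS.
tracemalloc.start()
a = np.random.rand(1000000)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(current / 2**20)  # roughly 7.6 MiB
```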