Tags: python, python-3.x, numpy, memory, calloc

How is memory handled once touched for the first time in numpy.zeros?


I recently learned that when a NumPy array is created via np.empty or np.zeros, its memory is not actually committed by the operating system right away, as discussed in this answer (and this question). In the case of np.zeros, this is because NumPy uses calloc to allocate the array's memory.

In fact, the OS isn't even "really" allocating that memory until you try to access it.

Therefore,

import numpy as np
f = np.zeros(2**28)

does not increase the used memory that the system reports, e.g., in htop. Only once I touch the memory, for instance by executing

np.add(f, 0, out=f)

does the used memory increase.

Because of that behaviour I have got a couple of questions:

1. Is touched memory copied under the hood?

If I touch chunks of the memory only after a while, is the content of the numpy array copied under the hood by the operating system to guarantee that the memory is contiguous?

i = 100
f[:i] = 3

while True:
    ... # Do stuff
    f[i] = ... # Once the memory "behind" the already allocated chunk of memory is filled
                # with other stuff, does the operating system reallocate the memory and
                # copy the already filled part of the array to the new location?
    i = i + 1

2. Touching the last element

As the memory of the NumPy array is contiguous, I thought

f[-1] = 3

might require the entire block of memory to be allocated (without touching the entire memory). However, it does not: the used memory in htop does not increase by the size of the array. Why is that not the case?


Solution

  • OS isn't even "really" allocating that memory until you try to access it

    This depends on the target platform (typically the OS and its configuration). Some platforms allocate pages in physical memory directly (e.g., AFAIK the Xbox does, as do some embedded platforms). However, mainstream platforms do indeed defer the allocation in this way.

    1. Is touched memory copied under the hood?
    If I touch chunks of the memory only after a while, is the content of the numpy array copied under the hood by the operating system to guarantee that the memory is contiguous?

    Allocations are performed in virtual memory. When a given memory page (a fixed-size chunk, e.g., 4 KiB) is touched for the first time, the OS maps the virtual page to a physical one. So only one page is physically mapped when you set only one item of the array (unless the item crosses two pages, which only happens in pathological cases).

    The physical pages may not be contiguous for a contiguous set of virtual pages. However, this is not a problem and you should not need to care about it: handling it is mainly the job of the OS. That being said, modern processors have a dedicated unit (the MMU) that translates virtual addresses (the ones you see in a debugger) to physical ones, backed by a cache called the TLB, since this translation is relatively expensive and performance-critical.

    [Figure: virtual memory example]

    The content of the NumPy array is neither reallocated nor copied, thanks to paging (at least from the user's point of view, i.e., in virtual memory).
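
To make the first-touch behaviour concrete, here is a minimal sketch of the page arithmetic involved. It is not NumPy code: page_of and pages_spanned are hypothetical helpers, the 8-byte item size assumes the default float64 dtype of np.zeros, and base_offset (where the first element sits inside its page) is an assumption, since the real value depends on the allocator.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE  # page size of this machine, commonly 4096 bytes
ITEM_SIZE = 8              # assumption: float64 elements, the np.zeros default


def page_of(index, n_items, base_offset=0,
            page_size=PAGE_SIZE, item_size=ITEM_SIZE):
    """Page number (relative to the buffer's first page) holding element `index`."""
    if index < 0:
        index += n_items  # mimic Python's negative indexing
    if not 0 <= index < n_items:
        raise IndexError("index out of range")
    return (base_offset + index * item_size) // page_size


def pages_spanned(n_items, base_offset=0,
                  page_size=PAGE_SIZE, item_size=ITEM_SIZE):
    """Number of virtual pages covered by a buffer of `n_items` elements."""
    first = base_offset // page_size
    last = (base_offset + n_items * item_size - 1) // page_size
    return last - first + 1


if __name__ == "__main__":
    n = 2**28  # same size as in the question: 2 GiB of float64
    print(pages_spanned(n, page_size=4096))                     # 524288
    print(page_of(-1, n, page_size=4096))                       # 524287
    print({page_of(i, n, page_size=4096) for i in range(100)})  # {0}
```

Under these assumptions, f[:100] = 3 and f[-1] = 3 each land in a single page out of 524288, so each write needs only one physical page to be mapped. The virtual address range of the array stays the same throughout.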

    2. Touching the last element
    I thought f[-1] = 3 might require the entire block of memory to be allocated (without touching the entire memory). However, it does not, the utilized memory in htop does not increase by the size of the array. Why is that not the case?

    Only the last page in virtual memory associated with the NumPy array is mapped, thanks to paging. This is why you do not see a big change in htop. However, you should see a slight change (the size of one page on your platform) if you look carefully. If you do not, the page was most likely already mapped because of previous, recycled allocations. Indeed, the allocation library can preallocate memory areas to speed up allocations (by reducing the number of slow requests to the OS). The library can also keep memory mapped after NumPy frees it in order to speed up subsequent allocations (since the memory does not have to be unmapped and then mapped again). This is unlikely to occur for huge arrays in practice because the cost in memory consumption would be too high.
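
If you want to observe this without htop, the following is a rough, Linux-only sketch. It uses an anonymous private mmap from the standard library as a stand-in for the large block the allocator would request for np.zeros (that substitution is my assumption, not what NumPy literally does), and reads the resident set size from /proc/self/statm. resident_kib and first_touch_demo are hypothetical helper names.

```python
import mmap
import os


def resident_kib():
    """Resident set size of this process in KiB (Linux only, via /proc)."""
    with open("/proc/self/statm") as fh:
        resident_pages = int(fh.read().split()[1])  # second field: resident pages
    return resident_pages * os.sysconf("SC_PAGE_SIZE") // 1024


def first_touch_demo(size=64 * 1024 * 1024):
    """Record the RSS before mapping, after mapping, and after touching pages."""
    rss = {"start": resident_kib()}

    # Anonymous private mapping: virtual address space only, nothing touched yet.
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    rss["mapped"] = resident_kib()

    # Write one byte at the very end, analogous to f[-1] = 3.
    buf[size - 1] = 1
    rss["last_byte"] = resident_kib()

    # Write one byte in every page, analogous to touching the whole array.
    for offset in range(0, size, mmap.PAGESIZE):
        buf[offset] = 1
    rss["all_pages"] = resident_kib()

    buf.close()
    return rss


if __name__ == "__main__":
    for step, kib in first_touch_demo().items():
        print(f"{step:>10}: {kib} KiB resident")
```

The exact numbers depend on the kernel, on transparent huge pages, and on whatever else the interpreter allocates in the meantime, so treat them as indicative only. The expected pattern is that "mapped" and "last_byte" stay close to "start", while "all_pages" grows by roughly the size of the mapping.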