Search code examples
pythonmemory-managementcpythonpython-internals

CPython memory allocation


This is a post inspired from this comment about how memory is allocated for objects in CPython. Originally, this was in the context of creating a list and appending to it in a for loop vis a vis a list comprehension.

So here are my questions:

  1. how many different allocaters are there in CPython?
    • what is the function of each?
  2. when is malloc acutally called? (a list comprehension may not result in a call to malloc, based on what's said in this comment
  3. How much memory does python allocate for itself at startup?
    1. are there rules governing which data structures get first "dibs" on this memory?
  4. What happens to the memory used by an object when it is deleted (does python still hold on to the memory to allocate to another object in the future, or does the GC free up the memory for another process, say Google Chrome, to use)?
  5. When is a GC triggered?
  6. lists are dynamic arrays, which means they need a contiguous piece of memory. This means that if I try to append an object into a list, whose underlying-C-data-structure array cannot be extended, the array is copied over onto a different part of memory, where a larger contiguous block is available. So how much space is allocated to this array when I initialize a list?
    • how much extra space is allocated to the new array, which now holds the old list and the appended object?

EDIT: From the comments, I gather that there are far too many questions here. I only did this because these questions are all pretty related. Still, I'd be happy to split this up into several posts if that is the case (please let me know to do so in the comments)


Solution

  • Much of this is answered in the Memory Management chapter of the C API documentation.

    Some of the documentation is vaguer than you're asking for. For further details, you'd have to turn to the source code. And nobody's going to be willing to do that unless you pick a specific version. (At least 2.7.5, pre-2.7.6, 3.3.2, pre-3.3.3, and pre-3.4 would be interesting to different people.)

    The source to the obmalloc.c file is a good starting place for many of your questions, and the comments at the top have a nice little ASCII-art graph:

        Object-specific allocators
        _____   ______   ______       ________
       [ int ] [ dict ] [ list ] ... [ string ]       Python core         |
    +3 | <----- Object-specific memory -----> | <-- Non-object memory --> |
        _______________________________       |                           |
       [   Python`s object allocator   ]      |                           |
    +2 | ####### Object memory ####### | <------ Internal buffers ------> |
        ______________________________________________________________    |
       [          Python`s raw memory allocator (PyMem_ API)          ]   |
    +1 | <----- Python memory (under PyMem manager`s control) ------> |   |
        __________________________________________________________________
       [    Underlying general-purpose allocator (ex: C library malloc)   ]
     0 | <------ Virtual memory allocated for the python process -------> |
    
       =========================================================================
        _______________________________________________________________________
       [                OS-specific Virtual Memory Manager (VMM)               ]
    -1 | <--- Kernel dynamic storage allocation & management (page-based) ---> |
        __________________________________   __________________________________
       [                                  ] [                                  ]
    -2 | <-- Physical memory: ROM/RAM --> | | <-- Secondary storage (swap) --> |
    

    how many different allocaters are there in CPython?

    According to the docs, "several". You could count up the ones in the builtin and stdlib types, then add the handful of generic ones, if you really wanted. But I'm not sure what it would tell you. (And it would be pretty version-specific. IIRC, the exact number even changed within the 3.3 tree, as there was an experiment with whether the new-style strings should use three different allocators or one.)


    what is the function of each?

    The object-specific allocators at level +3 are for specific uses cases that are worth optimizing. As the docs say:

    For example, integer objects are managed differently within the heap than strings, tuples or dictionaries because integers imply different storage requirements and speed/space tradeoffs.

    Below that, there are various generic supporting allocators at level +2 (and +1.5 and maybe +2.5)—at least an object allocator, an arena allocator, and a small-block allocator, etc.—but all but the first are private implementation details (meaning private even to the C-API; obviously all of it is private to Python code).

    And below that, there's the raw allocator, whose function is to ask the OS for more memory when the higher-level allocators need it.


    when is malloc acutally called?

    The raw memory allocator (or its heap manager) should be the only thing that ever calls malloc. (In fact, it might not even necessarily call malloc; it might use functions like mmap or VirtualAlloc instead. But the point is that it's the only thing that ever asks the OS for memory.) There are a few exceptions within the core of Python, but they'll rarely be relevant.

    The docs explicitly say that higher-level code should never try to operate on Python objects in memory obtained from malloc.

    However, there are plenty of stdlib and extension modules that use malloc for purposes besides Python objects.

    For example, a numpy array of 1000x1000 int32 values doesn't allocate 1 million Python ints, so it doesn't have to go through the int allocator. Instead, it just mallocs an array of 1 million C ints, and wraps them up in Python objects as needed when you access them.


    How much memory does python allocate for itself at startup?

    This is platform-specific, and a bit hard to figure out from the code. However, when I launch a new python3.3 interpreter on my 64-bit Mac, it starts of with 13.1MB of virtual memory, and almost immediately expands to 201MB. So, that should be a rough ballpark guide.

    are there rules governing which data structures get first "dibs" on this memory?

    Not really, no. A malicious or buggy object-specific allocator could immediately grab all of the pre-allocated memory and more, and there's nothing to stop it.


    What happens to the memory used by an object when it is deleted (does python still hold on to the memory to allocate to another object in the future, or does the GC free up the memory for another process, say Google Chrome, to use)?

    It goes back to the object-specific allocator, which may keep it on a freelist, or release it to the raw allocator, which keeps its own freelist. The raw allocator almost never releases memory back to the OS.

    This is because there's usually no good reason to release memory back to a modern OS. If you have a ton of unused pages lying around, the OS's VM will just page them out if another process needs it. And when there is a good reason, it's almost always application-specific, and the simplest solution is to use multiple processes to manage your huge short-lived memory requirements.


    When is a GC triggered?

    It depends on what you mean by "a GC".

    CPython uses refcounting; every time you release a reference to an object (by rebinding a variable or a slot in a collection, letting a variable go out of scope, etc.), if it was the last reference, it will be cleaned up immediately. This is explained in the Reference Counting section in the docs.

    However, there's a problem with refcounting: if two objects reference each other, even when all outside references go away, they still won't get cleaned up. So, CPython has always had a cycle collector that periodically walks objects looking for cycles of objects that reference each other, but have no outside references. (It's a little more complicated, but that's the basic idea.) This is fully explained in the docs for the gc module. The collector can run when you ask it to explicitly, when the freelists are getting low, or when it hasn't run in a long time; this is dynamic and to some extent configurable, so it's hard to give a specific answer to "when".


    lists are dynamic arrays, which means they need a contiguous piece of memory. This means that if I try to append an object into a list, whose underlying-C-data-structure array cannot be extended, the array is copied over onto a different part of memory, where a larger contiguous block is available. So how much space is allocated to this array when I initialize a list?

    The code for this is mostly inside listobject.c. It's complicated; there are a bunch of special cases, like the code used by timsort for creating temporary intermediate lists and for non-in-place sorting. But ultimately, some piece of code decides it needs room for N pointers.

    It's also not particularly interesting. Most lists are either never expanded, or expanded far beyond the original size, so doing extra allocation at the start wastes memory for static lists and doesn't help much for most growing lists. So, Python plays it conservative. I believe it starts by looking through its internal freelist that's not too much bigger than N pointers (it might also consolidate adjacent freed list storage; I don't know if it does), so it might overallocate a little bit occasionally, but generally it doesn't. The exact code should be in PyList_New.

    At any rate, if there's no space in the list allocator's freelist, it drops down to the object allocator, and so on through the levels; it may end up hitting level 0, but usually it doesn't.

    how much extra space is allocated to the new array, which now holds the old list and the appended object?

    This is handled in list_resize, and this is the interesting part.

    The only way to avoid list.append being quadratic is to over allocate geometrically. Overallocating by too small of a factor (like 1.2) wastes way too much time for the first few expansions; using too large of a factor (like 1.6) wastes way too much space for very large arrays. Python handles this by using a sequence that starts off at 2.0 but quickly converges toward somewhere around 1.25. According to the 3.3 source:

    The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...


    You didn't ask specifically about sorted, but I know that's what prompted you.

    Remember that timsort is primarily a merge sort, with an insertion sort for small sublists that aren't already sorted. So, most of its operations involve allocating a new list of about size 2N and freeing two lists of about size N. So, it can be almost as space- and allocation-efficient when copying as it would be in-place. There is up to O(log N) waste, but this usually isn't the factor that makes a copying sort slower.