Search code examples
pythonlistmemorymemory-efficientcontiguous

python 2D list, why is the memory not contiguous?


path = [[0, 1, 0, 0], 
       [0, 0, 1, 1], 
       [0, 0, 0, 1], 
       [1, 0, 0, 0]]
print [hex(id(path[0])),hex(id(path[1])), hex(id(path[2]))], id(path[1])-id(path[0]),id(path[2]) -id(path[1])
path[1] = [0,0, 0, 0]
print path
print [hex(id(path[0])),hex(id(path[1])), hex(id(path[2]))], id(path[1])-id(path[0]),id(path[2]) -id(path[1])

This is python 2.7, CPython. The example of the result:

['0x107d05638', '0x107cefb90', '0x107cef7e8'] -88744 -936

[[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]]

['0x107d05638', '0x107cef8c0', '0x107cef7e8'] -89464 -216

Thank you.


Solution

  • First of all, some important notes.

    1. The id(..) does not necessary refers to the address of the object. Indeed the documentation says:

      id(object)

      Return the "identity" of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

      CPython implementation detail: This is the address of the object in memory.

      So only in CPython, it is guaranteed to be the case.

    2. There is no such thing as a 2d list in Python. Python has only a very limited amount of fundamental datastructures: sets, dictionaries, ints, strings, lists,... But no 2d lists. A 2d list is a list of lists. That may look like a detail, but you can for instance declare: [[1,0,1],2,'foobar']. So this is only partially a 2d list. Python simply sees a list and one of the elements happens to be a list.

      What does this mean? That in your question the list looks like:

      +---------------+
      |      list     |
      +---+---+---+---+
      | o | o | o | o |
      +-|-+-|-+-|-+-|-+
        |   |   |   \________________________________________________
        |   |   \________________________________                    \
        |   \______________                      \                    |
        |                  \                      |                   |
      +-v-------------+  +--v------------+  +-----v---------+  +------v--------+
      |      list     |  |      list     |  |      list     |  |      list     |
      +---+---+---+---+  +---+---+---+---+  +---+---+---+---+  +---+---+---+---+
      | o | o | o | o |  | o | o | o | o |  | o | o | o | o |  | o | o | o | o |
      +-|-+---+---+---+  +-|-+-|-+-|-+-|-+  +-|-+-|-+-|-+-|-+  +-|-+-|-+-|-+-|-+
        v   v   v   v      v   v   v   v      v   v   v   v      v   v   v   v
        0   1   0   0      0   0   1   1      0   0   0   1      1   0   0   0
      

      The 0s and 1s are also objects by the way, but these are references to the same object.

    3. Usually the interpreter uses a heap and allocates on the heap. Now a heap can be fragmented because earlier objects have been allocated and deallocated on the heap. As a result, the interpreter aims to find a hole where the object to allocate can fit into.

    4. Lists can share sublists. For instance:

      a = [1,0]
      b = [[0,0],a,[0,1],[0,0]]
      c = [a]
      

      Now both b and c have an element that references the object a. So one cannot make both b and c contiguous.

    5. If you perform:

      path[1] = [0,0, 0, 0]
      

      you basically first construct a new list [0,0,0,0]. Since the old list path[1] is still in memory, it cannot fill that hole. So a Python interpreter will have to find another place to locate that [0,0,0,0]. Then it sets a reference from path[1] to that list, and finally (usually after reference counting is updated), it will remove the old list for path[1] (this can also be delayed until memory should be freed) and thus marking the place that was once occupied back as vacant.

    These notes basically answer the question: there is no 2d list: all objects are allocated on the heap and because a lot of allocation/deallocation happens (for instance even when starting up and loading libraries). It is undetermined where an object will be allocated. Since an object can be shared the memory layout cannot be contiguous for all lists at the same time.

    Finally note that Python usually assumes that programmers convenience is more important than efficiency. Python is rather inefficient: it is dynamically typed, has (for instance by using an MRO) inefficient way to resolve dynamic bindings, usually has a lot of fallback mechanisms in place that also cost computational effort. In they also introduced integers with arbitrary length, etc. These features all place a considerable burden on the CPU, but the philosophy of Python is usually that Programmers convenience is preferred above CPU efficiency since paying an additional programmer usually costs more than buying two new servers.