python-3.x, numpy, cython, typed-memory-views

Addresses of memoryviews in Cython are the same but point to different objects


The problem

When defining different objects in Cython, their memoryviews report the same address (id). However, each underlying array is modified correctly when indexed into.

Background.

I have a base class and a derived class written in Cython. I noticed that when I applied multiprocessing to the classes, the underlying buffers were altered in different processes, which was not intended. For pickling, I wrote simple __reduce__ and __deepcopy__ methods that rebuild the original object. For the sake of clarity, I reduced the complexity to the code below. Now my question is: why do the memoryviews return the same address? Additionally, why are the numpy arrays themselves altered correctly even though the memoryview appears to be the same?

#distutils: language=c++
import numpy as np
cimport numpy as np
cdef class Temp:
    cdef double[::1] inp
    def __init__(self, inp):
        print(f'id of inp = {id(inp)}')
        self.inp = inp

cdef np.ndarray x = np.ones(10)
cdef Temp a       = Temp(x)
cdef Temp b       = Temp(x)
cdef Temp c       = Temp(x.copy())
b.inp[0] = -1
c.inp[2] = 10
print(f'id of a.inp = {id(a.inp)}\nid of b.inp = {id(b.inp)}\nid of c.inp = {id(c.inp)}')
print(f'id of a.inp.base = {id(a.inp.base)}\nid of b.inp.base = {id(b.inp.base)}\nid of c.inp.base = {id(c.inp.base)}')

print('a.inp.base',a.inp.base)
print('b.inp.base',b.inp.base) # expected to be the same as a
print('c.inp.base',c.inp.base) # expected to be different to a/b

Output:

id of inp = 139662709551872
id of inp = 139662709551872
id of inp = 139662709551952
id of a.inp = 139662450248672
id of b.inp = 139662450248672
id of c.inp = 139662450248672
id of a.inp.base = 139662709551872
id of b.inp.base = 139662709551872
id of c.inp.base = 139662709551952
a.inp.base [-1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
b.inp.base [-1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
c.inp.base [ 1.  1. 10.  1.  1.  1.  1.  1.  1.  1.]

Solution

  • What we call a typed memoryview isn't a single class: depending on the context (Cython code or pure Python code), it changes its identity under the hood.

    So let's start with

    %%cython 
    cdef class Temp:
        cdef double[::1] inp
    

    Here double[::1] inp is of type __Pyx_memviewslice which isn't a Python object:

    typedef struct {
      struct {{memview_struct_name}} *memview;
      char *data;
      Py_ssize_t shape[{{max_dims}}];
      Py_ssize_t strides[{{max_dims}}];
      Py_ssize_t suboffsets[{{max_dims}}];
    } {{memviewslice_name}};
    

    What happens when we call id(self.inp)? id is a Python built-in that takes a Python object, so a new temporary Python object (a memoryview) must be created from self.inp (only to be able to call id) and is destroyed directly afterwards. The temporary Python object is created via __pyx_memoryview_fromslice.

    Knowing that, it is easy to explain why the ids are equal: despite being different objects, the temporary memoryviews coincidentally have the same address (and thus the same id; that the id is the address is an implementation detail of CPython), because CPython reuses the freed memory over and over again.

    There are similar scenarios all over Python, for example with method objects, or this even simpler one:

    class A:
        pass
    # the lifetimes of the temporary objects don't overlap, so the ids can be equal
    id(A())==id(A())
    # output: True
    
    # the lifetimes of the objects overlap, so the ids cannot be equal
    a,b=A(), A()
    id(a)==id(b)
    # output: False
    

    So in a nutshell: your expectation that the same id means the same object is wrong. This assumption holds only when the lifetimes of the objects overlap.
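To check whether two views really refer to the same data, it is safer to compare the underlying buffers than the ids of temporary wrappers. The following is only a rough pure-Python sketch of that idea, not what Cython generates: `Holder` is a hypothetical stand-in for the `Temp` class, and its `inp` property builds a fresh built-in `memoryview` on every access, loosely imitating the temporary object Cython creates from the C-level slice. The wrappers are kept alive at the same time so that their ids are comparable, and `np.shares_memory` answers the question about the buffers themselves.

```python
import numpy as np


class Holder:
    """Hypothetical pure-Python stand-in for the cdef class Temp."""

    def __init__(self, data):
        self._data = data

    @property
    def inp(self):
        # Each access builds a fresh wrapper object, loosely mimicking the
        # temporary memoryview Cython creates from the C-level slice.
        return memoryview(self._data)


x = np.ones(10)
a, b, c = Holder(x), Holder(x), Holder(x.copy())

# Keep the wrappers alive at the same time so their ids are comparable.
view_a, view_b, view_c = a.inp, b.inp, c.inp
distinct_wrappers = len({id(view_a), id(view_b), id(view_c)}) == 3

# Ask about the buffers themselves rather than about the wrappers.
a_b_share = np.shares_memory(np.asarray(view_a), np.asarray(view_b))
a_c_share = np.shares_memory(np.asarray(view_a), np.asarray(view_c))

view_b[0] = -1.0  # writes through to x, which a wraps as well
view_c[2] = 10.0  # writes to the copy only

print(distinct_wrappers, a_b_share, a_c_share)
print(x)
```

With overlapping lifetimes the three wrappers are distinct objects, while the shared-memory check separates a and b (same array) from c (a copy), which matches the behaviour of the arrays in the question.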