
Numpy ndarray data ownership problem on reshape and view


I am confused about the ownership mechanism in numpy.

import numpy as np
a = np.arange(10)
a.flags.owndata     # True
id(a)               # 140289740187168

The first four lines are obvious: variable a owns its data, and its id is 140289740187168.

b = a
c = a.view()
d = a.reshape((2, 5))
print(b.flags.owndata, b.base, id(b.base)) # True None 94817978163056
print(c.flags.owndata, c.base, id(c.base)) # False [0 1 2 3 4 5 6 7 8 9] 140289740187168
print(d.flags.owndata, d.base, id(d.base)) # False [0 1 2 3 4 5 6 7 8 9] 140289740187168
id(None)                                   # 94817978163056

Variables c and d are both "shallow" copies of a, so neither of them owns its data. b is a, and owns the data (shared with a).
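A minimal sketch for checking these relationships directly, using `is` for object identity and `np.shares_memory` for buffer sharing. The variable names mirror the question; the write-through check at the end is an illustration added here, not part of the original post:

```python
import numpy as np

a = np.arange(10)
b = a                  # another name for the same object, not a copy
c = a.view()           # new array object, same data buffer
d = a.reshape((2, 5))  # new array object with a different shape, same buffer

print(b is a)                    # True: no new object was created
print(c is a, d is a)            # False False: c and d are distinct objects
print(c.base is a, d.base is a)  # True True: both views refer back to a
print(np.shares_memory(a, c), np.shares_memory(a, d))  # True True

# Writing through a view is visible through the owner, because the buffer is shared
d[0, 0] = 100
print(a[0], c[0])  # 100 100
```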

a = a.view()
print(id(a))                               # 140289747003632
print(a.flags.owndata, a.base, id(a.base)) # False [0 1 2 3 4 5 6 7 8 9] 140289740187168

However, assigning a view of a to a creates a new object with id 140289747003632, and data ownership stays with the old a (id 140289740187168).
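A minimal sketch of what the rebinding does, assuming a plain CPython session. The name `old` is introduced here only to keep a handle on the original array for comparison; even without it, the original object would stay reachable through `a.base`:

```python
import numpy as np

a = np.arange(10)
old = a          # hypothetical extra name for the original array object
a = a.view()     # rebinds the name a to a new view object

print(a is old)           # False: a now names a different object
print(a.base is old)      # True: the view refers back to the original
print(old.flags.owndata)  # True: ownership did not move
print(a.flags.owndata)    # False

# Both objects report the same data address in the array interface
print(a.__array_interface__['data'][0] == old.__array_interface__['data'][0])  # True
```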

The question is: since the old a has been replaced by the new a, wouldn't it be more reasonable to transfer data ownership to the new a? Why does the old a still keep the data ownership?


Solution

  • b = a
    

    b is a, just a different name for the same object. That's not even a copy.

    These are views. A view is a new array object, but it uses the same data buffer (as shown by the base):

    c = a.view()
    d = a.reshape((2, 5))
    

    I like to use __array_interface__ to look at the basic attributes of an array:

    In [210]: a = np.arange(10)
    In [211]: a.__array_interface__
    Out[211]: 
    {'data': (43515408, False),
     'strides': None,
     'descr': [('', '<i8')],
     'typestr': '<i8',
     'shape': (10,),
     'version': 3}
    

    The first element of 'data' is the memory address of the buffer where a's values are stored; the second element is a read-only flag.

    A view has the same 'data' address (possibly with an offset, e.g. for a slice), but its own shape and strides. It is a new array object with a shared base:

    In [212]: d = a.reshape((2,5))
    In [213]: d.__array_interface__
    Out[213]: 
    {'data': (43515408, False),
     'strides': None,
     'descr': [('', '<i8')],
     'typestr': '<i8',
     'shape': (2, 5),
     'version': 3}
    

    Assigning the view to a does not change the original array or its data buffer. The original array object still exists in memory, along with its buffer, because the new a (through a.base) and d (through d.base) still reference it.

    In [214]: a = a.view()
    In [216]: a.__array_interface__['data']
    Out[216]: (43515408, False)
    

    If numpy 'updated' a.base as you suggest, it would also have to update the base of every other view of the original a, such as d.

    In [218]: id(a)
    Out[218]: 139767778126704
    In [219]: id(a.base)
    Out[219]: 139768132465328
    In [220]: id(d.base)
    Out[220]: 139768132465328
    

    Python keeps a reference count for each object to determine what is garbage, but numpy does not keep a record of which views have been made. That is, while d.base links d to the original a, there is no link the other way.
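The one-way link and its effect on lifetime can be observed with a weak reference. This is a minimal sketch assuming CPython, where an object is freed as soon as its last strong reference disappears; `weakref.ref` is used here only to watch the original array without keeping it alive, and the name `owner` is introduced for illustration:

```python
import gc
import weakref

import numpy as np

a = np.arange(10)
d = a.reshape((2, 5))
owner = weakref.ref(a)  # observe the original array without keeping it alive

a = a.view()            # rebind a; the original no longer has a name of its own

# Links run only from each view to the owner, through .base
print(a.base is owner(), d.base is owner())  # True True

# The original stays alive as long as any view still references it
del d
gc.collect()
print(owner() is not None)  # True: still referenced by a.base

a = None                # drop the last view
gc.collect()
print(owner() is None)      # True: the original array and its buffer are gone
```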