I am confused about the ownership mechanism in numpy.
import numpy as np
a = np.arange(10)
a.flags.owndata # True
id(a) # 140289740187168
The first four lines are obvious: variable a owns its data and has id 140289740187168.
b = a
c = a.view()
d = a.reshape((2, 5))
print(b.flags.owndata, b.base, id(b.base)) # True None 94817978163056
print(c.flags.owndata, c.base, id(c.base)) # False [0 1 2 3 4 5 6 7 8 9] 140289740187168
print(d.flags.owndata, d.base, id(d.base)) # False [0 1 2 3 4 5 6 7 8 9] 140289740187168
id(None) # 94817978163056
Variables c and d are both "shallow" copies of a, so neither owns its data. b is a, and owns the data (shared with a).
a = a.view()
print(id(a)) # 140289747003632
print(a.flags.owndata, a.base, id(a.base)) # False [0 1 2 3 4 5 6 7 8 9] 140289740187168
However, assigning a view of a to a creates a new object with id 140289747003632 and leaves the data ownership with the old a, whose id is 140289740187168.
The question is: since the old a has been replaced by the new a, it would seem more reasonable to transfer the data ownership to the new a. Why does the old a still keep the data ownership?
b = a
b is a, just a different name for the same object. That's not even a copy.
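A minimal sketch of that aliasing, assuming a freshly created array as in the question. Nothing here is specific to numpy; it is ordinary Python name binding:

```python
import numpy as np

a = np.arange(10)
b = a  # binds a second name to the same ndarray object; nothing is copied

print(b is a)           # True: one object, two names
print(b.base is None)   # True: the object is not a view of anything
print(b.flags.owndata)  # True: the object owns its buffer

b[0] = 99               # writing through one name...
print(a[0])             # 99: ...is visible through the other
```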
These are views. A view is a new array object, but it uses the same data buffer (as shown by the base attribute):
c = a.view()
d = a.reshape((2, 5))
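A short check of what "new array, same buffer" means in practice. This is only an illustrative sketch; np.shares_memory is used here as one convenient way to confirm the sharing:

```python
import numpy as np

a = np.arange(10)
c = a.view()
d = a.reshape((2, 5))

# c and d are distinct Python objects...
print(c is a, d is a)                                  # False False
# ...whose base is the array that owns the buffer
print(c.base is a, d.base is a)                        # True True
print(np.shares_memory(a, c), np.shares_memory(a, d))  # True True

# A write through one view shows up in the owner and in the other view
d[1, 0] = -1
print(a[5], c[5])                                      # -1 -1
```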
I like to use __array_interface__ to look at the basic attributes of an array:
In [210]: a = np.arange(10)
In [211]: a.__array_interface__
Out[211]:
{'data': (43515408, False),
'strides': None,
'descr': [('', '<i8')],
'typestr': '<i8',
'shape': (10,),
'version': 3}
The first element of 'data' is the memory address of the buffer where the values of a are stored (the second element is a read-only flag).
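As a rough cross-check, the same address is also exposed through a.ctypes.data. The actual number will differ from run to run, so the sketch below only compares the two values:

```python
import numpy as np

a = np.arange(10)

# 'data' is a 2-tuple: (address of the first element, read-only flag)
address, read_only = a.__array_interface__['data']

print(address == a.ctypes.data)  # True: both report the buffer address
print(read_only)                 # False: the array is writeable
```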
A view will have the same 'data' address (possibly with an offset). Otherwise the view has its own strides and shape. It is a new array object with a shared base:
In [212]: d = a.reshape((2,5))
In [213]: d.__array_interface__
Out[213]:
{'data': (43515408, False),
'strides': None,
'descr': [('', '<i8')],
'typestr': '<i8',
'shape': (2, 5),
'version': 3}
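The "possible offset" shows up with slicing. In the sketch below (the slices s and r are illustrative names, not from the original session), the slice's address is compared with the base address in units of a.itemsize, so it does not depend on the platform's default integer size:

```python
import numpy as np

a = np.arange(10)
s = a[2:]    # view that starts at the third element
r = a[::2]   # view that takes every other element

base_addr = a.__array_interface__['data'][0]
s_addr = s.__array_interface__['data'][0]

# The slice points into the same buffer, two elements further along
print(s_addr - base_addr == 2 * a.itemsize)  # True
print(s.base is a, r.base is a)              # True True

# The strided view keeps the start address but steps twice as far
print(r.strides == (2 * a.itemsize,))        # True
```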
Assigning the view to a does not change the original array or data buffer. The original a array object still exists in memory, along with its data buffer.
In [214]: a = a.view()
In [216]: a.__array_interface__['data']
Out[216]: (43515408, False)
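To make the rebinding visible, the sketch below keeps a second name, original, on the owning array before reassigning a. That extra name is only there for illustration:

```python
import numpy as np

a = np.arange(10)
d = a.reshape((2, 5))
original = a   # extra name for the owning array (illustration only)

a = a.view()   # rebinds the name `a`; the owning array is untouched

print(a is original)           # False: `a` now names a new view object
print(a.base is original)      # True: the view points back at the owner
print(original.flags.owndata)  # True: ownership has not moved
print(a.flags.owndata)         # False
print(d.base is original)      # True: the earlier view is unaffected
```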
If numpy 'updated' the a.base as you suggest, it would also have to update it for all views of the original a, such as d.
In [218]: id(a)
Out[218]: 139767778126704
In [219]: id(a.base)
Out[219]: 139768132465328
In [220]: id(d.base)
Out[220]: 139768132465328
While Python and numpy rely on reference counts to determine which objects are garbage, numpy does not maintain a record of what views have been made. That is, while d.base links d to the original a, there isn't a link the other way.
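One way to see this one-directional link is with a weak reference. The sketch below assumes CPython, where an object is freed as soon as its reference count drops to zero; owner_lifetime is a hypothetical helper written for this illustration:

```python
import weakref
import numpy as np

def owner_lifetime():
    owner = np.arange(10)
    alive = weakref.ref(owner)   # lets us observe the owner without keeping it alive
    view = owner.view()

    # The owner holds no reference to its views; its own base is None
    no_back_link = owner.base is None

    del owner                    # drop the only name for the owning array
    kept_by_view = alive() is not None   # view.base still references it

    del view                     # drop the last view
    freed = alive() is None      # assumption: CPython frees the owner immediately
    return no_back_link, kept_by_view, freed

print(owner_lifetime())
```

The view keeps the owning array alive through view.base, while the owning array does nothing to keep track of its views.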