I'm a bit confused by the behavior of the code below and am wondering if someone could shed some light on it. I have a matrix called mat, which is a numpy ndarray. I get its diagonal using mat.diagonal() and assign it to the variable diag. I then change all diagonal values of mat to 100. Now I find that diag has all its values changed to 100 too, which seems to indicate that diag directly references elements in mat. Yet when I check the memory address of the first element of diag and compare it to that of mat, they are different. What's the right way to look at this?
import numpy as np
import pandas as pd
mat_df = pd.DataFrame(data=[[1,2,3], [4,5,6], [7,8,9]])
print(mat_df)
mat = mat_df.values
diag = mat.diagonal()
print(diag)
diag_loc = np.diag_indices_from(mat)
mat[diag_loc] = 100
print(diag)
print(diag[0])
print(id(diag[0]))
print(mat[0][0])
print(id(mat[0][0]))
mat:
   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9
diag:
[1 5 9]
diag's values change due to mat's change:
[100 100 100]
the first value of diag:
100
and its address:
139863357577488
the first value of mat:
100
and its address:
139863376059664
You can't find the address with id. First of all, id doesn't return an address (even though the CPython implementation uses the memory address to build the id, that is just one implementation, and the id is not the address per se). Secondly, it would only be the address of the Python object (in your case, the one wrapping the numpy.int64).
That Python object is built just to wrap whatever the numpy functions return. Those functions are opaque to Python: Python doesn't know when they are supposed to return the same values.
Here is a simple experiment you can do to convince yourself that your id means nothing:
id(diag[0])
# 139729998312368
id(diag[0])
# 139730045496016
See, even two consecutive, exactly identical calls do not return the same id!
diag[0] is a call to numpy's diag.__getitem__(0). Its result is wrapped into a Python object that is different each time, as would be the result of a call to any function f(0), for which there is no reason to suppose that each identical call returns the exact same object.
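The point about a fresh wrapper on every access can be checked without relying on id at all. The following is a minimal sketch, assuming a small standalone array built with np.array rather than the DataFrame-backed one from the question; the names a and b are only for illustration.

```python
import numpy as np

# Standalone 3x3 array (an assumption: not the pandas-backed array from the question)
mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.int64)
diag = mat.diagonal()

# Index the same element twice and keep both results alive
a = diag[0]
b = diag[0]

print(type(a))   # a numpy scalar type, not a reference into the array
print(a == b)    # equal values
print(a is b)    # two distinct Python objects wrapping that value
```

Keeping both a and b alive matters here: it rules out the case where a temporary object is freed and its memory reused, which is why comparing ids of short-lived results tells you nothing either way.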
So, if you want to know where the actual int64 values are stored, you cannot ask Python (with its id function): not only is that not what id is for, but more importantly, Python doesn't know. Where the int64 values are stored is an internal matter of the numpy library. So you need to ask numpy.
The best way to do that is to use base, imho.
diag.base
#array([[100, 2, 3],
# [ 4, 100, 6],
# [ 7, 8, 100]])
diag.base is mat.base
# True
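As a self-contained variant of the base check, here is a sketch that assumes the matrix is created directly with np.array, so that it owns its buffer and the diagonal's base is the matrix itself. It also uses np.shares_memory, which is not in the original answer but asks numpy the same question more directly.

```python
import numpy as np

# mat owns its data here, so mat.base is None (an assumption of this sketch;
# the array taken from the DataFrame in the question may itself be a view)
mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.int64)
diag = mat.diagonal()

print(mat.base is None)             # mat is the owner of the buffer
print(diag.base is mat)             # diag is a view onto mat
print(np.shares_memory(diag, mat))  # both arrays touch the same memory

# Writing through mat is visible through the view
mat[np.diag_indices_from(mat)] = 100
print(diag)  # [100 100 100]
```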
But if you insist on having an address of some sort, you can also do:
diag.ctypes.data
# 61579664
mat.ctypes.data
# 61579664
Or, for more complete information on what data the array views and how:
mat.__array_interface__
# {'data': (61579664, False), 'strides': (8, 24), 'descr': [('', '<i8')], 'typestr': '<i8', 'shape': (3, 3), 'version': 3}
diag.__array_interface__
# {'data': (61579664, True), 'strides': (32,), 'descr': [('', '<i8')], 'typestr': '<i8', 'shape': (3,), 'version': 3}
This shows that the two arrays use the same 'data' but different 'strides' and 'shape' to access it.
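The relation between the two strides can be made explicit: element [i, i] of the matrix sits i * (row stride + column stride) bytes from the start of the buffer, so the diagonal view only needs a single stride equal to their sum. A minimal sketch, assuming a C-ordered array created with np.array (its strides come out as (24, 8) rather than the (8, 24) shown above, but the sum is the same):

```python
import numpy as np

# C-ordered 3x3 int64 array (an assumption; the array in the answer has the
# transposed layout because it comes from a DataFrame)
mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.int64)
diag = mat.diagonal()

print(mat.strides)   # (24, 8): bytes to step one row, one column
print(diag.strides)  # (32,): one row step plus one column step

# Same starting address, stride equal to the sum of the matrix strides
same_start = diag.ctypes.data == mat.ctypes.data
stride_is_sum = diag.strides == (sum(mat.strides),)
print(same_start, stride_is_sum)
```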