python numpy multidimensional-array deep-copy

Numpy deep copy still altering original array


As I understand it, a deep copy of an ndarray should create a second, independent copy of the ndarray, so that changing either array does not affect the contents of the other. However, in the following code, my original ndarray is changed:

print(data[3])   #[array([[0.00000000e+00, 3.29530000e+04],
   #[4.00066376e-04, 3.29530000e+04],
   #[8.00132751e-04, 3.29530000e+04],
   #...,
   #[1.28784461e+03, 3.47140000e+04],
   #[1.28784621e+03, 3.57750000e+04],
   #[1.28785381e+03, 1.92450000e+04]]),
   #'CH4.VDC1']

new = np.empty_like(data)
new[:] = data
new[3][0][:,1] = 4/16421 * (data[3][0][:,1] - 33563)

print(data[3])  #[array([[ 0.00000000e+00, -1.48590220e-01],
   #[ 4.00066376e-04, -1.48590220e-01],
   #[ 8.00132751e-04, -1.48590220e-01],
   #...,
   #[ 1.28784461e+03,  2.80372694e-01],
   #[ 1.28784621e+03,  5.38822240e-01],
   #[ 1.28785381e+03, -3.48772913e+00]]),
   #'CH4.VDC1']

The array is a mixed-type (5,2) array with a (largenumber,2) subarray inside. I am only trying to change the subarray, but I am wondering whether the deep copy extends to that subarray as well. I have run:

np.shares_memory(new, data)  # False

np.may_share_memory(new, data)  # False

It might also be important to note that I am running this in a Jupyter notebook, although I can't imagine why that would change anything. You can recreate data with:

np.array([[[[0.00000000e+00, 2.82540000e+04],
[4.00066376e-04, 2.82530000e+04],
[8.00132751e-04, 2.82520000e+04],
[1.28784461e+03, 4.61170000e+04],
[1.28784621e+03, 3.38280000e+04],
[1.28785381e+03, 3.38230000e+04]],
'CH1.Bx'],
[[[0.00000000e+00, 2.00400000e+04],
[4.00066376e-04, 2.00400000e+04],
[8.00132751e-04, 2.00410000e+04],
[1.28784461e+03, 1.81600000e+04],
[1.28784621e+03, 1.80830000e+04],
[1.28785381e+03, 4.80200000e+03]],
'CH2.By'],
[np.array([[0.00000000e+00, 3.82520000e+04],
[4.00066376e-04, 3.82510000e+04],
[8.00132751e-04, 3.82510000e+04],
[1.28784461e+03, 3.42810000e+04],
[1.28784621e+03, 3.42820000e+04],
[1.28785381e+03, 3.40380000e+04]]),
'CH3.Bz'],
[[[ 0.00000000e+00, -1.48590220e-01],
[ 4.00066376e-04, -1.48590220e-01],
[ 8.00132751e-04, -1.48590220e-01],
[ 1.28784461e+03,  2.80372694e-01],
[ 1.28784621e+03,  5.38822240e-01],
[ 1.28785381e+03, -3.48772913e+00]],
'CH4.VDC1'],
[[[0.00000000e+00, 3.26760000e+04],
[4.00066376e-04, 3.26760000e+04],
[8.00132751e-04, 3.26750000e+04],
[1.28784981e+03, 3.40450000e+04],
[1.28785061e+03, 3.40420000e+04],
[1.28785141e+03, 3.40390000e+04]],
'CH5.VDC2']], dtype=object)

Solution

  • That doesn't look like an array you're starting with there. It's not clear what data is, but data[3] is a 2-element list containing an array and a string, and judging by that, data is probably another list, or possibly an object-dtype array.

    Your attempt at a deep copy:

    new = np.empty_like(data)
    new[:] = data
    

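    is not a deep copy. For an array with an ordinary numeric dtype it would be a full copy (deep and shallow copies are equivalent there), but it is not a deep copy of a list or of an object-dtype array. It creates a new object-dtype array and fills it with references to the same objects referenced by the cells of data. That is also why np.shares_memory(new, data) returns False: the two outer arrays have separate buffers, even though the cells in both point to the same inner arrays.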

    You should probably pick a better way to organize your data. This data structure is not an effective way to work with NumPy, and it will cause more problems than just this. That said, if you want to deep copy it, copy.deepcopy is probably your best bet.
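
To make the difference concrete, here is a minimal sketch using a small stand-in for data (the 2x2 layout, the values, and the names shallow and deep are made up for illustration; the real array is assumed to behave the same way because it is also object-dtype). It shows that the empty_like-plus-assignment approach shares the inner arrays, while copy.deepcopy does not:

```python
import copy
import numpy as np

# Hypothetical stand-in for `data`: an object-dtype array whose cells hold
# references to separate objects (a float array and a label string).
data = np.empty((2, 2), dtype=object)
data[0, 0] = np.array([[0.0, 10.0], [1.0, 20.0]])
data[0, 1] = "CH1"
data[1, 0] = np.array([[0.0, 30.0], [1.0, 40.0]])
data[1, 1] = "CH2"

# The approach from the question: this copies the references held in the
# cells, not the inner arrays they point to.
shallow = np.empty_like(data)
shallow[:] = data

print(np.shares_memory(shallow, data))  # the outer buffers are separate
print(shallow[0, 0] is data[0, 0])      # but the inner array is the same object

shallow[0][0][:, 1] = -1.0              # this also changes data[0][0]
print(data[0, 0][:, 1])

# copy.deepcopy recurses into the cells and copies the inner arrays as well.
deep = copy.deepcopy(data)
print(deep[0, 0] is data[0, 0])         # a distinct inner array

deep[1][0][:, 1] = -2.0                 # data[1][0] is left untouched
print(data[1, 0][:, 1])
```

If the labels and the numeric arrays are stored separately instead (for example, a dict mapping each channel name to its float array, which is only one possible layout), each array can be copied with its own .copy() and the object-dtype container is no longer needed.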