I am running into an issue where saving a NumPy array after indexing it is much slower than saving the original array. A minimal reproducible example is below:
import time
import numpy as np

def mre(save_path):
    array = np.zeros((245, 233, 6))
    start = time.time()
    for i in range(1000):
        with open(save_path + '/array1_' + str(i), "wb") as file:
            np.save(file, array)
    end = time.time()
    print(f"No indexing: {end - start}s")

    array2 = array[:,:,[0,1,2,3,4,5]]
    start = time.time()
    for i in range(1000):
        with open(save_path + '/array2_' + str(i), "wb") as file:
            np.save(file, array2)
    end = time.time()
    print(f"With indexing: {end - start}s")
    print("Arrays are equal: ", np.array_equal(array, array2))
Which results in:
No indexing: 2.9975574016571045s
With indexing: 10.408239126205444s
Arrays are equal: True
So according to NumPy the arrays are equal, yet saving the indexed array is significantly slower. Does anyone have an idea why?
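Have you tried numpy.ascontiguousarray()? Indexing with a list (advanced indexing) always returns a copy, and that copy may not be laid out contiguously in memory even though its values are identical to the original. As far as I know, np.save writes a contiguous array in one pass and falls back to writing the data in chunks otherwise, so making a contiguous copy before saving can bring the time back in line with the unindexed case.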
Example
array2 = np.ascontiguousarray(array[:,:,[0,1,2,3,4,5]])
Output
No indexing: 6.80817985534668s
With indexing (contiguous copy): 6.550203800201416s
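To check whether memory layout is the difference on your machine, you can inspect the array flags and strides directly. This is only a rough sketch: describe_layout is a helper name I made up for illustration, and the flags you see for the indexed array may depend on your NumPy version.

```python
import numpy as np

def describe_layout(arr):
    """Hypothetical helper: return (C-contiguous?, F-contiguous?, strides)."""
    return bool(arr.flags['C_CONTIGUOUS']), bool(arr.flags['F_CONTIGUOUS']), arr.strides

array = np.zeros((245, 233, 6))
indexed = array[:, :, [0, 1, 2, 3, 4, 5]]   # advanced indexing: always a copy
fixed = np.ascontiguousarray(indexed)       # guaranteed C-contiguous

# Same values in all three, but possibly different memory layouts.
for name, arr in [("array", array), ("indexed", indexed), ("fixed", fixed)]:
    print(name, describe_layout(arr))
```

If the "indexed" line reports that the array is not C-contiguous while "fixed" is, that would be consistent with the timings above.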