python numpy io numpy-ndarray

Saving numpy array after indexing is much slower


I am running into an issue where saving a numpy array after indexing it takes much longer than saving the original array. A minimal reproducible example is below:

import time
import numpy as np

def mre(save_path):
    array = np.zeros((245, 233, 6))

    start = time.time()
    for i in range(1000):
        with open(save_path + '/array1_' + str(i), "wb") as file:
            np.save(file, array)
    end = time.time()
    print(f"No indexing: {end - start}s")

    array2 = array[:,:,[0,1,2,3,4,5]]
    start = time.time()
    for i in range(1000):
        with open(save_path + '/array2_' + str(i), "wb") as file:
            np.save(file, array2)
    end = time.time()
    print(f"With indexing: {end - start}s")
    print("Arrays are equal: ", np.array_equal(array, array2))

Which results in:

No indexing: 2.9975574016571045s
With indexing: 10.408239126205444s
Arrays are equal:  True

So according to numpy the arrays are equal, yet saving the indexed array is significantly slower. Does anyone have an idea why?


Solution

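  • Have you tried the numpy.ascontiguousarray() function? Indexing with a list (advanced indexing) returns a new array, and that array is not necessarily laid out contiguously in memory, even though np.array_equal reports it equal to the original (that check compares shape and values, not memory layout). Writing a non-contiguous array to a file can be considerably slower, so copying the data into contiguous memory before saving can restore the original speed.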

    Example

    array2 = np.ascontiguousarray(array[:,:,[0,1,2,3,4,5]]) 
    

    Output

    No indexing: 6.80817985534668s
    With indexing (contiguous copy): 6.550203800201416s
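
To check whether memory layout is what differs on your machine, you can inspect the array flags and strides before and after the copy. The following is a minimal sketch, not a definitive benchmark: the helper name time_save, its prefix and repeats parameters, and the temporary directory are assumptions made for illustration, and the timings will vary with disk, OS, and NumPy version.

```python
import os
import tempfile
import time

import numpy as np


def time_save(arr, directory, prefix, repeats=20):
    """Return the seconds spent saving `arr` to `repeats` separate files.

    Hypothetical helper for illustration; it mirrors the loop in the question.
    """
    start = time.perf_counter()
    for i in range(repeats):
        path = os.path.join(directory, f"{prefix}_{i}.npy")
        with open(path, "wb") as file:
            np.save(file, arr)
    return time.perf_counter() - start


array = np.zeros((245, 233, 6))
indexed = array[:, :, [0, 1, 2, 3, 4, 5]]   # advanced indexing: always a copy
contiguous = np.ascontiguousarray(indexed)  # C-ordered copy (no copy if already C-ordered)

# Same values, but possibly different memory layouts.
for name, arr in [("original", array), ("indexed", indexed), ("contiguous", contiguous)]:
    print(f"{name:>10}: C-contiguous={arr.flags['C_CONTIGUOUS']}, strides={arr.strides}")

# Time each variant in a throwaway directory that is deleted afterwards.
with tempfile.TemporaryDirectory() as tmp:
    timings = {
        "original": time_save(array, tmp, "original"),
        "indexed": time_save(indexed, tmp, "indexed"),
        "contiguous": time_save(contiguous, tmp, "contiguous"),
    }

for name, seconds in timings.items():
    print(f"{name:>10}: {seconds:.4f}s")
```

If the indexed array reports C-contiguous=False while the other two report True, the layout is the likely cause of the slowdown. When the indices form a plain range, as in the question, a basic slice such as array[:, :, 0:6] avoids advanced indexing altogether and returns a view rather than a copy.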