Tags: python, numpy, memory, large-data

python - repeating numpy array without replicating data


This question has been asked before, but the solution only works for 1D/2D arrays, and I need a more general answer.

How do you create a repeating array without replicating the data? This strikes me as something of general use, as it would help to vectorize Python operations without the memory hit.

More specifically, I have a (y,x) array, which I want to tile multiple times to create a (z,y,x) array. I can do this with numpy.tile(array, (nz,1,1)), but I run out of memory. My specific case has x=1500, y=2000, z=700.
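To put numbers on the memory problem, here is a rough back-of-the-envelope sketch. The dtype is not stated in the question, so float64 (8 bytes per element) is an assumption; the variable names are illustrative only.

```python
import numpy as np

# Dimensions from the question; float64 is an assumption (dtype is not stated).
nz, ny, nx = 700, 2000, 1500
itemsize = np.dtype(np.float64).itemsize  # 8 bytes per element

base_bytes = ny * nx * itemsize   # the single (y, x) array
tiled_bytes = nz * base_bytes     # what numpy.tile would have to allocate

print(f"base:  {base_bytes / 1e6:.0f} MB")   # 24 MB
print(f"tiled: {tiled_bytes / 1e9:.1f} GB")  # 16.8 GB
```

So the base array is small; it is only the explicit copy along z that exhausts memory.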


Solution

  • One simple trick is to use np.broadcast_arrays to broadcast your 2D array against a z-long vector along the first dimension:

    import numpy as np
    
    M = np.arange(1500*2000).reshape(1500, 2000)
    z = np.zeros(700)
    
    # broadcasting over the first dimension
    _, M_broadcast = np.broadcast_arrays(z[:, None, None], M[None, ...])
    
    # owndata is False because M_broadcast is a view onto M, not a copy
    print(M_broadcast.shape, M_broadcast.flags.owndata)
    # (700, 1500, 2000) False
    

    To generalize the stride_tricks method given for a 1D array in this answer, specify the shape and the stride (in bytes) for each dimension of the output array. A stride of 0 along the new first axis means every step along z reads the same underlying data:

    M_strided = np.lib.stride_tricks.as_strided(
                    M,                              # input array
                    (700, M.shape[0], M.shape[1]),  # output dimensions
                    (0, M.strides[0], M.strides[1]) # stride length in bytes
                )
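
As a possible alternative not mentioned in the answer above, recent NumPy versions (1.10 and later) also provide np.broadcast_to, which builds the same kind of zero-stride view without a dummy z vector or hand-written strides. A minimal sketch, using a tiny stand-in array so the checks are easy to read (the sizes here are illustrative, not the ones from the question):

```python
import numpy as np

M = np.arange(6).reshape(2, 3)   # small stand-in for the (1500, 2000) array
nz = 4                           # stand-in for 700

# Read-only view with a new leading axis of length nz; no data is copied.
M_view = np.broadcast_to(M, (nz,) + M.shape)

print(M_view.shape)                 # (4, 2, 3)
print(M_view.strides[0])            # 0: every layer points at the same memory
print(np.shares_memory(M_view, M))  # True
print(M_view.flags.writeable)       # False
```

Whichever method you use, the saving only lasts while the result stays a view: anything that forces a contiguous copy, such as M_view.copy() or np.ascontiguousarray(M_view), allocates the full (z, y, x) array.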