Tags: python, numpy, random, vectorization, scientific-computing

Create a Numpy matrix storing shuffled versions of an input ndarray


I have a 2d ndarray called weights of shape (npts, nweights). For every column of weights, I wish to randomly shuffle the rows. I want to repeat this process num_shuffles times, and store the collection of shufflings into a 3d ndarray called weights_matrix. Importantly, for each shuffling iteration, the shuffling indices of each column of weights should be the same.

Below is an explicit, naive double-for-loop implementation of this algorithm. Is it possible to avoid the Python loops and generate weights_matrix in pure NumPy?

import numpy as np
npts, nweights = 5, 2
weights = np.random.rand(npts*nweights).reshape((npts, nweights))

num_shuffles = 3
weights_matrix = np.zeros((num_shuffles, npts, nweights))
for i in range(num_shuffles):
    # one random permutation of the row indices per shuffle
    indx = np.random.choice(np.arange(npts), npts, replace=False)
    for j in range(nweights):
        # apply the same permutation to every column
        weights_matrix[i, :, j] = weights[indx, j]

Solution

  • You can start by filling your 3-D array with copies of the original weights, then perform a simple iteration over slices of that 3-D array, using numpy.random.shuffle to shuffle each 2-D slice in-place.

    "For every column of weights, I wish to randomly shuffle the rows ... the shuffling indices of each column of weights should be the same"

    is just another way of saying "I want to randomly reorder the rows of a 2-D array". numpy.random.shuffle is the NumPy counterpart of Python's random.shuffle: it reorders the elements of a container in place. For a multi-dimensional array it shuffles only along the first axis, so the "elements" of a 2-D array, in that sense, are its rows, and each row stays intact. That is all you need.

    import numpy
    weights = numpy.array( [ [ 1, 2, 3 ], [ 4, 5, 6], [ 7, 8, 9 ] ] )
    weights_3d = weights[ numpy.newaxis, :, : ].repeat( 10, axis=0 )
    
    for w in weights_3d:
        numpy.random.shuffle( w )  # in-place shuffle of the rows of each slice
    
    print( weights_3d[0, :, :] )
    print( weights_3d[1, :, :] )
    print( weights_3d[2, :, :] )
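
The loop above still iterates in Python once per shuffle. If you want to avoid Python loops entirely, as the question asks, one possible approach (a sketch, not part of the answer above) is to build all row permutations at once and apply them with fancy indexing. It assumes the newer numpy.random.default_rng generator is available; the seed 0 and the name rng are illustrative choices only. Argsorting a row of independent uniform random numbers yields a uniformly random permutation of the indices.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make the sketch reproducible

npts, nweights, num_shuffles = 5, 2, 3
weights = rng.random((npts, nweights))

# One independent permutation of the row indices per shuffle,
# obtained by argsorting random numbers: shape (num_shuffles, npts)
indx = np.argsort(rng.random((num_shuffles, npts)), axis=1)

# Integer-array indexing on the first axis reorders whole rows, so every
# column within a shuffle gets the same permutation.
# Result shape: (num_shuffles, npts, nweights)
weights_matrix = weights[indx]

print(weights_matrix.shape)
```

Here weights_matrix[i] equals weights[indx[i]], which matches what the double loop in the question computes for shuffle i. The cost is an extra (num_shuffles, npts) array of random numbers and a sort per shuffle, which is usually negligible next to the loop overhead it removes.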