Tags: python, arrays, numpy, shuffle

Best way to permute contents of each column in numpy


What's the best way to efficiently permute the contents of each column in a numpy array?

What I have is something like:

>>> arr = np.arange(16).reshape((4, 4))
>>> arr
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

>>> # Shuffle each column independently to obtain something like
array([[ 8,  5, 10,  7],
       [12,  1,  6,  3],
       [ 4,  9, 14, 11],
       [ 0, 13,  2, 15]])

Solution

  • If your array is multi-dimensional, np.random.permutation permutes along the first axis (rows) by default, so whole rows are reordered:

    >>> np.random.permutation(arr)
    array([[ 4,  5,  6,  7],
           [ 8,  9, 10, 11],
           [ 0,  1,  2,  3],
           [12, 13, 14, 15]])
    

    However, this only reorders the rows, so every column ends up with the same (random) ordering.

    The simplest way of shuffling each column independently could be to loop over the columns and use np.random.shuffle to shuffle each one in place:

    for i in range(arr.shape[1]):
        np.random.shuffle(arr[:,i])
    

    Which gives, for instance:

    array([[12,  1, 14, 11],
           [ 4,  9, 10,  7],
           [ 8,  5,  6, 15],
           [ 0, 13,  2,  3]])
    

    This method can be useful if you have a very large array that you don't want to copy, because each column is permuted in place. On the other hand, even simple Python loops can be slow when there are many columns, and there are quicker vectorized NumPy methods such as the one provided by @jme.
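
For comparison with the loop, here is a minimal sketch of one loop-free approach. It is not necessarily the method referred to above. The helper name shuffle_columns and the seed are made up for illustration. The idea is to draw one random key per element, argsort the keys along axis 0 to get an independent random row order for each column, and gather with np.take_along_axis. Unlike the in-place loop, this returns a shuffled copy.

```python
import numpy as np

def shuffle_columns(arr, rng=None):
    """Return a copy of 2-D `arr` with each column shuffled independently.

    Hypothetical helper for illustration; not part of NumPy.
    """
    rng = np.random.default_rng() if rng is None else rng
    # One random key per element; argsort along axis 0 turns each column
    # of keys into an independent random permutation of the row indices.
    order = np.argsort(rng.random(arr.shape), axis=0)
    # Gather the elements of each column in its own random order.
    return np.take_along_axis(arr, order, axis=0)

arr = np.arange(16).reshape((4, 4))
shuffled = shuffle_columns(arr, np.random.default_rng(0))  # seed is arbitrary
```

Each column of shuffled holds the same values as the matching column of arr, in a different random order, and arr itself is left untouched. If you are on NumPy 1.20 or later, np.random.default_rng().permuted(arr, axis=0) should give the same kind of result without the explicit argsort.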