
How to shuffle two NumPy arrays so that the correspondence between their elements is preserved?


I have two NumPy arrays, Labels and Images.

The shape of Labels is (3000, 1), and the shape of Images is (3000, 226, 226, 3). The ith label corresponds to the ith image. I want to shuffle my dataset so that this correspondence is preserved. Is there a Python library or function for this task?

One idea I had was to concatenate the labels and images into one array and then shuffle the result along axis 0. However, np.concatenate() did not allow it because the two arrays have different shapes.


Solution

  • You can do it in at least two ways.

    You can shuffle an array of indices and use advanced indexing. This is suboptimal for memory, because advanced indexing returns a copy of each array:

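    import numpy as np

    indices = np.arange(len(images))
    np.random.shuffle(indices)  # one shared permutation of the row indices
    images = images[indices]    # advanced indexing returns a shuffled copy
    labels = labels[indices]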
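
As a rough, self-contained sketch of the same index-based idea, the snippet below wraps it in a helper and checks the pairing on toy data. The function name `shuffle_together`, the `seed` parameter, and the small array sizes are assumptions made for illustration, not part of the original answer. It uses `np.random.default_rng`, NumPy's newer random API, instead of the global `np.random` state.

```python
import numpy as np

def shuffle_together(images, labels, seed=None):
    """Return shuffled copies of both arrays using one shared permutation.

    Hypothetical helper for illustration; the name and signature are assumptions.
    """
    if len(images) != len(labels):
        raise ValueError("arrays must have the same length along axis 0")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(images))  # random ordering of 0..n-1
    return images[perm], labels[perm]    # advanced indexing copies the data

# Toy data: image i is filled with the value i, and label i equals i,
# so the pairing can be checked after shuffling.
n = 8
labels = np.arange(n).reshape(n, 1)
images = np.zeros((n, 4, 4, 3), dtype=np.uint8)
images[:] = np.arange(n, dtype=np.uint8).reshape(n, 1, 1, 1)

shuffled_images, shuffled_labels = shuffle_together(images, labels, seed=0)
```

Because both arrays are indexed with the same `perm`, row i of `shuffled_images` still belongs to row i of `shuffled_labels`, and the original arrays are left untouched.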
    

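    But I think the best option is to reset the seed before each shuffle, which allows shuffling in place. Both arrays have the same length along axis 0, so the same seed produces the same permutation: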

    # make sure images and labels are shuffled in place and in unison
    s = 42  # any fixed integer works
    np.random.seed(s)
    np.random.shuffle(images)
    np.random.seed(s)
    np.random.shuffle(labels)
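
One way to convince yourself that the seed-reset approach keeps the pairs aligned is to run it on toy data where the pairing is easy to check. The sketch below is only illustrative: the helper name `shuffle_in_place_in_unison`, the length check, and the toy array sizes are assumptions, not part of the original answer. Note that it changes NumPy's global random state as a side effect.

```python
import numpy as np

def shuffle_in_place_in_unison(images, labels, seed=42):
    """Shuffle both arrays in place along axis 0 with the same permutation.

    Hypothetical helper for illustration; the name and signature are assumptions.
    """
    if len(images) != len(labels):
        raise ValueError("arrays must have the same length along axis 0")
    np.random.seed(seed)
    np.random.shuffle(images)
    np.random.seed(seed)  # reset so the second shuffle repeats the same swaps
    np.random.shuffle(labels)

# Toy data: image i is filled with the value i, and label i equals i.
n = 10
labels = np.arange(n).reshape(n, 1)
images = np.broadcast_to(np.arange(n).reshape(n, 1, 1, 1), (n, 4, 4, 3)).copy()

shuffle_in_place_in_unison(images, labels)

# Every image should still sit next to the label it started with.
pairing_preserved = bool(np.all(images[:, 0, 0, 0] == labels[:, 0]))
print(pairing_preserved)
```

The length check matters because the trick relies on both shuffles drawing the same sequence of swaps, which only holds when the arrays have the same number of rows.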