Tags: tensorflow, tensorflow2.0, tensorflow2.x

Shuffling two tensors in the same order


As the title says. I tried the following to no avail:

tf.random.shuffle( (a,b) )
tf.random.shuffle( zip(a,b) )

I used to concatenate them, shuffle, and then split them apart again. But now a is a rank-4 tensor while b is rank-1, so there is no way to concatenate them.

I also tried passing the same seed argument to two separate shuffle calls so that both would reproduce the same ordering, but that failed. I also tried doing the shuffling myself with a randomly shuffled range of indices, but TF is not as flexible as numpy when it comes to fancy indexing, so that failed too.

What I'm doing now is converting everything back to numpy, using shuffle from sklearn, and then converting back to tensors. This is a clumsy workaround, since the shuffling is supposed to happen inside a graph.


Solution

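  • You could just shuffle the indices and then use tf.gather() to extract the values corresponding to those shuffled indices. tf.gather() indexes along the first axis by default, so the same indices work for tensors of different rank: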

    TF2.x (UPDATE)

    import tensorflow as tf
    import numpy as np
    
    x = tf.convert_to_tensor(np.arange(5))
    y = tf.convert_to_tensor(['a', 'b', 'c', 'd', 'e'])
    
    indices = tf.range(start=0, limit=tf.shape(x)[0], dtype=tf.int32)
    shuffled_indices = tf.random.shuffle(indices)
    
    shuffled_x = tf.gather(x, shuffled_indices)
    shuffled_y = tf.gather(y, shuffled_indices)
    
    print('before')
    print('x', x.numpy())
    print('y', y.numpy())
    
    print('after')
    print('x', shuffled_x.numpy())
    print('y', shuffled_y.numpy())
    # before
    # x [0 1 2 3 4]
    # y [b'a' b'b' b'c' b'd' b'e']
    # after
    # x [4 0 1 2 3]
    # y [b'e' b'a' b'b' b'c' b'd']
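
Since the question involves a rank-4 tensor paired with a rank-1 tensor inside a graph, the same idea can be sketched as a reusable function. This is only a minimal sketch, not part of the original answer: the helper name `shuffle_together` and the toy `images` / `labels` data are made up for illustration, and it assumes TF 2.x with both tensors sharing the same first dimension.

```python
import tensorflow as tf


@tf.function  # traced into a graph, so the shuffle runs inside it
def shuffle_together(a, b):
    """Shuffle `a` and `b` along axis 0 with one shared permutation.

    Hypothetical helper; assumes both tensors have the same first dimension.
    """
    # Build a single random permutation of the row indices 0..n-1.
    n = tf.shape(a)[0]
    perm = tf.random.shuffle(tf.range(n))
    # Apply that same permutation to both tensors along axis 0.
    return tf.gather(a, perm), tf.gather(b, perm)


# Hypothetical data: 6 "images" of shape (2, 2, 1) and 6 integer labels.
# Image i is filled with the value i, so the pairing can be checked afterwards.
labels = tf.range(6)
images = tf.ones((6, 2, 2, 1)) * tf.reshape(tf.cast(labels, tf.float32), (6, 1, 1, 1))

shuffled_images, shuffled_labels = shuffle_together(images, labels)

print(shuffled_labels.numpy())
print(shuffled_images[:, 0, 0, 0].numpy())
```

The order differs from run to run, but the two printed rows should list the same values in the same positions, which shows that each image stayed with its label.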
    

    TF1.x

    import tensorflow as tf
    import numpy as np
    
    x = tf.placeholder(tf.float32, (None, 1, 1, 1))
    y = tf.placeholder(tf.int32, (None,))  # 1D shape; (None) without the comma is just None
    
    indices = tf.range(start=0, limit=tf.shape(x)[0], dtype=tf.int32)
    shuffled_indices = tf.random.shuffle(indices)
    
    shuffled_x = tf.gather(x, shuffled_indices)
    shuffled_y = tf.gather(y, shuffled_indices)
    

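    Make sure that you compute shuffled_x and shuffled_y in the same session run. Each sess.run call re-evaluates tf.random.shuffle, so fetching them in separate calls would most likely give the two tensors different index orderings.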

    # Testing
    x_data = np.concatenate([np.zeros((1, 1, 1, 1)),
                             np.ones((1, 1, 1, 1)),
                             2*np.ones((1, 1, 1, 1))]).astype('float32')
    y_data = np.arange(4, 7, 1)
    
    print('Before shuffling:')
    print('x:')
    print(x_data.squeeze())
    print('y:')
    print(y_data)
    
    with tf.Session() as sess:
      x_res, y_res = sess.run([shuffled_x, shuffled_y],
                              feed_dict={x: x_data, y: y_data})
      print('After shuffling:')
      print('x:')
      print(x_res.squeeze())
      print('y:')
      print(y_res)
    
    # Example output (the shuffled order varies between runs):
    # Before shuffling:
    # x:
    # [0. 1. 2.]
    # y:
    # [4 5 6]
    # After shuffling:
    # x:
    # [1. 2. 0.]
    # y:
    # [5 6 4]
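
If the tensors are headed into an input pipeline anyway, a different technique from the index-and-gather approach above is to let tf.data keep the pairs together. The following is a hedged sketch of that alternative, assuming TF 2.x eager execution; the `features` and `labels` values are placeholders invented for the example.

```python
import tensorflow as tf

# Hypothetical data: a rank-4 feature tensor and a rank-1 label tensor.
features = tf.reshape(tf.range(5, dtype=tf.float32), (5, 1, 1, 1))
labels = tf.constant([10, 11, 12, 13, 14])

# from_tensor_slices pairs features[i] with labels[i];
# shuffle then moves each (feature, label) pair as a unit.
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
# A buffer at least as large as the dataset gives a full shuffle.
dataset = dataset.shuffle(buffer_size=5)

# Collect the shuffled pairs as plain Python values for inspection.
pairs = [(float(f[0, 0, 0]), int(l)) for f, l in dataset]
print(pairs)
```

Each printed pair should still match its original partner (0.0 with 10, 1.0 with 11, and so on), only in a different order. By default the dataset is reshuffled on every iteration, so iterating over it a second time is expected to give a new order.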