Suppose you have a TensorFlow dataset that has values and labels. In my case I created it from a time series as follows:
import pandas as pd
import numpy as np
import tensorflow as tf

df = pd.read_csv('MY.csv', index_col=0, parse_dates=True)
# extract the column we are interested in
single_col_df = df[['Close']]
# convert to a tf.data.Dataset of sliding windows
WINDOW_SIZE = 10
dataset = tf.data.Dataset.from_tensor_slices(single_col_df.values)
d = dataset.window(WINDOW_SIZE + 1, shift=1, drop_remainder=True)
d2 = d.flat_map(lambda window: window.batch(WINDOW_SIZE + 1))
# create data and ground truth
d3 = d2.map(lambda window: (window[:-1], window[-1:]))
#get the total data and shuffle
len_ds = 0
for item in d2:
    len_ds += 1
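# note: d2.cardinality() typically reports an unknown cardinality after
# flat_map, so the count has to be materialized by iterating; assuming
# eager execution, an equivalent one-liner would be:
#   len_ds = int(d2.reduce(0, lambda count, _: count + 1).numpy())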
d_shuffled = d3.shuffle(buffer_size=len_ds)
# split train/test
train_size = int(0.7 * len_ds)
val_size = int(0.15 * len_ds)
test_size = int(0.15 * len_ds)
train_dataset = d_shuffled.take(train_size)
test_dataset = d_shuffled.skip(train_size)
val_dataset = test_dataset.skip(test_size)
test_dataset = test_dataset.take(test_size)
train_dataset = train_dataset.batch(32).prefetch(2)
val_dataset = val_dataset.batch(32)
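To make the windowing concrete, here is a minimal sketch of what the pipeline above produces, run on a toy integer series instead of the Close prices:

import tensorflow as tf

# toy series 0..9, windows of 3 inputs + 1 label (i.e. WINDOW_SIZE = 3)
toy = tf.data.Dataset.from_tensor_slices(tf.range(10))
toy = toy.window(4, shift=1, drop_remainder=True)
toy = toy.flat_map(lambda w: w.batch(4))
toy = toy.map(lambda w: (w[:-1], w[-1:]))
for x, y in toy.take(2):
    print(x.numpy(), y.numpy())
# [0 1 2] [3]
# [1 2 3] [4]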
Now, for evaluation purposes, I want to get the ground-truth values of the test set, so I am running:
y = np.concatenate([y for x, y in test_dataset], axis=0)
but each time this returns the array in a different order, so it cannot be compared with the values predicted by the model. For example, when running the above line in a Jupyter notebook and printing the first 5 values of y with `y[:5]`, one time I get:
array([[26.04000092],
[16.39999962],
[18.98999977],
[42.31000137],
[19.82999992]])
and another time I get:
array([[15.86999989],
[43.27999878],
[19.32999992],
[48.38000107],
[17.12000084]])
but the length of y remains the same, so I am assuming that the elements are just shuffled around. Either way, I cannot compare these values with the predicted ones, since their order is different:
y_hat = model.predict(test_dataset)
Furthermore, I also get different evaluation results. For example:
x = []
y = []
for _x, _y in test_dataset:
    x.append(_x)
    y.append(_y)
x = np.array(x)
y = np.array(y)
model.evaluate(x=x, y=y)
each time the loop defining the arrays x and y is re-executed, I get different x and y arrays, which lead to a different evaluation result.
By calling shuffle on the whole dataset before splitting it, you actually reshuffle the dataset after each exhaustion. Here is what is happening:
The first call to
y = np.concatenate([y for x, y in test_dataset], axis=0)
exhausts the test dataset. The second call sees that test_dataset is exhausted, and triggers a reshuffle of the whole dataset. You end up with samples that belonged to the train dataset during the first exhaustion potentially landing in the test dataset during the second round.
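You can observe this directly with a small snippet (a sketch; the concrete values will vary from run to run):

import tensorflow as tf

ds = tf.data.Dataset.range(10).shuffle(10)  # reshuffle_each_iteration=True by default
ds_test = ds.skip(7)
first_pass = {int(x) for x in ds_test}
second_pass = {int(x) for x in ds_test}  # second iteration triggers a reshuffle
print(first_pass, second_pass)  # typically two different sets of elements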
If we look at the documentation of tf.data.Dataset.shuffle:
reshuffle_each_iteration (Optional.) A boolean, which if true indicates that the dataset should be pseudorandomly reshuffled each time it is iterated over. (Defaults to True.)
Set it to False to get a deterministic shuffle. If you still want to shuffle your training set each epoch, you need to call shuffle on the train set itself:
import tensorflow as tf

tf.random.set_seed(0)  # reproducibility
a = tf.range(10)
ds = tf.data.Dataset.from_tensor_slices(a)
# shuffle once, with a deterministic order on every iteration
ds_shuffled = ds.shuffle(10, reshuffle_each_iteration=False)
ds_train = ds_shuffled.take(7)
# reshuffle only the train set each time it is iterated over
ds_train = ds_train.shuffle(7)
ds_test = ds_shuffled.skip(7)
Running it:
>>> [x.numpy() for x in ds_test]
[5, 8, 4]
>>> [x.numpy() for x in ds_test]
[5, 8, 4]
>>> [x.numpy() for x in ds_train]
[1, 3, 7, 2, 6, 9, 0]
>>> [x.numpy() for x in ds_train]
[3, 9, 6, 7, 2, 1, 0]
Try running it with reshuffle_each_iteration=True to reproduce what happened in your own code.
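Applied to your pipeline, the fix would look something like this (a sketch reusing your d3 and len_ds from above; I also batch the test set so it can be passed directly to predict and evaluate):

d_shuffled = d3.shuffle(buffer_size=len_ds, reshuffle_each_iteration=False)

train_size = int(0.7 * len_ds)
test_size = int(0.15 * len_ds)

train_dataset = d_shuffled.take(train_size)
rest = d_shuffled.skip(train_size)
val_dataset = rest.skip(test_size)
test_dataset = rest.take(test_size)

# reshuffle only the training data each epoch
train_dataset = train_dataset.shuffle(train_size).batch(32).prefetch(2)
val_dataset = val_dataset.batch(32)
test_dataset = test_dataset.batch(32)

With this, y = np.concatenate([y for x, y in test_dataset], axis=0) returns the same array on every run and can be compared against model.predict(test_dataset).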