python tensorflow keras tensorflow2.0 tensorflow-datasets

Merge two tensorflow datasets into one dataset with inputs and labels


I have two TensorFlow datasets generated with timeseries_dataset_from_array. One holds the inputs of my network and the other holds the outputs. Let's call them the inputs dataset and the targets dataset; both have the same shape (a time-series window of a fixed size).

The code I'm using to generate these datasets goes like this:

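from tensorflow.keras.preprocessing import timeseries_dataset_from_array

train_x = timeseries_dataset_from_array(
    df_train['x'],
    None,  # targets=None, so this dataset yields input windows only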
    sequence_length,
    sequence_stride=sequence_stride,
    batch_size=batch_size
)
train_y = timeseries_dataset_from_array(
    df_train['y'],
    None,
    sequence_length,
    sequence_stride=sequence_stride,
    batch_size=batch_size
)

The problem is that when a tf.data.Dataset is passed as the x argument of model.fit, tf.keras expects it to yield both the inputs and the targets. That is why I need to combine these two datasets into one, with one providing the inputs and the other the targets.


Solution

  • The simplest way is to use tf.data.Dataset.zip:

    import tensorflow as tf
    import numpy as np
    
    X = np.arange(100)
    Y = X*2
    
    sample_length = 20
    input_dataset = tf.keras.preprocessing.timeseries_dataset_from_array(
      X, None, sequence_length=sample_length, sequence_stride=sample_length)
    target_dataset = tf.keras.preprocessing.timeseries_dataset_from_array(
      Y, None, sequence_length=sample_length, sequence_stride=sample_length)
    
    dataset = tf.data.Dataset.zip((input_dataset, target_dataset))
    
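    # Each element is an (inputs, targets) pair of batches
    for x, y in dataset:
      print(x.shape, y.shape)
    
    # Output: 5 non-overlapping windows of 20 steps, all in one batch
    # (the default batch_size is 128)
    # (5, 20) (5, 20)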
    

    You can then pass dataset directly to model.fit as the x argument, without a separate y.
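
As a rough sketch of that last step, the zipped dataset can be handed to model.fit on its own. The toy one-layer model, the float32 cast, and the single epoch below are assumptions made for illustration and are not part of the original question. Both datasets here are built with the default shuffle=False; shuffling them independently before zipping would likely break the pairing between inputs and targets.

```python
import numpy as np
import tensorflow as tf

# Same toy data as above, cast to float32 so it can be fed to a Dense layer.
X = np.arange(100, dtype="float32")
Y = X * 2

sample_length = 20
input_dataset = tf.keras.preprocessing.timeseries_dataset_from_array(
    X, None, sequence_length=sample_length, sequence_stride=sample_length)
target_dataset = tf.keras.preprocessing.timeseries_dataset_from_array(
    Y, None, sequence_length=sample_length, sequence_stride=sample_length)

# Each element of the zipped dataset is an (inputs, targets) tuple of batches.
dataset = tf.data.Dataset.zip((input_dataset, target_dataset))

# Hypothetical toy model (an assumption): maps a 20-step window to a 20-step window.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(sample_length,)),
    tf.keras.layers.Dense(sample_length),
])
model.compile(optimizer="adam", loss="mse")

# The dataset goes in as x; no y argument is given because the
# targets already come from the second component of each tuple.
history = model.fit(dataset, epochs=1, verbose=0)
```

If the pairing matters (it usually does), it is worth checking one batch by hand, for example by confirming that each target window equals twice the matching input window for this toy data.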