Tags: python, numpy, tensorflow, tensor, tensorflow-datasets

Create a TensorFlow dataset for a multi-input model


Problem

Create a tf.data.Dataset object from NumPy arrays when the model takes multiple X arrays as input.

Explanation

This is the model I'm using, with some layers removed to keep the image small: (image: Model)

As you can see, the model takes two different inputs:

  • The data itself (shape [Batch, 730, 1]) (from now on called x_train)
  • The timestamp (shape [Batch, 730, 3]) (from now on called ts_train)

The problem I'm trying to solve is a time series forecast.
x_train contains a single feature.
ts_train contains three features that represent the year, month, and day of the measurement.

I can fit/evaluate/predict the model without any particular problem.
Example of fit:

model.fit(
    [x_train, ts_train],
    y_train,
    batch_size=1024,
    epochs=2000,
    validation_data=([x_test, ts_test], y_test),
    callbacks=callbacks,
)

Example of predict:

model.predict([x_test[0].reshape(1, window, 1), ts_test[0].reshape(1, window, 3)])

However, I can't work out how to convert the NumPy arrays that represent my dataset into a TensorFlow dataset.

Using the following code:

tf.data.Dataset.from_tensor_slices([x_train, ts_train], y_train)

I receive the following error:

ValueError: Can't convert non-rectangular Python sequence to Tensor.

How can I convert my two X arrays and one y array into a tf.data.Dataset?


Solution

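  • Try passing the arrays as nested tuples instead of a list. tf.data converts a Python list into a single tensor, which fails here because x_train and ts_train have different shapes, whereas a tuple is kept as a structure of separate components: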

    import numpy as np
    import tensorflow as tf
    
    x_train = np.random.random((50, 730, 1))
    ts_train = np.random.random((50, 730, 3))
    y_train = np.random.random((50, 5))
    
    ds = tf.data.Dataset.from_tensor_slices(((x_train, ts_train), y_train))
    
    for (x, t), y in ds.take(1):
        print(x.shape, t.shape, y.shape)

    # Output: (730, 1) (730, 3) (5,)
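
To see why the container type matters, here is a minimal sketch that tries both forms side by side. The sample count of 8 is an arbitrary assumption to keep it quick, and the exact exception type and message for the list case may vary between TensorFlow versions.

```python
import numpy as np
import tensorflow as tf

# Small stand-in arrays with the question's per-sample shapes (8 samples is an assumption).
x_train = np.random.random((8, 730, 1))
ts_train = np.random.random((8, 730, 3))
y_train = np.random.random((8, 5))

# A list is treated as one value to be converted into a single tensor.
# The two arrays differ in their last dimension, so the conversion fails.
try:
    tf.data.Dataset.from_tensor_slices([x_train, ts_train])
    list_failed = False
except (ValueError, TypeError) as err:
    list_failed = True
    print("list input failed:", type(err).__name__)

# A tuple is treated as a structure of separate components,
# so each array keeps its own shape.
ds = tf.data.Dataset.from_tensor_slices(((x_train, ts_train), y_train))
print(ds.element_spec)
```

Note also that from_tensor_slices takes a single positional structure, so the targets have to be part of that structure rather than a second argument.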
    

    And here is an example model:

    input1 = tf.keras.layers.Input((730, 1))
    input2 = tf.keras.layers.Input((730, 3))
    x = tf.keras.layers.Flatten()(input1)
    y = tf.keras.layers.Flatten()(input2)
    outputs = tf.keras.layers.Concatenate()([x, y])
    outputs = tf.keras.layers.Dense(5)(outputs)
    model = tf.keras.Model([input1, input2], outputs)
    model.compile(optimizer='adam', loss='mse')
    model.fit(ds.batch(10), epochs=5)
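
The fit call from the question also passes validation data, so a fuller sketch might build a second dataset the same way and hand both to Keras. The array sizes, batch size, and epoch count below are placeholders rather than the question's values, and the model is the same toy model as above, not the asker's real one.

```python
import numpy as np
import tensorflow as tf

window = 730  # window length from the question

# Random stand-ins for the real arrays; the sample counts are assumptions.
x_train = np.random.random((32, window, 1)).astype("float32")
ts_train = np.random.random((32, window, 3)).astype("float32")
y_train = np.random.random((32, 5)).astype("float32")
x_test = np.random.random((8, window, 1)).astype("float32")
ts_test = np.random.random((8, window, 3)).astype("float32")
y_test = np.random.random((8, 5)).astype("float32")

# Training data: shuffle, batch, and prefetch inside the dataset pipeline.
train_ds = (
    tf.data.Dataset.from_tensor_slices(((x_train, ts_train), y_train))
    .shuffle(buffer_size=len(x_train))
    .batch(16)
    .prefetch(tf.data.AUTOTUNE)
)
# Validation data: same nested-tuple structure, no shuffling needed.
val_ds = tf.data.Dataset.from_tensor_slices(((x_test, ts_test), y_test)).batch(16)

# Toy two-input model, equivalent to the example model above.
input1 = tf.keras.layers.Input((window, 1))
input2 = tf.keras.layers.Input((window, 3))
merged = tf.keras.layers.Concatenate()(
    [tf.keras.layers.Flatten()(input1), tf.keras.layers.Flatten()(input2)]
)
outputs = tf.keras.layers.Dense(5)(merged)
model = tf.keras.Model([input1, input2], outputs)
model.compile(optimizer="adam", loss="mse")

# batch_size is omitted because the datasets are already batched.
history = model.fit(train_ds, validation_data=val_ds, epochs=2, verbose=0)

# predict accepts the same dataset; the targets in each element are ignored.
preds = model.predict(val_ds, verbose=0)
print(preds.shape)

# A single window can still be passed as plain arrays; slicing keeps the batch axis.
single = model.predict([x_test[:1], ts_test[:1]], verbose=0)
print(single.shape)
```

Batching inside the dataset replaces the batch_size argument of fit, which Keras does not accept together with a tf.data.Dataset input.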