I have a Keras model with two inputs of different shapes. One side takes in a few categorical features, while the other takes multiple time series of length PAST_HISTORY. The output is also multiple time series:
# Categorical data input
input_ct = keras.Input(shape=(len(categ_cols),),
                       name='categorical_input')
# Timeseries input
input_ts = keras.Input(shape=(PAST_HISTORY, len(series_cols)),
                       name='timeseries_input')
...
model = keras.models.Model(inputs=[input_ct, input_ts], outputs=outputs)
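For reference, the elided middle of the model could look something like the sketch below; every layer choice here is a purely illustrative assumption, not the actual architecture:
from tensorflow import keras

# Hypothetical sketch of the elided model body: summarize the time-series
# branch, merge it with the categorical branch, and emit one value per
# future step of every target series. All layer sizes are assumptions.
x_ts = keras.layers.LSTM(32)(input_ts)                      # (batch, 32)
x = keras.layers.concatenate([input_ct, x_ts])              # merge branches
x = keras.layers.Dense(64, activation='relu')(x)
outputs = keras.layers.Dense(len(target_cols) * FUTURE_TARGET)(x)
outputs = keras.layers.Reshape((len(target_cols), FUTURE_TARGET))(outputs)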
I created a Dataset for each input and for the output using a pandas DataFrame and some tf.data.Dataset operations:
df_ts = df[series_cols][:-FUTURE_TARGET]
ts_batch = lambda window: window.batch(PAST_HISTORY)
time_series_data = tf.data.Dataset.from_tensor_slices(df_ts)\
    .window(PAST_HISTORY, 1, 1, True)\
    .flat_map(ts_batch)

df_cat = df[categ_cols][PAST_HISTORY - 1:-FUTURE_TARGET]
date_data = tf.data.Dataset.from_tensor_slices(df_cat)

df_target = df[target_cols][PAST_HISTORY:]
target_batch = lambda window: window.batch(FUTURE_TARGET)
target_data = tf.data.Dataset.from_tensor_slices(df_target)\
    .window(FUTURE_TARGET, 1, 1, True)\
    .flat_map(target_batch)
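As a sanity check on the windowing logic: window(size, shift, stride, drop_remainder) followed by flat_map(lambda w: w.batch(size)) turns a dataset of rows into overlapping sliding windows. A minimal toy sketch (eager mode, illustrative sizes):
import tensorflow as tf

# Toy demonstration of the window + flat_map(batch) sliding-window trick.
rows = tf.data.Dataset.range(6)                               # 0 .. 5
windows = rows.window(3, 1, 1, True).flat_map(lambda w: w.batch(3))
print([w.numpy().tolist() for w in windows])
# [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]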
To create the final Dataset I used a generator:
def generator():
    for d1, d2, t in zip(date_data, time_series_data, target_data):
        yield {"categorical_input": d1, "timeseries_input": d2}, tf.transpose(t)

dataset = tf.data.Dataset.from_generator(
    generator,
    output_types=(
        {'categorical_input': tf.int64, 'timeseries_input': tf.float64},
        tf.float64),
    output_shapes=(
        {'categorical_input': (len(categ_cols),),
         'timeseries_input': (PAST_HISTORY, len(series_cols))},
        (len(target_cols), FUTURE_TARGET)))
This worked, and I managed to train the model under eager execution by calling model.fit. However, now that I'm trying to create an Estimator from this model, building the Dataset no longer works: the Python zip call in the generator implicitly invokes __iter__ on each Dataset, which is disallowed outside eager execution.
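The restriction is easy to reproduce in isolation (a sketch, assuming TF 2.x; Estimator input functions are traced in graph mode, where eager iteration is unavailable):
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # mimic graph mode, as in an Estimator input_fn

ds = tf.data.Dataset.range(3)
iter(ds)  # RuntimeError: __iter__() is only supported inside of tf.function
          # or when eager execution is enabled.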
I tried to create the same dataset without the generator, using the following code:
dataset = tf.data.Dataset.from_tensors(
    ({'categorical_input': date_data, 'timeseries_input': time_series_data},
     target_data))
This gets me the following error when I try to call estimator.train:
TypeError: Failed to convert object of type <class 'tensorflow.python.data.ops.dataset_ops._NestedVariant'> to Tensor.
Contents: <tensorflow.python.data.ops.dataset_ops._NestedVariant object at 0x7f5bf84a97f0>.
Consider casting elements to a supported type.
How can I solve this error? Or is there another way to construct this Dataset without having to iterate over a Dataset?
Also, I tried to cast the Datasets and got the following error on the windowed Datasets:
TypeError: Failed to convert object of type <class 'tensorflow.python.data.ops.dataset_ops.FlatMapDataset'> to Tensor.
Contents: <FlatMapDataset shapes: (None, 2), types: tf.float64>.
Consider casting elements to a supported type.
Dummy data:
import numpy as np
import pandas as pd

df = pd.DataFrame(data={
    'ts_1': np.random.rand(10000),
    'ts_2': np.random.rand(10000),
    'ts_objective': np.random.rand(10000),
    'cat_1': np.random.randint(1, 10 + 1, 10000),
    'cat_2': np.random.randint(1, 25 + 1, 10000),
    'cat_3': np.random.randint(1, 30 + 1, 10000),
    'cat_4': np.random.randint(1, 50 + 1, 10000)})
categ_cols = ['cat_1', 'cat_2', 'cat_3', 'cat_4']
series_cols = ['ts_1', 'ts_2']
target_cols = ['ts_objective']
PAST_HISTORY = 24
FUTURE_TARGET = 8
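With these sizes, the slicing above keeps the three datasets aligned: each should yield 10000 - 24 - 8 + 1 = 9969 elements. A quick eager-mode check (a sketch, assuming date_data, time_series_data and target_data from above are in scope):
# zip pairs elements positionally, so all three datasets must have
# the same length for the windows to stay aligned.
expected = len(df) - PAST_HISTORY - FUTURE_TARGET + 1  # 9969
for ds in (date_data, time_series_data, target_data):
    assert sum(1 for _ in ds) == expected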
You can build the dataset you need without using a generator (and much faster) using Dataset operations only:
import tensorflow as tf

date_data = ...
time_series_data = ...
target_data = ...

def data_tx(d1, d2, t):
    return {"categorical_input": d1, "timeseries_input": d2}, tf.transpose(t)

dataset = tf.data.Dataset.zip((date_data, time_series_data, target_data)).map(data_tx)
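From here you would typically batch the dataset before handing it to fit or an Estimator; in TF 2.x, element_spec is a convenient way to confirm the structure matches the model's named inputs (the batch size below is an arbitrary choice):
# The window dimension comes back as None because flat_map loses the
# static window length; the dict keys line up with the model's input names.
print(dataset.element_spec)
# Roughly:
# ({'categorical_input': TensorSpec(shape=(4,), dtype=tf.int64, name=None),
#   'timeseries_input': TensorSpec(shape=(None, 2), dtype=tf.float64, name=None)},
#  TensorSpec(shape=(1, None), dtype=tf.float64, name=None))
dataset = dataset.batch(32)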