Tags: tensorflow, tensorflow2.0, tf.data.dataset

tf.data WindowDataset flat_map gives 'dict' object has no attribute 'batch' error


I am trying to build batches of shape (batch_size, time_steps, my_data).

Why does the flat_map step raise AttributeError: 'dict' object has no attribute 'batch'?

    import numpy as np
    import tensorflow as tf

    time_steps = 3  # window length (value taken from the answer below)
    x_train = np.random.normal(size=(60000, 768))
    y_train = np.random.normal(size=(60000, 1))  # labels (shape taken from the answer below)
    token_type_ids = np.ones(shape=(len(x_train)))
    position_ids = np.random.normal(size=(x_train.shape[0], 5))

    features_ds = tf.data.Dataset.from_tensor_slices({'inputs_embeds': x_train,
                                                      'token_type_ids': token_type_ids,
                                                      'position_ids': position_ids})
    y_ds = tf.data.Dataset.from_tensor_slices(y_train)
    ds = tf.data.Dataset.zip((features_ds, y_ds))
    # result = list(ds.as_numpy_iterator())

    # The AttributeError is raised in the lambda below: x is a dict, so x.batch does not exist
    result_ds = (
        ds.window(size=time_steps, shift=time_steps, stride=1, drop_remainder=True)
        .flat_map(lambda x, y: tf.data.Dataset.zip((x.batch(time_steps), y.batch(time_steps))))
    )

Any idea what the issue is and how to solve it?


Solution

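  • window() is applied to each component of a nested element, so inside flat_map x is a dict of per-key window datasets rather than a single dataset, which is why x.batch fails. tf.data.Dataset.zip accepts that nested structure directly, so you can zip first and add batch as a separate step: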

    import numpy as np
    import tensorflow as tf

    time_steps = 3  # must be defined before it is used in window()

    x_train = np.random.normal(size=(60000, 768))
    token_type_ids = np.ones(shape=(len(x_train)))
    position_ids = np.random.normal(size=(x_train.shape[0], 5))

    features_ds = tf.data.Dataset.from_tensor_slices({'inputs_embeds': x_train,
                                                      'token_type_ids': token_type_ids,
                                                      'position_ids': position_ids})
    y_train = np.random.normal(size=(60000, 1))
    y_ds = tf.data.Dataset.from_tensor_slices(y_train)
    ds = tf.data.Dataset.zip((features_ds, y_ds))

    # Zip the per-key window datasets back into (features, label) elements
    result_ds = (
        ds.window(size=time_steps, shift=time_steps, stride=1, drop_remainder=True)
        .flat_map(lambda x, y: tf.data.Dataset.zip((x, y)))
    )

    # Regroup the flattened elements into sequences of length time_steps
    result_ds = result_ds.batch(time_steps)

    for i in result_ds.take(1):
        print(i)
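
The snippet above produces elements of shape (time_steps, ...). To get the (batch_size, time_steps, my_data) layout the question asks for, one possible approach is to batch each key of the dict inside flat_map and then batch the resulting sequences once more. The following is a minimal sketch, not the only way to do it: batch_size, n and the helper name window_to_sequence are assumptions made for illustration and do not appear in the original post, and the data is shrunk so the example runs quickly.

```python
import numpy as np
import tensorflow as tf

time_steps = 3   # window length, as in the answer above
batch_size = 4   # hypothetical value, not given in the original post
n = 60           # small sample count for illustration (the post uses 60000)

features = {
    'inputs_embeds': np.random.normal(size=(n, 768)),
    'token_type_ids': np.ones(shape=(n,)),
    'position_ids': np.random.normal(size=(n, 5)),
}
labels = np.random.normal(size=(n, 1))

ds = tf.data.Dataset.from_tensor_slices((features, labels))

# Non-overlapping windows; each window element is (dict of datasets, dataset)
windows = ds.window(size=time_steps, shift=time_steps, stride=1, drop_remainder=True)


def window_to_sequence(x, y):
    """Turn one window into a single (features, label) element of length time_steps."""
    # x is a dict of window datasets, so batch every value rather than x itself
    x_seq = {key: window.batch(time_steps, drop_remainder=True) for key, window in x.items()}
    y_seq = y.batch(time_steps, drop_remainder=True)
    return tf.data.Dataset.zip((x_seq, y_seq))


sequences = windows.flat_map(window_to_sequence)
batched = sequences.batch(batch_size, drop_remainder=True)

x_batch, y_batch = next(iter(batched))
print(x_batch['inputs_embeds'].shape)   # (4, 3, 768)
print(x_batch['token_type_ids'].shape)  # (4, 3)
print(x_batch['position_ids'].shape)    # (4, 3, 5)
print(y_batch.shape)                    # (4, 3, 1)
```

Because shift equals size here, the windows do not overlap; a smaller shift would give overlapping sequences with the same per-key batching.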