python tensorflow keras tensorflow-datasets

TensorFlow dataset: how to feed training data using custom windowing on every batch?

I have a dataset of type tf.data.Dataset. I want to feed a custom range of the data (a set of tokens) to the model on every batch. For example, if one of my training examples is [0,1,2,3,4,5], I want to feed [1,2,3] for the first batch and [3,4,5] for the second. Is there a way to control how training data is fed to a TensorFlow model?

Solution

Let's assume your tf.data.Dataset is defined as follows:

train_dataset = tf.data.Dataset.from_tensor_slices(YOUR_DATA).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)

and that you loop through train_dataset, getting batches of, say, 32 examples. Depending on the form of input your model expects, you can split each batch inside the training step:

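@tf.function
def train_step(batch):
  # Split the batch into two equal halves along axis 0 (the batch dimension).
  batch1, batch2 = tf.split(batch, 2, 0)
  # ... feed batch1 and batch2 to your model here ...


for batch in train_dataset:
  train_step(batch)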

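Here the batch is split into two slices along the first axis (axis 0, which is normally the batch dimension). With an integer argument, tf.split requires that axis to be evenly divisible, so an odd-sized final batch raises an error unless you pass drop_remainder=True to batch() or give tf.split explicit sizes. After this, you can feed the slices to your model.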
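
To make the split concrete, here is a minimal sketch on a toy batch. The tensor values and the [1, 3] size list are made up for illustration; the second call shows the explicit-sizes form of tf.split for uneven pieces.

```python
import tensorflow as tf

# Toy batch: 4 examples with 3 features each (illustrative values).
batch = tf.reshape(tf.range(12), [4, 3])

# Even split into two halves along axis 0 (the batch dimension).
first_half, second_half = tf.split(batch, 2, axis=0)

# Uneven split: pass explicit sizes that sum to the axis length.
head, tail = tf.split(batch, [1, 3], axis=0)

print(first_half.shape, second_half.shape)
print(head.shape, tail.shape)
```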

Another idea is to slice your tensor (your batch) with slicing notation:

import tensorflow as tf

rank_3_tensor = tf.constant([
    [[0, 1, 2, 3, 4],
     [5, 6, 7, 8, 9]],
    [[10, 11, 12, 13, 14],
     [15, 16, 17, 18, 19]],
    [[20, 21, 22, 23, 24],
     [25, 26, 27, 28, 29]],
])
# Take the first three entries along axis 0, keeping the other axes whole.
print(rank_3_tensor[0:3, :, :])
# In eager mode this prints the values, with shape=(3, 2, 5), dtype=int32
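
Applied to the token ranges from the question, the same notation can slice along the token axis instead of the batch axis. This is only a sketch: the toy batch and the window boundaries (1:4 and 3:6) are assumptions chosen to reproduce [1,2,3] and [3,4,5].

```python
import tensorflow as tf

# Toy batch of two examples, six tokens each (illustrative values).
batch = tf.constant([[0, 1, 2, 3, 4, 5],
                     [10, 11, 12, 13, 14, 15]])

# Hypothetical window boundaries along the token axis (axis 1).
first_window = batch[:, 1:4]   # tokens at positions 1, 2, 3
second_window = batch[:, 3:6]  # tokens at positions 3, 4, 5

print(first_window.numpy())
print(second_window.numpy())
```

Each window keeps the full batch dimension, so both slices can be passed to the model in consecutive steps.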

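The same notation lets you step through a larger tensor in fixed-size chunks:

import numpy as np

sample_size = 201
D = 5
tensor = tf.constant(np.arange(sample_size * D * D).reshape([sample_size, D, D]))
batches_of_n = 3
# Walk along axis 0 in steps of batches_of_n, keeping the other axes whole.
for i in range(0, tensor.shape[0], batches_of_n):
    print(tensor[i:i + batches_of_n, :, :])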
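
If you would rather have the input pipeline produce the overlapping windows itself, tf.data offers Dataset.window. The following is a rough sketch, not the only way to do it; WINDOW, SHIFT and START are hypothetical parameters picked to match the example in the question.

```python
import tensorflow as tf

tokens = [0, 1, 2, 3, 4, 5]  # the sequence from the question

# Hypothetical parameters: window length, step between windows, first token.
WINDOW, SHIFT, START = 3, 2, 1

windows = (
    tf.data.Dataset.from_tensor_slices(tokens)
    .skip(START)  # begin at token 1
    .window(WINDOW, shift=SHIFT, drop_remainder=True)
    # Each window is itself a small dataset; batch it into a single tensor.
    .flat_map(lambda w: w.batch(WINDOW, drop_remainder=True))
)

result = [w.numpy().tolist() for w in windows]
print(result)  # [[1, 2, 3], [3, 4, 5]]
```

Setting drop_remainder=True discards the trailing partial window (here, the lone token 5), so every element has the same length.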

I think you get the idea.