I have a dataset of type tf.data.Dataset. What I am trying to do is feed a custom range of the data, which is a set of tokens, to every batch.
For example, if one of my training examples is [0,1,2,3,4,5], then I want to feed [1,2,3] for the first batch and then [3,4,5] for the second batch.
Is there any way to control how training data is fed to the TensorFlow model?
Let's assume your tf.data.Dataset is defined as follows:
train_dataset = tf.data.Dataset.from_tensor_slices(YOUR_DATA).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
and that you loop through train_dataset, which yields batches of, say, 32 examples. Depending on the form of input your model expects, you can split each batch:
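@tf.function
def train_step(batch):
    # Split the batch into two equal slices along axis 0 (the batch axis).
    # The batch dimension must be divisible by 2 for this to work.
    batch1, batch2 = tf.split(batch, 2, 0)
    # Feed batch1 and batch2 to your model here.

for batch in train_dataset:
    train_step(batch)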
Note that your batch is split into two slices on the first axis (which is usually the size of your batch). After this, you can simply feed these slices to your model.
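To make the effect of tf.split concrete, here is a minimal sketch on a toy batch. The tensor and the names first_half and second_half are made up for illustration and are not part of your pipeline:

```python
import tensorflow as tf

# Hypothetical toy batch: 4 examples with 3 features each.
batch = tf.reshape(tf.range(12), (4, 3))

# Split into 2 equal parts along axis 0 (the batch axis).
first_half, second_half = tf.split(batch, num_or_size_splits=2, axis=0)

print(first_half.numpy())   # rows 0 and 1 of the batch
print(second_half.numpy())  # rows 2 and 3 of the batch
```

Each half keeps the feature dimension and gets half of the examples, so both have shape (2, 3) here.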
Another idea would be to slice your tensor (your batch) with slicing notation:
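import tensorflow as tf

rank_3_tensor = tf.constant([
    [[0, 1, 2, 3, 4],
     [5, 6, 7, 8, 9]],
    [[10, 11, 12, 13, 14],
     [15, 16, 17, 18, 19]],
    [[20, 21, 22, 23, 24],
     [25, 26, 27, 28, 29]],
])
# Take entries 0..2 on the first axis and everything on the other two axes.
print(rank_3_tensor[0:3, :, :])
# In graph mode this prints:
# Tensor("strided_slice:0", shape=(3, 2, 5), dtype=int32)
# In eager mode it prints the values instead, with the same shape and dtype.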
or
import numpy as np
import tensorflow as tf

sample_size = 201
D = 5
tensor = tf.constant(np.arange(sample_size * D * D).reshape([sample_size, D, D]))
batches_of_n = 3
# Step through the first axis in chunks of batches_of_n samples.
for i in range(0, tensor.shape[0], batches_of_n):
    print(tensor[i:i + batches_of_n, :, :])
I think you get the idea.
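Applied to the example from the question, a rough sketch could look like the following. The values of start, window_size and shift are my assumptions, picked only to reproduce [1,2,3] and [3,4,5]. The first option is the plain slicing shown above. The second uses Dataset.window, which is a different technique from the one in this answer, in case you would rather keep everything inside tf.data:

```python
import tensorflow as tf

tokens = tf.constant([0, 1, 2, 3, 4, 5])

# Hypothetical parameters, chosen to match the example in the question.
start, window_size, shift = 1, 3, 2

# Option 1: plain slicing, stepping the start index by `shift`.
last_start = int(tokens.shape[0]) - window_size
sliced = [tokens[i:i + window_size] for i in range(start, last_start + 1, shift)]

# Option 2: let tf.data build the overlapping windows.
windowed = (
    tf.data.Dataset.from_tensor_slices(tokens)
    .skip(start)  # drop the leading token(s) before windowing
    .window(window_size, shift=shift, drop_remainder=True)
    # Each window is itself a small dataset; batch it back into one tensor.
    .flat_map(lambda w: w.batch(window_size, drop_remainder=True))
)

for window in windowed:
    print(window.numpy())
```

Both options yield [1, 2, 3] followed by [3, 4, 5]. With drop_remainder=True, incomplete windows at the end of the sequence are discarded.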