I want to make my own dataset for translation in NLP. For example, x = ["It is an apple"] and y = ["It is a pear"]. How should I build a dataset that matches "<PrefetchDataset shapes: ((), ()), types: (tf.string, tf.string)>"?
All you need to do is create a tf.data.Dataset by passing these two lists as a tuple to the from_tensor_slices static method.
import tensorflow as tf
x = ["It is an apple"]
y = ["It is a pear"]
xy = tf.data.Dataset.from_tensor_slices((x, y))
print(xy)
>>> <TensorSliceDataset shapes: ((), ()), types: (tf.string, tf.string)>
This corresponds to the Dataset signature you are looking for. You can then turn it into a PrefetchDataset with the prefetch method:
dataset = xy.prefetch(1)
print(dataset)
>>> <PrefetchDataset shapes: ((), ()), types: (tf.string, tf.string)>
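If it helps, here is a rough sketch of how such a dataset is typically shuffled, batched, and prefetched before training; the extra sentence pairs and the batch size of 2 are just placeholders, and on older TF 2.x releases tf.data.AUTOTUNE is spelled tf.data.experimental.AUTOTUNE:
import tensorflow as tf

# Placeholder sentence pairs; replace with your own corpus.
x = ["It is an apple", "She reads a book"]
y = ["It is a pear", "She reads a magazine"]

dataset = tf.data.Dataset.from_tensor_slices((x, y))

# Shuffle, batch, and prefetch; AUTOTUNE lets tf.data pick the buffer size.
dataset = (
    dataset
    .shuffle(buffer_size=len(x))
    .batch(2)
    .prefetch(tf.data.AUTOTUNE)
)

# Iterate to confirm the pairs come out as batched tf.string tensors.
for src, tgt in dataset:
    print(src.numpy(), tgt.numpy())
From here you would usually map a tokenization or text-vectorization step over the dataset before feeding it to your translation model.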