How to make a dataset like this in TensorFlow 2: <PrefetchDataset shapes: ((), ()), types: (tf.string, tf.string)>

I want to make my own dataset for a translation task in NLP. For example, x = ["It is an apple"] and y = ["It is a pear"]. How should I make a dataset that matches "<PrefetchDataset shapes: ((), ()), types: (tf.string, tf.string)>"?

Solution

All you need to do is create a tf.data.Dataset by passing the two lists as a tuple to the from_tensor_slices static method. Each element of the resulting dataset is a pair of scalar strings, one taken from x and one from y.

import tensorflow as tf

x = ["It is an apple"]
y = ["It is a pear"]

xy = tf.data.Dataset.from_tensor_slices((x, y))
print(xy)
# <TensorSliceDataset shapes: ((), ()), types: (tf.string, tf.string)>

This matches the element signature you are looking for: two scalar tf.string components. To turn it into a PrefetchDataset, call the prefetch method:

dataset = xy.prefetch(1)
print(dataset)
# <PrefetchDataset shapes: ((), ()), types: (tf.string, tf.string)>
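
The printed representation of a dataset can differ between TensorFlow releases, so comparing the printed string is not the most reliable check. Below is a minimal sketch of the same idea with more than one sentence pair, checking element_spec instead of the printed string. The second sentence pair is made up for illustration, and tf.data.AUTOTUNE assumes TensorFlow 2.4 or newer (on older 2.x releases, tf.data.experimental.AUTOTUNE or a fixed buffer size such as 1 should work instead).

```python
import tensorflow as tf

# Parallel source/target sentences; the second pair is a made-up example.
x = ["It is an apple", "I like tea"]
y = ["It is a pear", "I like coffee"]

# Slice the two lists in lockstep, then prefetch.
# tf.data.AUTOTUNE is an assumption about the installed version (TF >= 2.4).
dataset = tf.data.Dataset.from_tensor_slices((x, y)).prefetch(tf.data.AUTOTUNE)

# element_spec describes each element: two scalar tf.string tensors.
print(dataset.element_spec)

# Iterate eagerly and decode the byte strings back to Python str.
pairs = [
    (src.numpy().decode("utf-8"), tgt.numpy().decode("utf-8"))
    for src, tgt in dataset
]
print(pairs)
```

Prefetching does not change the elements themselves; it only lets the input pipeline prepare the next element while the current one is being consumed.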