tensorflow nlp translation

How to make a dataset like this in TensorFlow 2: <PrefetchDataset shapes: ((), ()), types: (tf.string, tf.string)>


I want to build my own dataset for a translation task in NLP. For example, x = ["It is an apple"] and y = ["It is a pear"]. How should I make a dataset that matches "<PrefetchDataset shapes: ((), ()), types: (tf.string, tf.string)>"?


Solution

  • All you need to do is create a tf.data.Dataset by passing the two lists as a tuple to the from_tensor_slices static method.

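    import tensorflow as tf

    # One source sentence and its target translation
    x = ["It is an apple"]
    y = ["It is a pear"]

    # Slice the (x, y) tuple into one (source, target) pair per element
    xy = tf.data.Dataset.from_tensor_slices((x, y))
    print(xy)
    # <TensorSliceDataset shapes: ((), ()), types: (tf.string, tf.string)>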
    

    This matches the element signature you are looking for: each element is a pair of scalar tf.string tensors. To get a PrefetchDataset, call the prefetch method:

    # Prefetch one element ahead while the current one is being consumed
    dataset = xy.prefetch(1)
    print(dataset)
    # <PrefetchDataset shapes: ((), ()), types: (tf.string, tf.string)>
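The same approach should extend to more than one sentence pair. Below is a minimal sketch under some assumptions: the second sentence pair is made up for illustration, and tf.data.AUTOTUNE is assumed to be available (TensorFlow 2.4 or later; older 2.x releases expose it as tf.data.experimental.AUTOTUNE). Because the printed repr of a dataset differs between TensorFlow versions, the sketch inspects element_spec rather than relying on the printed string.

```python
import tensorflow as tf

# Hypothetical parallel corpus; replace with your own sentence pairs.
sources = ["It is an apple", "I like tea"]
targets = ["It is a pear", "I like coffee"]

# Each element is one (source, target) pair of scalar tf.string tensors.
pairs = tf.data.Dataset.from_tensor_slices((sources, targets))

# Let tf.data pick the prefetch buffer size instead of hard-coding 1.
dataset = pairs.prefetch(tf.data.AUTOTUNE)

# element_spec describes the shape and dtype of each component.
print(dataset.element_spec)

# Iterate eagerly and decode the byte strings back to Python text.
for src, tgt in dataset:
    print(src.numpy().decode("utf-8"), "->", tgt.numpy().decode("utf-8"))
```

If both components of element_spec report a scalar shape and dtype tf.string, the dataset has the structure asked for in the question.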