
How do I build a TensorFlow dataset from a list?


Not being an engineer, I'm having trouble following the TF documentation on how to build a dataset.

I have gathered a dataset of sentences with labels that I would like to turn into a TF dataset similar to the IMDB dataset.

The list looks like this:

    LIST=[('text1',0),('text2',1),('text3',1),('text4',0),...]

There are about 100,000 elements in the list, and two possible labels: 0 and 1.

My task is to build a model that assigns a single label (0 or 1) to a given sentence, just like the basic TF example for the IMDB reviews.

I would guess that I don't need anything else to build a dataset. Am I wrong?

How can I turn this list into a TF dataset?

I would appreciate any guidance.


Solution

  • Working sample code

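    import tensorflow as tf
    
    LIST=[('text1',0),('text2',1),('text3',1),('text4',0)]
    # Split the (text, label) pairs into two parallel lists
    text = [x[0] for x in LIST]
    label = [x[1] for x in LIST]
    # Pass both lists as a tuple so that each element is a (text, label) pair
    dataset = tf.data.Dataset.from_tensor_slices((text, label))
    for text_element, label_element in dataset:
        print(text_element, label_element)
    

    Output:

    tf.Tensor(b'text1', shape=(), dtype=string) tf.Tensor(0, shape=(), dtype=int32)
    tf.Tensor(b'text2', shape=(), dtype=string) tf.Tensor(1, shape=(), dtype=int32)
    tf.Tensor(b'text3', shape=(), dtype=string) tf.Tensor(1, shape=(), dtype=int32)
    tf.Tensor(b'text4', shape=(), dtype=string) tf.Tensor(0, shape=(), dtype=int32)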
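To go from a (text, label) dataset to something `model.fit` can consume, as in the IMDB tutorial, the pairs usually still need to be shuffled, split and batched. Below is a minimal sketch of one way to do that. The ten placeholder texts, the 80/20 train/validation split, the seed of 42 and the batch size of 4 are all hypothetical values chosen for this toy list, not taken from the question.

```python
import tensorflow as tf

# Hypothetical toy data in the same (text, label) format as the question;
# the real list would hold ~100,000 sentences
LIST = [('text1', 0), ('text2', 1), ('text3', 1), ('text4', 0),
        ('text5', 1), ('text6', 0), ('text7', 1), ('text8', 0),
        ('text9', 0), ('text10', 1)]

texts = [x[0] for x in LIST]
labels = [x[1] for x in LIST]

# Each element is a (text, label) pair
dataset = tf.data.Dataset.from_tensor_slices((texts, labels))

# Shuffle once with a fixed order (no reshuffling between iterations) so that
# take/skip below always select the same, non-overlapping examples
dataset = dataset.shuffle(buffer_size=len(LIST), seed=42,
                          reshuffle_each_iteration=False)

# Hypothetical 80/20 train/validation split
num_train = int(0.8 * len(LIST))
train_ds = dataset.take(num_train)
val_ds = dataset.skip(num_train)

BATCH_SIZE = 4  # hypothetical; tiny because the toy list is tiny
# Reshuffle only the training portion each epoch, then batch and prefetch
train_ds = train_ds.shuffle(num_train).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
val_ds = val_ds.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

# Peek at one training batch: a batch of strings and a batch of int32 labels
for text_batch, label_batch in train_ds.take(1):
    print(text_batch.numpy(), label_batch.numpy())
```

Setting `reshuffle_each_iteration=False` on the first `shuffle` is what keeps the split stable; without it, `take` and `skip` could draw overlapping examples on different epochs. The resulting `train_ds` and `val_ds` yield `(texts, labels)` batches that can be passed to something like `model.fit(train_ds, validation_data=val_ds, ...)` for a model that accepts string input.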