Not being an engineer, I'm having trouble following the TF documentation on how to build a dataset.
I have gathered a dataset of sentences with labels that I would like to turn into a TF dataset similar to the IMDB dataset.
The list looks like this:
LIST=[('text1',0),('text2',1),('text3',1),('text4',0),...]
There are ~100,000 elements in the list and two possible labels, 0 and 1.
My task is to build a model that assigns a single label (0 or 1) to a given sentence, just like the basic TF example for the IMDB reviews.
I would guess that I don't need anything else to build a dataset. Am I wrong?
How can I turn this list into a TF dataset?
I would appreciate any guidance.
Working sample code (it builds the dataset from the text only):
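import tensorflow as tf

LIST=[('text1',0),('text2',1),('text3',1),('text4',0)]

# Split the (text, label) tuples into two parallel lists.
text = [x[0] for x in LIST]
label = [x[1] for x in LIST]  # not used in this minimal example

# Each string in `text` becomes one element of the dataset.
dataset = tf.data.Dataset.from_tensor_slices(text)
for element in dataset:
    print(element)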
Output:
tf.Tensor(b'text1', shape=(), dtype=string)
tf.Tensor(b'text2', shape=(), dtype=string)
tf.Tensor(b'text3', shape=(), dtype=string)
tf.Tensor(b'text4', shape=(), dtype=string)
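The sample above slices only the text, so the labels never reach the dataset. For a classifier like the IMDB example you would probably want each element to be a (text, label) pair. Here is a minimal sketch of one way to do that, passing a tuple to `tf.data.Dataset.from_tensor_slices`. The names `texts`, `labels` and `train_ds`, as well as the shuffle buffer and batch size, are illustrative assumptions and not something from the question.

```python
import tensorflow as tf

# Toy stand-in for the ~100,000-element list from the question.
LIST = [('text1', 0), ('text2', 1), ('text3', 1), ('text4', 0)]

# Split the tuples into two parallel lists.
texts = [t for t, _ in LIST]
labels = [l for _, l in LIST]

# Passing a tuple makes every dataset element a (text, label) pair.
dataset = tf.data.Dataset.from_tensor_slices((texts, labels))

# Shuffle and batch for training. The buffer size and batch size here
# are assumptions chosen for the toy data; tune them for the real list.
train_ds = dataset.shuffle(buffer_size=len(LIST), seed=0).batch(2)

for text_batch, label_batch in train_ds:
    print(text_batch.numpy(), label_batch.numpy())
```

A dataset shaped like this yields (features, labels) batches, which is the form Keras `model.fit` accepts for a `tf.data.Dataset`. The raw strings would still need to be turned into numbers first, for example with a text vectorization layer.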