I have a dataset in a CSV file that contains data in this format:
text | class |
---|---|
text1 | 0 |
text2 | 1 |
I have an RNN that I would like to test on my dataset to check its performance. The RNN can be found here.
In that example, the dataset is loaded from TFDS using tfds.load().
How can I load my .csv data (I guess via my DataFrame) and bring it into a form that the pre-defined model accepts?
Please feel free to comment with any clarification questions if my question is not clear. Thank you very much in advance for your support.
I would recommend using the tf.data.Dataset API. Check out this tutorial. Here is a working example based on your data structure:
import pandas as pd
import tensorflow as tf

# Small in-memory stand-in for the contents of the CSV
df = pd.DataFrame(data={'text': ['some text', 'some more text'], 'class': [0, 1]})
# Split the label column off from the features
labels = df.pop('class')
dataset = tf.data.Dataset.from_tensor_slices((df, labels))

for x, y in dataset:
    print(x, y)
tf.Tensor([b'some text'], shape=(1,), dtype=string) tf.Tensor(0, shape=(), dtype=int64)
tf.Tensor([b'some more text'], shape=(1,), dtype=string) tf.Tensor(1, shape=(), dtype=int64)
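
To start from an actual CSV file rather than an inline DataFrame, the same idea can be sketched roughly as follows. The file contents, the `csv_text` stand-in, and `BATCH_SIZE` are assumptions for illustration, not part of the original question. Passing 1-D arrays instead of the whole DataFrame yields scalar strings per element rather than shape `(1,)` tensors, which is usually closer to what a text model expects.

```python
import io

import pandas as pd
import tensorflow as tf

# Hypothetical stand-in for the real file; in practice use pd.read_csv("your_file.csv").
csv_text = "text,class\nsome text,0\nsome more text,1\nyet another text,1\n"
df = pd.read_csv(io.StringIO(csv_text))

# Use 1-D arrays so each dataset element is (scalar string, scalar int label).
texts = df["text"].to_numpy(dtype=str)
labels = df["class"].to_numpy(dtype="int64")
dataset = tf.data.Dataset.from_tensor_slices((texts, labels))

BATCH_SIZE = 2  # assumed value; pick whatever the model was trained with

# Shuffle, batch, and prefetch, as is common for tf.data input pipelines.
batched = (
    dataset.shuffle(buffer_size=len(df))
    .batch(BATCH_SIZE)
    .prefetch(tf.data.AUTOTUNE)
)

for text_batch, label_batch in batched.take(1):
    print(text_batch.shape, label_batch.shape)
```

If the model begins with a text-vectorization layer (an assumption about the linked RNN), that layer would typically be adapted on the text only, for example on `dataset.map(lambda text, label: text)`, before passing `batched` to `model.fit` or `model.evaluate`.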