python pandas tensorflow dataset tensorflow-datasets

Fit a DataFrame into a model using TFDS data

I currently have a dataset in a CSV file that contains data in this format:

text	class
text1	0
text2	1

I have an RNN that I would like to test on my dataset to check its performance. The RNN can be found here.

In that example, the model is trained on a dataset loaded from TFDS with tfds.load().

How can I bring my .csv data (presumably loaded as a DataFrame) into a form that this predefined model accepts?
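As a starting point, a file in the format above can be read into a DataFrame with pandas. This is a minimal sketch, assuming the file is tab-separated as the sample suggests (for a comma-separated file, drop the sep argument); the inline string stands in for the real file, and "data.csv" is a placeholder name:

```python
import io

import pandas as pd

# Stand-in for the real file contents; assumes tab-separated columns
# "text" and "class", as in the sample above.
raw = "text\tclass\ntext1\t0\ntext2\t1\n"

# With a real file you would pass its path instead, e.g.
# pd.read_csv("data.csv", sep="\t")  # "data.csv" is a placeholder
df = pd.read_csv(io.StringIO(raw), sep="\t")

print(df)
```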

Please feel free to comment with any clarification questions if my question is not clear. Thank you very much in advance for your support.

Solution

I would recommend using the tf.data.Dataset API. Check out this tutorial. Here is a working example based on your data structure:

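import pandas as pd
import tensorflow as tf

# Example DataFrame with the same columns as the CSV
df = pd.DataFrame(data={'text': ['some text', 'some more text'], 'class': [0, 1]})
# Remove the label column; df now holds only the feature column(s)
labels = df.pop('class')
# Each dataset element is a (features, label) pair, one per DataFrame row
dataset = tf.data.Dataset.from_tensor_slices((df, labels))

for x, y in dataset:
  print(x, y)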

Output:

tf.Tensor([b'some text'], shape=(1,), dtype=string) tf.Tensor(0, shape=(), dtype=int64)
tf.Tensor([b'some more text'], shape=(1,), dtype=string) tf.Tensor(1, shape=(), dtype=int64)
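Note that slicing the whole DataFrame gives each text a shape of (1,). A possible variant, sketched below, slices the "text" column alone so each element is a scalar string, the same per-example shape as the (text, label) pairs that tfds.load(..., as_supervised=True) returns, and then shuffles and batches the data before training. The toy data, BATCH_SIZE, max_tokens, and the assumption that the model starts with a TextVectorization encoder are all illustrative choices, not taken from the linked RNN:

```python
import pandas as pd
import tensorflow as tf

# Toy data standing in for the CSV; assumes the same "text"/"class" columns.
df = pd.DataFrame({
    "text": ["some text", "some more text", "another example", "more text here"],
    "class": [0, 1, 0, 1],
})

# Slicing the "text" column (not the whole DataFrame) yields one scalar
# string per element instead of a shape-(1,) tensor.
dataset = tf.data.Dataset.from_tensor_slices(
    (df["text"].tolist(), df["class"].to_numpy())
)

# Shuffle, batch and prefetch before training. BATCH_SIZE is a placeholder.
BATCH_SIZE = 2
train_dataset = (
    dataset.shuffle(buffer_size=len(df))
    .batch(BATCH_SIZE)
    .prefetch(tf.data.AUTOTUNE)
)

# Assumption: the model's first layer is a TextVectorization encoder,
# which is adapted on the text part of the dataset only.
encoder = tf.keras.layers.TextVectorization(max_tokens=1000)
encoder.adapt(train_dataset.map(lambda text, label: text))

for text_batch, label_batch in train_dataset.take(1):
    print(text_batch.shape, label_batch.shape)  # (2,) (2,)
```

Under these assumptions, train_dataset could then be passed to model.fit(...) wherever the TFDS-based example passes its own training dataset.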