Tags: python, pandas, tensorflow, dataset, tensorflow-datasets

Fit a DataFrame into a model using TFDS data


I currently have a dataset in a CSV file that contains data in this format:

text class
text1 0
text2 1

I have an RNN that I would like to test on my dataset to check its performance. The RNN can be found here.

In that example, the data is loaded from TFDS using tfds.load().

How can I load my .csv data (currently a DataFrame, I guess) and convert it into a form that the pre-defined model accepts?

Please feel free to comment with any clarification questions if my question is not clear. Thank you very much in advance for your support.


Solution

  • I would recommend using the tf.data.Dataset API. Check out this tutorial. Here is a working example based on your data structure:

    import pandas as pd
    import tensorflow as tf
    
    # Small DataFrame with the same columns as the CSV in the question
    df = pd.DataFrame(data={'text': ['some text', 'some more text'], 'class': [0, 1]})
    # Remove the label column from df and keep it as a separate Series
    labels = df.pop('class')
    # Each dataset element is a (features, label) pair
    dataset = tf.data.Dataset.from_tensor_slices((df, labels))
    
    for x, y in dataset:
      print(x, y)
    
    # Output:
    # tf.Tensor([b'some text'], shape=(1,), dtype=string) tf.Tensor(0, shape=(), dtype=int64)
    # tf.Tensor([b'some more text'], shape=(1,), dtype=string) tf.Tensor(1, shape=(), dtype=int64)
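
The example above builds the DataFrame by hand. For the CSV case in the question, a minimal sketch might look like the following. It makes several assumptions that are not in the original post: the file has a header row with the columns `text` and `class`, the linked RNN expects batched `(text, label)` pairs with scalar strings (as `tfds.load(..., as_supervised=True)` would yield), and `BATCH_SIZE` is an illustrative value. The `io.StringIO` object is only a stand-in so the snippet runs without a file on disk.

```python
import io

import pandas as pd
import tensorflow as tf

# Stand-in for the real file (assumption): replace csv_file with the path to your .csv.
csv_file = io.StringIO("text,class\ntext1,0\ntext2,1\n")
df = pd.read_csv(csv_file)

# Pass the text column as a flat list so each element is a scalar string
# rather than a shape-(1,) vector, which is what slicing a DataFrame gives.
texts = df["text"].astype(str).tolist()
labels = df["class"].to_numpy()
dataset = tf.data.Dataset.from_tensor_slices((texts, labels))

# BATCH_SIZE and the shuffle buffer size are illustrative, not from the original post.
BATCH_SIZE = 2
train_ds = (
    dataset.shuffle(buffer_size=len(df))
    .batch(BATCH_SIZE)
    .prefetch(tf.data.AUTOTUNE)
)

# Inspect one batch to confirm the shapes before passing train_ds to the model.
for text_batch, label_batch in train_ds.take(1):
    print(text_batch.shape, label_batch.shape)
```

If the model starts with a text-encoding layer adapted on the dataset, as many TFDS-based text tutorials do, `train_ds` can usually be used wherever the TFDS training split was used. Whether that holds for the linked RNN depends on its input layer, which is not shown here.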