
Can't load dataframe columns with tf.data.Dataset.from_tensor_slices()


I have a dataframe with the columns id, Text, and Media_location (a relative path into an images folder).

Now, I'm trying to load the columns Text, Media_location like this:

features = df[['Text', 'Media_location']]
dataset = tf.data.Dataset.from_tensor_slices((features))

and then this error comes up:

Exception has occurred: ValueError
Failed to convert a NumPy array to a Tensor (Unsupported object type float).

During handling of the above exception, another exception occurred:

  File "D:\Final\MultiCNN_test.py", line 114, in process_text_image
    dataset = tf.data.Dataset.from_tensor_slices((features))

I think this error occurs because the dataframe columns cannot be converted to a tensor, but I'm not sure how to fix it.


Solution

  • If the columns Text and Media_location have the same data type, your code will work:

    import tensorflow as tf
    import pandas as pd
    
    df = pd.DataFrame(data={'Text': ['some text', 'some more text'],
                            'Media_location': ['/path/to/file1', '/path/to/file2']})
    
    features = df[['Text', 'Media_location']]
    dataset = tf.data.Dataset.from_tensor_slices((features))
    
    for x in dataset:
      print(x)
    
    tf.Tensor([b'some text' b'/path/to/file1'], shape=(2,), dtype=string)
    tf.Tensor([b'some more text' b'/path/to/file2'], shape=(2,), dtype=string)
    

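    However, if the two columns have different data types, selecting them together produces a single object-dtype array, and you will get your error or a similar one, since a tensor cannot hold mixed data types. The same error can also appear when a text column contains missing values, because pandas stores NaN as a float. To avoid it, pass each column separately so that each becomes its own tensor: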

    df = pd.DataFrame(data={'Text': [0.29, 0.58],
                            'Media_location': ['/path/to/file1', '/path/to/file2']})
    
    dataset = tf.data.Dataset.from_tensor_slices((df['Text'], df['Media_location']))
    
    for x in dataset:
      print(x)
    
    (<tf.Tensor: shape=(), dtype=float64, numpy=0.29>, <tf.Tensor: shape=(), dtype=string, numpy=b'/path/to/file1'>)
    (<tf.Tensor: shape=(), dtype=float64, numpy=0.58>, <tf.Tensor: shape=(), dtype=string, numpy=b'/path/to/file2'>)
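
If the mismatch comes from missing values rather than from genuinely numeric data, one option is to inspect the columns first and fill the gaps before building the dataset. The following is a minimal sketch under that assumption: the sample dataframe, the empty-string fill value, and the names `missing` and `clean` are illustrative and not taken from the question. Passing `dict(clean)` keeps each column as its own tensor, addressable by column name.

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# Hypothetical data: one missing Text value, which pandas stores as a float NaN.
df = pd.DataFrame(data={'Text': ['some text', np.nan, 'more text'],
                        'Media_location': ['/path/to/file1', '/path/to/file2', '/path/to/file3']})

# Inspect the columns: their dtypes and the number of missing values in each.
print(df.dtypes)
missing = df[['Text', 'Media_location']].isna().sum()
print(missing)

# Assumption: an empty string is an acceptable stand-in for missing text.
clean = df[['Text', 'Media_location']].fillna('')

# A dict of columns yields one tensor per column, each with its own dtype.
dataset = tf.data.Dataset.from_tensor_slices(dict(clean))

for x in dataset:
    print(x['Text'].numpy(), x['Media_location'].numpy())
```

When an empty string would be misleading, dropping the affected rows with `df.dropna(subset=['Text'])` is an alternative to filling them.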