Tags: python, pandas, tensorflow, tensorflow-datasets

Best way to map Text and Image while loading the data


I have a CSV file that looks roughly like the one in the photo.

[Image: CSV file]

I'm building a model that takes both an image and its corresponding text (df['Content']) as input.

I'd like to know the best way to load this data so that it:

  • Loads the images from df['Image_location'] into tensors.
  • Keeps each image paired with its corresponding text.
  • Keeps the corresponding label (df['Sentiment']).

Any ideas on how this can be done?


Solution

  • You can try using the tf.data.Dataset API.

    Create dummy data:

    import numpy
    from PIL import Image
    
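    # Write two random 64x64 images to the working directory
    # (/content in Colab, which the paths used below assume).
    for i in range(1, 3):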
      imarray = numpy.random.rand(64,64,3) * 255
      im = Image.fromarray(imarray.astype('uint8')).convert('RGBA')
      im.save('result_image{}.png'.format(i))
    

    Process:

    import tensorflow as tf
    import pandas as pd
    import matplotlib.pyplot as plt
    
    df = pd.DataFrame(data= {'Location': ['some.txt', 'some-other.txt'], 
                             'Content': ['This road was ok', 'This was wonderful'],
                             'Score': [0.0353, -0.341],
                             'Sentiment': ['Neutral', 'Positive'],
                             'Image_location': ['/content/result_image1.png', '/content/result_image2.png']})
    
    features = df[['Content', 'Image_location']]
    labels = df['Sentiment']
    
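    # Slicing features and labels together keeps each row's text, image path, and label aligned.
    dataset = tf.data.Dataset.from_tensor_slices((features, labels))
    
    def process_path(x):
      # x is one row of features: [Content, Image_location]
      content, image_path = x[0], x[1]
      img = tf.io.read_file(image_path)
      img = tf.io.decode_png(img, channels=3)  # decode as a 3-channel image
      return content, img
    
    # Load each image lazily; the label y passes through unchanged.
    dataset = dataset.map(lambda x, y: (process_path(x), y))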
    
    for x, y in dataset.take(1):
      content = x[0]
      image = x[1]
      print('Content -->', content)
      print('Sentiment -->', y)
      plt.imshow(image.numpy())
    
    Output:

    Content --> tf.Tensor(b'This road was ok', shape=(), dtype=string)
    Sentiment --> tf.Tensor(b'Neutral', shape=(), dtype=string)
    

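Before a dataset like this can feed a model, the images usually need a common size, the string labels need integer ids, and the examples need to be batched. The following is a minimal sketch of one way to do that, not part of the original answer. The names `class_names`, `label_table`, and `load_example`, the three-class label vocabulary, and the 32x32 target size are assumptions chosen for illustration. Plain lists stand in for the DataFrame columns (with a DataFrame, `df['Content'].values` and so on could be passed instead), and the dummy images are written with `tf.io.encode_png` into a temporary directory so the snippet runs on its own.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical stand-ins for df['Content'], df['Sentiment'], df['Image_location'].
tmp_dir = tempfile.mkdtemp()
contents = ['This road was ok', 'This was wonderful']
sentiments = ['Neutral', 'Positive']
image_paths = []
for i in range(2):
    # Write a small random RGB PNG so the example does not depend on real files.
    pixels = tf.cast(
        tf.random.uniform((64, 64, 3), maxval=256, dtype=tf.int32), tf.uint8)
    path = os.path.join(tmp_dir, 'result_image{}.png'.format(i + 1))
    tf.io.write_file(path, tf.io.encode_png(pixels))
    image_paths.append(path)

# Assumed label vocabulary; each label's id is its position in this list.
class_names = ['Negative', 'Neutral', 'Positive']
label_table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant(class_names),
        values=tf.constant([0, 1, 2], dtype=tf.int64)),
    default_value=-1)  # unknown labels map to -1


def load_example(features, sentiment):
    """Read and resize one image, keeping its text and label alongside it."""
    img = tf.io.read_file(features['image_path'])
    img = tf.io.decode_png(img, channels=3)
    img = tf.image.resize(img, (32, 32)) / 255.0  # float32 scaled to [0, 1]
    inputs = {'content': features['content'], 'image': img}
    return inputs, label_table.lookup(sentiment)


# A dict of features avoids relying on positional indices such as x[0], x[1].
dataset = tf.data.Dataset.from_tensor_slices(
    ({'content': contents, 'image_path': image_paths}, sentiments))
dataset = dataset.map(load_example, num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.batch(2)

batch_features, batch_labels = next(iter(dataset))
print(batch_features['image'].shape)
print(batch_features['content'].numpy())
print(batch_labels.numpy())
```

Because every element is sliced from the same row position and `map` preserves element order by default, each image stays paired with its text and label through the whole pipeline. The text would still need tokenizing or vectorizing before it reaches a model; that step is left out here.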