image, csv, tensorflow-datasets

TensorFlow 2 - Associating CSV lines with image files


I'm new to TensorFlow, so sorry if this question is basic.

I am trying to create a GAN that will generate images based on a small set of parameters plus a random vector.

In the training set, each image has one corresponding line in a CSV file.

The structure of the CSV file is like this:

    Parameter1, Parameter2, Parameter3, ImageFile
    4, 7, 2, Image221.png
    6, 0, 8, Image044.png
    1, 4, 2, Image179.png

I also have a folder with the image files with the given file names.

My problem: I would like to create a pipeline that does not load the entire dataset into memory at once for training (tf.data.Dataset already behaves this way), but I need to combine each line in the CSV file with its corresponding image file.

I know how to use list_files to read the images and make_csv_dataset to read the CSV, but how do I guarantee that each CSV line is linked to its correct image file?


Solution

  • For those facing the same problem, I found the obvious solution: create a map function that takes the file name, loads the file, and returns the loaded image as a tensor in place of the file name column.

    Example (for one column with the file name and one with a class label):

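    import numpy as np
    import tensorflow as tf
    from PIL import Image

    def load_image(filename, label):
      def read_with_pil(path):
        # Runs eagerly: path is a tensor holding the file name as bytes.
        # convert("RGB") gives every image three uint8 channels.
        return np.array(Image.open(path.numpy().decode("utf-8")).convert("RGB"))

      # Dataset.map traces load_image as a graph, so PIL must go through tf.py_function.
      img = tf.py_function(read_with_pil, [filename], tf.uint8)
      return img, label

    # dataset is assumed to yield (filename, label) pairs.
    dataset = dataset.map(load_image)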
    

    Note that I am using the Pillow library (PIL) to load the image, but this is not mandatory. You can use whatever means you see fit for that.

    What really matters here is to load the image in a function and map your dataset with that function.
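
For the CSV layout in the question, the same idea can be sketched end to end with TensorFlow's own file ops instead of PIL. This is only a sketch under some assumptions: the helper names read_rows, load_image and build_dataset are made up for illustration, the images are assumed to be PNGs of identical size (otherwise batching fails), and the CSV itself is assumed to be small enough to read up front, since it holds only numbers and file names. The images are still read lazily, one batch at a time.

```python
import csv
import os

import tensorflow as tf


def read_rows(csv_path, image_dir):
    """Parse the CSV into (parameters, image_paths). Only text is read here."""
    params, paths = [], []
    with open(csv_path, newline="") as f:
        # skipinitialspace handles the blank after each comma in the sample CSV.
        for row in csv.DictReader(f, skipinitialspace=True):
            params.append([int(row["Parameter1"]),
                           int(row["Parameter2"]),
                           int(row["Parameter3"])])
            paths.append(os.path.join(image_dir, row["ImageFile"].strip()))
    return params, paths


def load_image(params, path):
    # Runs inside Dataset.map (graph mode), so only TensorFlow ops are used.
    image = tf.io.decode_png(tf.io.read_file(path), channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)  # uint8 -> [0, 1]
    return params, image


def build_dataset(csv_path, image_dir, batch_size=2):
    params, paths = read_rows(csv_path, image_dir)
    # Each element pairs one CSV row with its own file path, so shuffling
    # afterwards cannot separate a row from its image.
    ds = tf.data.Dataset.from_tensor_slices((params, paths))
    ds = ds.shuffle(len(paths))
    ds = ds.map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

Because the row and the path travel together as one dataset element from the start, there is no need to keep list_files and make_csv_dataset in sync. Each batch is a (parameters, images) pair that could be fed to the generator together with a random vector.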