python tensorflow machine-learning artificial-intelligence

Loading data into x_train and y_train


If my data is organised as follows, how would I load it into x_train and y_train to build a Keras model?

train.zip

The image files for the training set

train.txt

The labels for the training set

test.zip

The image files for the test set

This is what train.txt looks like:

[image not preserved]

This is what the zip files look like:

[image not preserved]

I am confused about how to load this data so that I have NumPy arrays for x_train, y_train, x_test and y_test and can build a CNN model. I've tried many things, but with no luck.


Solution

  • You can use an ImageDataGenerator. You will want to unzip your files and add a column header to the txt file, for example:

    Filename Label
    train/0.jpg 5
    train/1.jpg 21
    

    Now you can use pandas to read the txt file then use the ImageDataGenerator:

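    import pandas
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # read the whitespace-separated label file (Filename and Label columns)
    df = pandas.read_csv("uos-com2028/train/train.txt", sep=r"\s+")
    columns = [
         "Label",
    ]
    # you may want to rescale your image if it goes from 0 to 255
    datagen = ImageDataGenerator(
         rescale=1./255.,
    )
    # you will want to change color_mode, batch_size, and target_size depending on your image
    # directory is joined with each Filename entry, so it must be the folder
    # that the paths in train.txt are relative to
    traindata = datagen.flow_from_dataframe(
       dataframe=df,
       directory="uos-com2028/train",
       x_col="Filename",
       y_col=columns,
       color_mode='rgb',
       batch_size=16,
       class_mode="raw",  # yield the Label values unchanged
       target_size=(256, 256),
       shuffle=True,
    )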
    

    You can then pass the traindata object as the training input to model.fit(). It yields batches of (images, labels), so you do not need separate x_train and y_train arrays.
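
The preparation step mentioned at the start of this answer (unzipping the archives and adding a header line to the txt file) can also be scripted rather than done by hand. The following is only a minimal sketch using the Python standard library; the helper names `unzip_archive` and `add_header` are made up for illustration, and the paths in the usage comment are assumptions based on the paths used above, so adjust them to wherever your files actually live.

```python
import zipfile
from pathlib import Path


def unzip_archive(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


def add_header(labels_path, header="Filename Label"):
    """Prepend a header line to the label file unless it is already there.

    Returns True if the file was modified, False if the header was present.
    """
    path = Path(labels_path)
    text = path.read_text()
    lines = text.splitlines()
    if lines and lines[0].strip() == header:
        return False  # header already present, leave the file untouched
    path.write_text(header + "\n" + text)
    return True


# Example usage (hypothetical paths, not taken from the question):
# unzip_archive("train.zip", "uos-com2028")
# add_header("uos-com2028/train/train.txt")
```

Because `add_header` checks the first line before writing, running it twice should not duplicate the header. Once the header is in place, the `pandas.read_csv` call shown above can pick up the `Filename` and `Label` column names directly.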