Tags: tensorflow, keras, deep-learning, multiclass-classification, image-classification

Getting Labels in a TensorFlow Image Classification


I am doing image classification by following this TensorFlow tutorial, loading my own dataset from Google Drive. Now I want to plot the confusion matrix. First, I predicted labels for the validation dataset:

val_preds = model.predict(val_ds)

but I am not sure how to get the original labels to compare the predictions against. I have tried several methods, but they all give very low accuracy, so I know the labels are not what they should be. For example:

val_ds_labels = np.concatenate([y for x, y in val_ds], axis=0)

This gives me an accuracy of 0.067, while the training below reports an accuracy of around 0.70.

epochs = 10
history=model.fit(train_ds, epochs=epochs, validation_data=val_ds)

Here is how I created the validation and training dataset:

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "images",
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=image_size,
    batch_size=batch_size,
    label_mode='int'
)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "images",
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=image_size,
    batch_size=batch_size,
    label_mode='int'
)
train_ds = train_ds.prefetch(buffer_size=32)
val_ds = val_ds.prefetch(buffer_size=32)
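A likely cause of the low accuracy is that image_dataset_from_directory shuffles by default (shuffle=True), so each pass over val_ds yields the images in a different order. Labels collected in one pass are then not paired with predictions made in another pass. The sketch below is a NumPy-only simulation of that effect, not the real pipeline: num_images and the 70% hit rate are made-up values, and only the 22 classes come from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 22      # as stated in the question
num_images = 10_000   # hypothetical validation-set size

# Synthetic ground truth and a synthetic model that is right about 70% of the time.
labels = rng.integers(0, num_classes, size=num_images)
predictions = labels.copy()
wrong = rng.random(num_images) > 0.7
predictions[wrong] = rng.integers(0, num_classes, size=wrong.sum())

# Same order: each prediction is compared with its own label.
aligned_accuracy = np.mean(predictions == labels)

# Different order: labels come from a second, reshuffled pass.
misaligned_accuracy = np.mean(predictions == rng.permutation(labels))

print(f"aligned: {aligned_accuracy:.3f}, misaligned: {misaligned_accuracy:.3f}")
```

The aligned accuracy stays near 0.7, while the misaligned one falls to roughly chance level (1/22 is about 0.045), which is the same order of magnitude as the 0.067 reported above. If a fixed order is acceptable, passing shuffle=False when building val_ds is another way to keep separate passes aligned.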

Then I created the model and compiled it:

model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.SparseTopKCategoricalAccuracy(k=1)],
)

and fit it:

epochs = 10
history=model.fit(train_ds, epochs=epochs, validation_data=val_ds)

I have 22 labels.

val_preds = model.predict(val_ds)
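Because the loss is compiled with from_logits=True, model.predict presumably returns one row of raw scores per image (shape (N, 22) here) rather than class ids, so the scores need an argmax before they can be compared with integer labels. A minimal sketch, assuming the final layer outputs one score per class; the helper name logits_to_class_ids and the example numbers are made up for illustration:

```python
import numpy as np

def logits_to_class_ids(logits: np.ndarray) -> np.ndarray:
    """Return the index of the highest score in each row."""
    return np.argmax(logits, axis=-1)

# Hypothetical logits for 3 images and 4 classes (made-up numbers).
example_logits = np.array([
    [2.0, -1.0, 0.5, 0.1],
    [0.0, 0.3, 3.2, -2.0],
    [-1.5, 4.0, 0.2, 0.9],
])

class_ids = logits_to_class_ids(example_logits)
print(class_ids)  # [0 2 1]
```

Applying a softmax first would not change the result, since softmax preserves the ordering of the scores within a row.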

Solution

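  • After training, collect the predictions and the true labels of the validation set in the same pass over val_ds. image_dataset_from_directory shuffles by default, so calling model.predict(val_ds) and then iterating val_ds again for the labels returns the images in two different orders: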

    epochs=5
    history = model.fit(
      train_ds,
      validation_data=val_ds,
      epochs=epochs
    )
    
    ....
    ....
    Epoch 4/5
    20ms/step - loss: 0.6368 - accuracy: 0.7613 - val_loss: 0.9294 - val_accuracy: 0.6185
    Epoch 5/5
    20ms/step - loss: 0.4307 - accuracy: 0.8531 - val_loss: 0.9552 - val_accuracy: 0.6635
    
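    # Collect predictions and true labels in the same pass over val_ds,
    # so that each prediction stays paired with its own label.
    predictions = np.array([])
    labels = np.array([])
    
    for x, y in val_ds:
      # argmax turns the per-class scores into a predicted class id
      predictions = np.concatenate([predictions, np.argmax(model.predict(x), axis=-1)])
      labels = np.concatenate([labels, y.numpy()])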
    
    predictions[:10]
    array([0., 4., 3., 0., 3., 4., 2., 4., 4., 0.])
    
    labels[:10]
    array([0., 4., 3., 0., 3., 4., 1., 2., 4., 0.])
    
    m = tf.keras.metrics.Accuracy()
    m(labels, predictions).numpy()
    # 0.66348773
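Once labels and predictions are aligned, the confusion matrix the question asks for can be built by counting (true, predicted) pairs. The following is a minimal NumPy sketch rather than the answer author's code: the function name confusion_matrix and num_classes=5 are assumptions (the printed values above only go up to 4), and the inputs are just the first ten entries shown in the answer.

```python
import numpy as np

def confusion_matrix(labels, predictions, num_classes):
    """Count pairs: rows are true classes, columns are predicted classes."""
    labels = np.asarray(labels).astype(int)
    predictions = np.asarray(predictions).astype(int)
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly when the same (row, col) pair repeats.
    np.add.at(matrix, (labels, predictions), 1)
    return matrix

# First ten values printed in the answer above.
labels = np.array([0., 4., 3., 0., 3., 4., 1., 2., 4., 0.])
predictions = np.array([0., 4., 3., 0., 3., 4., 2., 4., 4., 0.])

cm = confusion_matrix(labels, predictions, num_classes=5)  # 5 is assumed
accuracy = np.trace(cm) / cm.sum()  # diagonal entries are correct predictions
print(cm)
print(accuracy)  # 0.8
```

For the dataset in the question, num_classes would be 22. TensorFlow also provides tf.math.confusion_matrix(labels, predictions), which should give the same counts, and the resulting array can be plotted as a heatmap with any plotting library.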