QUESTION: What is the cause of this error and how do I fix it?
BACKGROUND: I am attempting to implement a custom ("hierarchical") loss function that leverages the class hierarchy to classify CIFAR-100 images. This dataset has 20 coarse classes, each containing 5 fine classes. The custom loss function is a weighted sum of the fine-class crossentropy loss and the coarse-class crossentropy loss. It computes the coarse-class crossentropy loss by first mapping the true fine labels (y_true) to the true coarse labels (y_true_coarse), and the predicted fine labels as softmax probabilities (y_pred) to the predicted coarse labels as softmax probabilities (y_pred_coarse). The mapping is done with a TensorFlow "dictionary" (a lookup table). The fine-class crossentropy loss works just fine by itself; the problem is the coarse-class crossentropy loss.
When I use this loss in a training loop I get ValueError: No gradients provided for any variable. Below is the code for my custom loss function.
# THIS CODE CELL IS TO DEFINE A CUSTOM LOSS FUNCTION

# First, map the true fine labels to the true coarse labels
def get_y_true_coarse(y_true):
    y_true = tf.constant(y_true, dtype=tf.int32)
    y_true_coarse = table.lookup(y_true)
    return y_true_coarse

# Next, map the predicted fine class to the predicted coarse class (softmax probabilities)
initialize = tf.zeros(shape=(batch_size, num_coarse_classes), dtype=tf.float32)
y_pred_coarse = tf.Variable(initialize, dtype=tf.float32)

def get_y_pred_coarse(y_pred):
    for i in range(batch_size):
        for j in range(num_coarse_classes):
            idx = table.lookup(tf.range(100)) == j
            total = tf.reduce_sum(y_pred[i][idx])
            y_pred_coarse[i, j].assign(total)
    return y_pred_coarse

# Use the true coarse label and predicted coarse label (softmax probabilities) to derive the crossentropy loss of coarse labels
def hierarchical_loss(y_true, y_pred):
    y_true_coarse = get_y_true_coarse(y_true)
    y_pred_coarse = get_y_pred_coarse(y_pred)
    return SparseCategoricalCrossentropy()(y_true_coarse, y_pred_coarse)

# Use the true fine label and predicted fine label (softmax probabilities) to derive the crossentropy loss of fine labels
def crossentropy_loss(y_true, y_pred):
    return SparseCategoricalCrossentropy()(y_true, y_pred)

# Finally, combine the coarse class and fine class crossentropy losses
def custom_loss(y_true, y_pred):
    H = 0.5
    total_loss = (1 - H) * crossentropy_loss(y_true, y_pred) + H * hierarchical_loss(y_true, y_pred)
    return total_loss
I am passing the argument run_eagerly=True to the model.compile method before executing the model.fit method.
INVESTIGATIONS CONDUCTED: I have reviewed the TensorFlow introduction to graphs and tf.function as well as related Stack Overflow/Stack Exchange pages. Non-differentiability of the loss function is the most commonly cited cause of this error (see article1 and article2), but my loss function is merely a weighted sum of two crossentropy losses and should therefore be differentiable. I am using Python 3.9.7, TensorFlow 2.9.1, and VS Code 1.7.1 on a 64-bit Windows 10 machine.
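To make the differentiability concern concrete, here is a minimal toy example (my own sketch, not code from the notebook, assuming TF 2.x with eager execution) illustrating how writing results into an external tf.Variable with assign() inside a loss, much like get_y_pred_coarse does, can cut the gradient path back to the input:
# Toy sketch (assumption: TF 2.x, eager execution), not part of the notebook above.
import tensorflow as tf

probs = tf.Variable([[0.2, 0.8]], dtype=tf.float32)       # stand-in for y_pred
buffer = tf.Variable(tf.zeros((1, 1)), dtype=tf.float32)  # stand-in for y_pred_coarse

with tf.GradientTape() as tape:
    buffer[0, 0].assign(tf.reduce_sum(probs))  # assign() is not differentiated by the tape
    loss = tf.reduce_sum(buffer)

print(tape.gradient(loss, probs))  # prints None: no gradient flows back to probs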
NOTE: The fine-class crossentropy loss (crossentropy_loss) works just fine by itself when I compile and fit the model with it. The problem is the coarse-class crossentropy loss (hierarchical_loss). Therefore, to better isolate the problem, I am compiling the model with the latter function rather than with custom_loss. I will also mention that I have tried custom_loss with H = 1, and this results in no "learning" (the accuracy remains at ~1%, the expected naïve accuracy, from epoch to epoch); 1% is what one would get by randomly guessing one of the 100 (balanced) classes. If custom_loss were working, we would expect at least some learning to occur, because learning the coarse labels perfectly would yield ~20% fine-class accuracy given that there are 5 fine labels within each coarse category. A coarse-accuracy metric (sketched below) would make that expectation directly measurable.
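Here is a minimal sketch of such a metric (my own addition, assuming the `table` lookup defined further down in the notebook; metrics are not differentiated, so the lookup is unproblematic here):
# Sketch of a coarse-accuracy metric (assumes `table` from the notebook below is in scope).
def coarse_accuracy(y_true, y_pred):
    true_fine = tf.cast(tf.reshape(y_true, [-1]), tf.int32)    # integer fine labels
    pred_fine = tf.cast(tf.argmax(y_pred, axis=-1), tf.int32)  # predicted fine labels
    true_coarse = table.lookup(true_fine)                      # map fine -> coarse
    pred_coarse = table.lookup(pred_fine)
    return tf.reduce_mean(tf.cast(tf.equal(true_coarse, pred_coarse), tf.float32))

# e.g. model.compile(optimizer="adam", loss=..., metrics=["accuracy", coarse_accuracy])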
# %%
# THIS CODE CELL LOADS THE PACKAGES USED IN THIS NOTEBOOK
# Load core packages for data analysis and visualization
import pandas as pd
import matplotlib.pyplot as plt
# Load deep learning packages
import tensorflow as tf
from tensorflow.keras.datasets.cifar100 import load_data
from tensorflow.keras import (Model, layers)
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.utils import (to_categorical, plot_model)
from tensorflow.lookup import (StaticHashTable, KeyValueTensorInitializer)
# Print versions of main ML packages
print("Tensorflow version " + tf.__version__)
# %%
# THIS CODE CELL LOADS DATASETS AND CHECKS DATA DIMENSIONS
# There is an option to load the "fine" (100 fine classes) or "coarse" (20 super classes) labels with integer (int) encodings
# We will load both labels for hierarchical classification tasks
(x_train, y_train_fine_int), (x_test, y_test_fine_int) = load_data(label_mode="fine")
(_, y_train_coarse_int), (_, y_test_coarse_int) = load_data(label_mode="coarse")
# EXTRACT DATASET PARAMETERS FOR USE LATER ON
num_fine_classes = 100
num_coarse_classes = 20
input_shape = x_train.shape[1:]
# DEFINE BATCH SIZE
batch_size = 50
# %%
# THIS CODE CELL PROVIDES THE CODE TO LINK INTEGER LABELS TO MEANINGFUL WORD LABELS
# Fine and coarse labels are provided as integers. We will want to link them both to meaningful word labels.
# CREATE A DICTIONARY TO MAP THE 20 COARSE LABELS TO THE 100 FINE LABELS
# This mapping comes from https://keras.io/api/datasets/cifar100/
# Except "computer keyboard" should just be "keyboard" for the encoding to work
CoarseLabels_to_FineLabels = {
    "aquatic mammals": ["beaver", "dolphin", "otter", "seal", "whale"],
    "fish": ["aquarium fish", "flatfish", "ray", "shark", "trout"],
    "flowers": ["orchids", "poppies", "roses", "sunflowers", "tulips"],
    "food containers": ["bottles", "bowls", "cans", "cups", "plates"],
    "fruit and vegetables": ["apples", "mushrooms", "oranges", "pears", "sweet peppers"],
    "household electrical devices": ["clock", "keyboard", "lamp", "telephone", "television"],
    "household furniture": ["bed", "chair", "couch", "table", "wardrobe"],
    "insects": ["bee", "beetle", "butterfly", "caterpillar", "cockroach"],
    "large carnivores": ["bear", "leopard", "lion", "tiger", "wolf"],
    "large man-made outdoor things": ["bridge", "castle", "house", "road", "skyscraper"],
    "large natural outdoor scenes": ["cloud", "forest", "mountain", "plain", "sea"],
    "large omnivores and herbivores": ["camel", "cattle", "chimpanzee", "elephant", "kangaroo"],
    "medium-sized mammals": ["fox", "porcupine", "possum", "raccoon", "skunk"],
    "non-insect invertebrates": ["crab", "lobster", "snail", "spider", "worm"],
    "people": ["baby", "boy", "girl", "man", "woman"],
    "reptiles": ["crocodile", "dinosaur", "lizard", "snake", "turtle"],
    "small mammals": ["hamster", "mouse", "rabbit", "shrew", "squirrel"],
    "trees": ["maple", "oak", "palm", "pine", "willow"],
    "vehicles 1": ["bicycle", "bus", "motorcycle", "pickup truck", "train"],
    "vehicles 2": ["lawn-mower", "rocket", "streetcar", "tank", "tractor"]
}
# CREATE A DICTIONARY TO MAP THE INTEGER-ENCODED COARSE LABEL TO THE WORD LABEL
# Create a list of the Coarse Labels
CoarseLabels = list(CoarseLabels_to_FineLabels.keys())
# The target variable in CIFAR-100 is encoded such that the coarse class is assigned an integer based on its alphabetical order
# The CoarseLabels list is already alphabetized, so no need to sort
CoarseInts_to_CoarseLabels = dict(enumerate(CoarseLabels))
# CREATE A DICTIONARY TO MAP THE WORD LABEL TO THE INTEGER-ENCODED COARSE LABEL
CoarseLabels_to_CoarseInts = dict(zip(CoarseLabels, range(20)))
# CREATE A DICTIONARY TO MAP THE 100 FINE LABELS TO THE 20 COARSE LABELS
FineLabels_to_CoarseLabels = {}
for CoarseLabel in CoarseLabels:
    for FineLabel in CoarseLabels_to_FineLabels[CoarseLabel]:
        FineLabels_to_CoarseLabels[FineLabel] = CoarseLabel
# CREATE A DICTIONARY TO MAP THE INTEGER-ENCODED FINE LABEL TO THE WORD LABEL
# Create a list of the Fine Labels
FineLabels = list(FineLabels_to_CoarseLabels.keys())
# The target variable in CIFAR-100 is encoded such that the fine class is assigned an integer based on its alphabetical order
# Sort the fine class list.
FineLabels.sort()
FineInts_to_FineLabels = dict(enumerate(FineLabels))
# CREATE A DICTIONARY TO MAP THE INTEGER-ENCODED FINE LABELS TO THE INTEGER-ENCODED COARSE LABELS
b = list(dict(sorted(FineLabels_to_CoarseLabels.items())).values())
FineInts_to_CoarseInts = dict(zip(range(100), [CoarseLabels_to_CoarseInts[i] for i in b]))
# CREATE A TENSORFLOW LOOKUP TABLE TO MAP THE INTEGER-ENCODED FINE LABELS TO THE INTEGER-ENCODED COARSE LABELS
table = StaticHashTable(
    initializer=KeyValueTensorInitializer(
        keys=list(FineInts_to_CoarseInts.keys()),
        values=list(FineInts_to_CoarseInts.values()),
        key_dtype=tf.int32,
        value_dtype=tf.int32
    ),
    default_value=tf.constant(-1, tf.int32),
    name="dictionary"
)
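A quick sanity check on the table (my addition, not in the original notebook): every integer fine label 0-99 should map to a coarse label in 0-19, and none should hit the default value of -1.
# Sanity check (my addition): a -1 would indicate a key missing from FineInts_to_CoarseInts
coarse_ids = table.lookup(tf.range(100, dtype=tf.int32))
print(tf.reduce_min(coarse_ids).numpy(), tf.reduce_max(coarse_ids).numpy())  # expect: 0 19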
# %%
# THIS CODE CELL IS TO BUILD A FUNCTIONAL MODEL
inputs = layers.Input(shape=input_shape)
x = layers.BatchNormalization()(inputs)
x = layers.Conv2D(64, (3, 3), padding='same', activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.30)(x)
x = layers.Conv2D(256, (3, 3), padding='same', activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.30)(x)
x = layers.Conv2D(256, (3, 3), padding='same', activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.30)(x)
x = layers.Conv2D(1024, (3, 3), padding='same', activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.30)(x)
x = layers.GlobalAveragePooling2D()(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.30)(x)
x = layers.Dense(512, activation = "relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.30)(x)
output_fine = layers.Dense(num_fine_classes, activation="softmax", name="output_fine")(x)
model = Model(inputs=inputs, outputs=output_fine)
# %%
# THIS CODE CELL IS TO DEFINE A CUSTOM LOSS FUNCTION

# First, map the true fine labels to the true coarse labels
def get_y_true_coarse(y_true):
    y_true = tf.constant(y_true, dtype=tf.int32)
    y_true_coarse = table.lookup(y_true)
    return y_true_coarse

# Next, map the predicted fine class to the predicted coarse class (softmax probabilities)
initialize = tf.zeros(shape=(batch_size, num_coarse_classes), dtype=tf.float32)
y_pred_coarse = tf.Variable(initialize, dtype=tf.float32)

def get_y_pred_coarse(y_pred):
    for i in range(batch_size):
        for j in range(num_coarse_classes):
            idx = table.lookup(tf.range(100)) == j
            total = tf.reduce_sum(y_pred[i][idx])
            y_pred_coarse[i, j].assign(total)
    return y_pred_coarse

# Use the true coarse label and predicted coarse label (softmax probabilities) to derive the crossentropy loss of coarse labels
def hierarchical_loss(y_true, y_pred):
    y_true_coarse = get_y_true_coarse(y_true)
    y_pred_coarse = get_y_pred_coarse(y_pred)
    return SparseCategoricalCrossentropy()(y_true_coarse, y_pred_coarse)

# Use the true fine label and predicted fine label (softmax probabilities) to derive the crossentropy loss of fine labels
def crossentropy_loss(y_true, y_pred):
    return SparseCategoricalCrossentropy()(y_true, y_pred)

# Finally, combine the coarse class and fine class crossentropy losses
def custom_loss(y_true, y_pred):
    H = 0.5
    total_loss = (1 - H) * crossentropy_loss(y_true, y_pred) + H * hierarchical_loss(y_true, y_pred)
    return total_loss
# %%
# THIS CODE CELL IS TO COMPILE THE MODEL
model.compile(optimizer="adam", loss=hierarchical_loss, metrics="accuracy", run_eagerly=True)
# %%
# THIS CODE CELL IS TO TRAIN THE MODEL
history = model.fit(x_train, y_train_fine_int, epochs=20, validation_split=0.25, batch_size=batch_size)
# %%
# THIS CODE CELL IS TO VISUALIZE THE TRAINING
history_frame = pd.DataFrame(history.history)
history_frame.loc[:, ["accuracy", "val_accuracy"]].plot()
history_frame.loc[:, ["loss", "val_loss"]].plot()
plt.show()
To do some debugging, I printed the shapes and data types of the variables and confirmed that the functions work properly under eager execution (tf.config.run_functions_eagerly(True)). What looks fishy is that y_true_coarse and y_pred_coarse come back as different kinds of objects (an EagerTensor and a ResourceVariable, respectively) than y_true and y_pred (NumPy arrays). Nonetheless, the loss functions appear to generate the correct output on the 50 training examples.
# Get shape and datatypes of downloaded training data
print("shape of y_train_fine_int: ", y_train_fine_int.shape)
print("values of y_train_fine_int: ", y_train_fine_int.dtype)
print("type of y_train_fine_int: ", type(y_train_fine_int))
print("\n")
print("shape of x_train: ", x_train.shape)
print("values of x_train: ", x_train.dtype)
print("type of x_train: ", type(x_train))
> shape of y_train_fine_int: (50000, 1)
> values of y_train_fine_int: int32
> type of y_train_fine_int: <class 'numpy.ndarray'>
>
>
> shape of x_train: (50000, 32, 32, 3)
> values of x_train: uint8
> type of x_train: <class 'numpy.ndarray'>
y_true = y_train_fine_int[0:batch_size]
y_pred = model.predict(x_train[0:batch_size])
# Get shape and datatypes of the subset and predictions
print("shape of y_true: ", y_true.shape)
print("values of y_true: ", y_true.dtype)
print("type of y_true: ", type(y_true))
print("\n")
print("shape of y_pred: ", y_pred.shape)
print("values of y_pred: ", y_pred.dtype)
print("type of y_pred: ", type(y_pred), "\n")
> 2/2 [==============================] - 0s 45ms/step
> shape of y_true: (50, 1)
> values of y_true: int32
> type of y_true: <class 'numpy.ndarray'>
>
>
> shape of y_pred: (50, 100)
> values of y_pred: float32
> type of y_pred: <class 'numpy.ndarray'>
y_true_coarse = get_y_true_coarse(y_true)
y_pred_coarse = get_y_pred_coarse(y_pred)
# Get shape and datatypes of coarse true labels and predictions (softmax probabilities)
print("shape of y_true_coarse: ", y_true_coarse.shape)
print("values of y_true_coarse: ", y_true_coarse.dtype)
print("type of y_true_coarse: ", type(y_true_coarse))
print("\n")
print("shape of y_pred_coarse: ", y_pred_coarse.shape)
print("values of y_pred_coarse: ", y_pred_coarse.dtype)
print("type of y_pred_coarse: ", type(y_pred_coarse))
> shape of y_true_coarse: (50, 1)
> values of y_true_coarse: <dtype: 'int32'>
> type of y_true_coarse: <class 'tensorflow.python.framework.ops.EagerTensor'>
>
> shape of y_pred_coarse: (50, 20)
> values of y_pred_coarse: <dtype: 'float32'>
> type of y_pred_coarse: <class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable'>
print("fine loss with function: ", crossentropy_loss(y_true, y_pred))
print("fine loss manually: ", SparseCategoricalCrossentropy()(y_true, y_pred), "\n")
print("coarse loss with function: ", hierarchical_loss(y_true, y_pred))
print("coarse loss manually: ", SparseCategoricalCrossentropy()(y_true_coarse, y_pred_coarse), "\n")
H = 0.5
print("total loss with function: ", custom_loss(y_true, y_pred))
print("total loss manually: ", (1 - H) * crossentropy_loss(y_true, y_pred) + H * hierarchical_loss(y_true, y_pred))
> fine loss with function: tf.Tensor(5.1430206, shape=(), dtype=float32)
> fine loss manually: tf.Tensor(5.1430206, shape=(), dtype=float32)
>
> coarse loss with function: tf.Tensor(3.1151817, shape=(), dtype=float32)
> coarse loss manually: tf.Tensor(3.1151817, shape=(), dtype=float32)
>
> total loss with function: tf.Tensor(4.1291013, shape=(), dtype=float32)
> total loss manually: tf.Tensor(4.1291013, shape=(), dtype=float32)
I replaced the table lookup operation above with a matrix operation (inspired by @Duc Nguyen), as the former was likely not differentiable and therefore the source of the error. I one-hot encoded all my labels upfront with TensorFlow's to_categorical function (not shown), which is necessary for the revised loss functions below to work. Importantly, this version does not require eager execution.
import numpy as np
from tensorflow.keras.losses import CategoricalCrossentropy

# Build a (100 x 20) one-hot matrix that maps each fine class to its coarse class
Matrix_Fine_to_Coarse_OneHot = np.zeros(shape=[num_fine_classes, num_coarse_classes], dtype=np.int32)
idx = list(range(num_fine_classes)), list(FineInts_to_CoarseInts.values())
Matrix_Fine_to_Coarse_OneHot[idx] = 1
Matrix_Fine_to_Coarse_OneHot = tf.constant(Matrix_Fine_to_Coarse_OneHot, dtype=tf.float32)

H = 0.5  # weight on the coarse (hierarchical) loss, as above

@tf.function
def crossentropy_loss(y_true, y_pred):
    return CategoricalCrossentropy()(y_true, y_pred)

@tf.function
def hierarchical_loss(y_true, y_pred):
    # Collapse one-hot fine labels / fine softmax probabilities into their coarse counterparts
    y_true_coarse = tf.matmul(y_true, Matrix_Fine_to_Coarse_OneHot)
    y_pred_coarse = tf.matmul(y_pred, Matrix_Fine_to_Coarse_OneHot)
    return CategoricalCrossentropy()(y_true_coarse, y_pred_coarse)

@tf.function
def custom_loss(y_true, y_pred):
    total_loss = (1 - H) * crossentropy_loss(y_true, y_pred) + H * hierarchical_loss(y_true, y_pred)
    return total_loss
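For completeness, the one-hot encoding step mentioned above (not shown earlier) and the compile/fit wiring would look roughly like the following sketch; variable names follow the notebook above, and this is an illustration rather than my exact code.
# Sketch of the wiring (assumption: same names as the notebook above; not my exact code).
# The revised losses expect one-hot fine labels, so encode them upfront with to_categorical.
y_train_fine_onehot = to_categorical(y_train_fine_int, num_classes=num_fine_classes)

# No run_eagerly=True needed: the matrix-based loss is graph-compatible and differentiable.
model.compile(optimizer="adam", loss=custom_loss, metrics=["accuracy"])
history = model.fit(x_train, y_train_fine_onehot, epochs=20, validation_split=0.25, batch_size=batch_size)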