I wrote a squared loss function for categorisation of one-hot encoded data:
import tensorflow.keras.backend as K

def squared_categorical_loss(y_true, y_pred):
    return K.mean(K.square(1.0 - K.sum(y_true * y_pred, axis=1)))
which works when given numpy array examples, such as
import numpy as np

y_true = np.asarray([[1, 0, 0], [0, 1, 0]])
y_pred = np.asarray([[0.5, 0.2, 0.3], [0.4, 0.6, 0.0]])
squared_categorical_loss(y_true, y_pred)
The example above returns a tensor with the value 0.205, which is the mean of (1-0.5)^2 and (1-0.6)^2. That is the desired result, and it should be an optimisable loss function that generally correlates with accuracy. But when I apply it to a TensorFlow model
model.compile(optimizer='adam',
              loss=squared_categorical_loss,
              metrics=['accuracy'])
the loss decreases to extremely small values while the training accuracy stays below 50%. That shouldn't be possible: a loss below 0.125 couldn't mathematically be achieved without the accuracy being above 50%. So what is wrong with my implementation? Thanks!
It will only work if y_pred is normalized, i.e. each row sums to 1.
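For instance (with hypothetical unnormalized scores), the mass your loss sees on the true class, K.sum(y_true * y_pred, axis=1), can land near 1 even though the argmax picks the wrong class, so the loss collapses without the accuracy improving:

import numpy as np

# Hypothetical raw (unnormalized) model outputs: in both rows the largest
# score belongs to the wrong class, yet the value on the true class is ~1.
y_true = np.asarray([[1, 0, 0], [0, 1, 0]])
y_pred = np.asarray([[0.99, 5.0, 0.0], [3.0, 1.01, 0.0]])

# squared_categorical_loss(y_true, y_pred) is the mean of (1-0.99)^2 and
# (1-1.01)^2 = 0.0001, far below 0.125, while argmax accuracy is 0%.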
I think that you forgot to apply softmax in the last layer of your model.
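A minimal sketch of the fix, assuming a simple feed-forward classifier with three output classes (the hidden-layer size and input shape below are placeholders, not taken from the question):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(10,)),  # placeholder hidden layer
    # softmax makes each output row non-negative and sum to 1, restoring the
    # loss/accuracy relationship described in the question
    layers.Dense(3, activation='softmax'),
])
model.compile(optimizer='adam',
              loss=squared_categorical_loss,
              metrics=['accuracy'])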