I have a problem with imbalanced labels: for example, 90% of the data have label 0 and the remaining 10% have label 1.
I want to train the network with minibatches, so I want the optimizer to give the examples labeled 1 a learning rate (or somehow scale their gradients to be) 9 times greater than the examples with label 0.
Is there any way of doing that?
The problem is that the whole training process is done in this line:
history = model.fit(trainX, trainY, epochs=1, batch_size=minibatch_size, validation_data=(valX, valY), verbose=0)
Is there a way to modify the fit method at a lower level?
You can try using the class_weight parameter of Keras. It multiplies each example's contribution to the loss (and therefore to the gradients) by the weight of its class, which is exactly the effect you describe.
From the Keras docs:
class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only).
Example of using it in imbalance data: https://www.tensorflow.org/tutorials/structured_data/imbalanced_data#class_weights
class_weights = {0: 1., 1: 10.}
history = model.fit(trainX, trainY, epochs=1, batch_size=minibatch_size, validation_data=(valX, valY), verbose=0, class_weight=class_weights)
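If you'd rather weight individual examples than whole classes, fit also accepts a sample_weight array. A minimal sketch, assuming trainY is a flat array of 0/1 integer labels (the 9.0 is the factor from your question):

import numpy as np

# One weight per training example: scale the minority class (label 1) by 9.
# Weighting an example's loss by 9 also scales its gradient contribution by 9.
sample_weights = np.where(trainY == 1, 9.0, 1.0)

history = model.fit(trainX, trainY, epochs=1, batch_size=minibatch_size, validation_data=(valX, valY), verbose=0, sample_weight=sample_weights)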
Full example of computing the class weights from the label counts:
import numpy as np

# Examine the class label imbalance.
# You can use your_df['label_class_column'] or just the trainY values.
neg, pos = np.bincount(your_df['label_class_column'])
total = neg + pos
print('Examples:\n    Total: {}\n    Positive: {} ({:.2f}% of total)\n'.format(
    total, pos, 100 * pos / total))

# Scaling by total/2 keeps the loss at a similar magnitude:
# the sum of the weights over all examples stays the same.
weight_for_0 = (1 / neg) * (total / 2.0)
weight_for_1 = (1 / pos) * (total / 2.0)

class_weight = {0: weight_for_0, 1: weight_for_1}
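The computed dictionary then plugs straight into fit. A minimal sketch reusing the model and data names from the question:

history = model.fit(trainX, trainY, epochs=1, batch_size=minibatch_size, validation_data=(valX, valY), verbose=0, class_weight=class_weight)

With your 90/10 split (say neg = 900 and pos = 100), this gives weight_for_0 ≈ 0.56 and weight_for_1 = 5.0, a 9:1 ratio, which is exactly the factor you asked for.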