I have an imbalanced dataset for training a CNN. I want to calculate class weights that are inversely proportional to the frequency of each label, so that less frequent labels are boosted in the back-propagation term and are well represented.
What I have done so far: I have a list A with the frequency of each label:
A = [1009, 2910, 4014, 152, 605]
So I did the following:
import numpy as np
class_weights_new = 1 / (np.array(A) / np.min(A))  # A converted to an array so the division is elementwise
This produces weights that scale learning down in proportion to each label's frequency, so that no single label is over-learned relative to the others.
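Printing the result shows the rarest label getting weight 1.0 and the most frequent getting the smallest (values rounded by me):
print(class_weights_new)
# -> [0.1506, 0.0522, 0.0379, 1.0, 0.2512]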
Now I have two questions regarding the matter:
Thanks!
The most common weight calculation is the inverse of each class's relative frequency, which is what scikit-learn's 'balanced' heuristic computes:
class_weights = np.sum(A) / (len(A) * np.array(A))
So you get a proper scale: a class at the average frequency gets a weight of about 1, and rarer classes get more.
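As a quick check with the A above (values rounded; plain arithmetic on the counts, not output from a training run):
import numpy as np
A = [1009, 2910, 4014, 152, 605]
print(np.sum(A) / (len(A) * np.array(A)))
# -> [1.72, 0.60, 0.43, 11.43, 2.87]: rare classes get weights > 1, frequent ones < 1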
Your approach also works, as you can see below: the higher a class's frequency, the lower its weight.
import numpy as np
import matplotlib.pyplot as plt

A = np.array([1009, 2910, 4014, 152, 605])  # label frequencies
class_weights_new = 1 / (A / np.min(A))     # inverse frequency; the rarest class gets weight 1.0

plt.plot(A)
plt.plot(class_weights_new * 4000)  # scaled up so both curves are visible on one axis
plt.legend(['freq', 'weights'])
plt.show()
print(class_weights_new)
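How you apply the weights depends on your framework, which you did not mention. As a minimal sketch, assuming a compiled Keras model (the names model, x_train and y_train here are hypothetical), you pass a per-class dict to fit; PyTorch's nn.CrossEntropyLoss(weight=...) serves the same purpose.
class_weight = {i: w for i, w in enumerate(class_weights_new)}  # class index -> weight
model.fit(x_train, y_train, epochs=10, class_weight=class_weight)  # hypothetical model and data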
You can also use scikit-learn to compute class weights: https://scikit-learn.org/stable/modules/generated/sklearn.utils.class_weight.compute_class_weight.html
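Note that compute_class_weight expects the raw label vector rather than a frequency list, so one way to use it here (a sketch, rebuilding labels from your counts) is:
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

A = [1009, 2910, 4014, 152, 605]
y = np.repeat(np.arange(len(A)), A)  # expand the counts back into a label vector
weights = compute_class_weight(class_weight='balanced', classes=np.arange(len(A)), y=y)
print(weights)  # same n_samples / (n_classes * count) values as above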