machine-learning ipython classification normalization imbalanced-data

renormalizing class weights for imbalanced data

I have an imbalanced dataset for training a CNN. I want to calculate class weights that are inversely proportional to the frequency of each label, so that less frequent labels are boosted in the back-propagation term and end up well represented.

What I did so far: I have a list A with the frequency of each label.

A=[1009,2910,4014,152,605]

So I did the following:

class_weights_new=1/(A/np.min(A))

This produces a list of weights that scale down the learning in proportion to each label's frequency, to avoid over-learning one label at the expense of the others.

Now I have two questions about this:

1. Is there something wrong with my logic? Am I missing something?
2. So far this calculation has produced worse performance, so I would like to smooth the weights so that some imbalance remains: the ordering between labels should stay roughly the same, but all weights should move closer to 1. What mathematical operation gives that result?

thanks !!!

Solution

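Note that normalizing the counts,

class_freq = np.array(A) / np.sum(A)

gives the relative frequency of each class, not a weight: used directly, it would up-weight the frequent classes. A common weight calculation is the inverse of that, scaled by the number of classes,

class_weights = np.sum(A) / (len(A) * np.array(A))

so you get a proper scale: a class with the average count gets weight 1, and rarer classes get more.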

Your approach works too: as the plot below shows, the higher a class's frequency, the lower its weight.

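import numpy as np
import matplotlib.pyplot as plt

# Label counts per class
A = np.array([1009, 2910, 4014, 152, 605])

# Inverse-frequency weights, scaled so the rarest class has weight 1
class_weights_new = 1 / (A / np.min(A))

plt.plot(A)
# The weights are multiplied by 4000 only so both curves are visible on one axis
plt.plot(class_weights_new * 4000)
plt.legend(['freq', 'weights'])
plt.show()

print(class_weights_new)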
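
For the second question (pulling the weights closer to 1 while keeping their ordering), one option is to raise the weights to a power between 0 and 1. This is only a sketch, not something from the original post: the helper name `smooth_weights` and the exponent `beta` are made up for illustration, and the right value of `beta` would have to be tuned on validation data.

```python
import numpy as np

# Label counts from the question
A = np.array([1009, 2910, 4014, 152, 605])

# Same weights as 1 / (A / np.min(A)): the rarest class gets weight 1
raw_weights = A.min() / A


def smooth_weights(weights, beta):
    """Hypothetical helper: raise weights to the power beta.

    beta = 1 leaves the weights unchanged, beta = 0 makes them all 1,
    and values in between pull every weight toward 1 without changing
    the ordering of the classes.
    """
    return np.power(weights, beta)


for beta in (1.0, 0.5, 0.25):
    print(beta, np.round(smooth_weights(raw_weights, beta), 3))
```

With this choice the ratio between any two weights becomes the original ratio raised to `beta`, so the imbalance is compressed rather than removed.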

You can also use scikit-learn to compute the class weights: https://scikit-learn.org/stable/modules/generated/sklearn.utils.class_weight.compute_class_weight.html
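
A minimal sketch of that scikit-learn route, assuming the labels are the integers 0 to 4 and rebuilding a label vector from the counts in A (with real data you would pass your actual training labels as `y`). The `"balanced"` mode computes `n_samples / (n_classes * count)` for each class, which the last lines check by hand.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Label counts from the question
A = np.array([1009, 2910, 4014, 152, 605])

# Assumed class ids 0..4; y is a stand-in label vector with the same counts
classes = np.arange(len(A))
y = np.repeat(classes, A)

# "balanced" weights: n_samples / (n_classes * count of each class)
sk_weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

# The same formula written out with NumPy, for comparison
manual_weights = A.sum() / (len(A) * A)

print(np.round(sk_weights, 3))
print(np.allclose(sk_weights, manual_weights))
```

The resulting array is ordered like `classes`, so it can be turned into a `{class_id: weight}` mapping with `dict(zip(classes, sk_weights))` if your training framework expects one.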