I am trying to use SMOTE to handle imbalanced classes in a binary classification problem. As far as I understand, if we use, for example,
sm = SMOTE(ratio = 1.0, random_state=10)
Before OverSampling, counts of label '1': [78]
Before OverSampling, counts of label '0': [6266]
After OverSampling, counts of label '1': 6266
After OverSampling, counts of label '0': 6266
for the case where class 1 is the minority, the result is a 50:50 split between class 0 and class 1,
and
sm = SMOTE(ratio = 0.5, random_state=10)
Before OverSampling, counts of label '1': [78]
Before OverSampling, counts of label '0': [6266]
After OverSampling, counts of label '1': 3133
After OverSampling, counts of label '0': 6266
the result is that class 1 ends up half the size of class 0.
My question:
How do we set the ratio to obtain more samples of class 1 than of class 0, for instance a 75:25 split in favor of class 1?
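Try using a dictionary for sampling_strategy (the parameter that replaces ratio in newer imbalanced-learn versions). Each key is a class label and each value is the number of samples you want for that class after resampling.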
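from imblearn.over_sampling import SMOTE

# Target count for class 1: 18798 = 3 * 6266, i.e. three times class 0, which gives 75:25
smote_on_1 = 18798
smt = SMOTE(sampling_strategy={1: smote_on_1})
# fit_resample is the current name of the method formerly called fit_sample
X_train, y_train = smt.fit_resample(X_train, y_train)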
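If you would rather not hard-code 18798, the target count can be derived from the class share you want. Below is a rough sketch: the helper name smote_strategy and its arguments are my own invention (not part of imbalanced-learn), and it assumes a single minority label whose share of the resampled data you specify as a fraction between 0 and 1.

```python
from collections import Counter


def smote_strategy(y, minority_label, minority_fraction):
    """Build a {label: target_count} dict (hypothetical helper, not an imblearn API).

    minority_fraction is the share of the resampled data that minority_label
    should make up, e.g. 0.75 for a 75:25 split in its favor.
    """
    if not 0 < minority_fraction < 1:
        raise ValueError("minority_fraction must be strictly between 0 and 1")

    counts = Counter(y)
    # Samples in all other classes stay untouched by over-sampling.
    n_other = sum(n for label, n in counts.items() if label != minority_label)

    # Solve target / (target + n_other) = minority_fraction for target.
    target = round(n_other * minority_fraction / (1 - minority_fraction))

    # Over-sampling can only add samples, never remove them.
    if target < counts[minority_label]:
        raise ValueError("target is below the current count; over-sampling cannot shrink a class")
    return {minority_label: target}


# Class counts from the question: 6266 samples of class 0, 78 of class 1.
y_train = [0] * 6266 + [1] * 78
strategy = smote_strategy(y_train, minority_label=1, minority_fraction=0.75)
print(strategy)  # {1: 18798}
```

The resulting dictionary can then be passed as SMOTE(sampling_strategy=strategy) in place of the hard-coded value above.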