Tags: python, pandas, scikit-learn, preprocessor, smote

How do we set the ratio in SMOTE to get more positive samples than negative samples?


I am trying to use SMOTE to handle imbalanced class data in binary classification. What I know so far: if we use, for example,

sm = SMOTE(ratio=1.0, random_state=10)

Before OverSampling, counts of label '1': [78]
Before OverSampling, counts of label '0': [6266] 

After OverSampling, counts of label '1': 6266
After OverSampling, counts of label '0': 6266

For the case where class 1 is the minority class, this results in a 50:50 split between class 0 and class 1,

and

sm = SMOTE(ratio=0.5, random_state=10)

Before OverSampling, counts of label '1': [78]
Before OverSampling, counts of label '0': [6266] 

After OverSampling, counts of label '1': 3133
After OverSampling, counts of label '0': 6266

results in class 1 being half the size of class 0.
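The counts above are consistent with reading a float ratio as (minority count after resampling) / (majority count). A small sketch of that arithmetic follows, assuming this is the rule; minority_target_from_ratio is a hypothetical helper, not part of imbalanced-learn. It also shows why a float cannot produce a 75:25 split: the target can never exceed the majority count, and as far as I know imbalanced-learn rejects float values above 1.

```python
def minority_target_from_ratio(ratio: float, n_majority: int) -> int:
    """Minority-class size after over-sampling with a float ratio.

    Assumption: target = int(ratio * n_majority), which reproduces the
    counts printed in the question.
    """
    if not 0 < ratio <= 1:
        # A float above 1 would ask for more minority than majority samples;
        # this check mirrors imbalanced-learn's (0, 1] restriction (assumed).
        raise ValueError("ratio must be in (0, 1]")
    return int(ratio * n_majority)


n_class_0 = 6266  # majority-class count from the question
print(minority_target_from_ratio(1.0, n_class_0))  # 6266
print(minority_target_from_ratio(0.5, n_class_0))  # 3133
```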

My question:

How do we set the ratio to obtain more class 1 samples than class 0 samples, for instance a 75:25 split?


Solution

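  • Try passing a dictionary to sampling_strategy (the parameter that replaced ratio in newer imbalanced-learn releases). Each key is a class label and each value is the number of samples that class should have after resampling, so class 1 can be made larger than class 0. A float cannot do this, because it must lie in (0, 1].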

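    from imblearn.over_sampling import SMOTE

    # 75:25 means class 1 = 3 x the 6266 class-0 samples
    smote_on_1 = 3 * 6266  # = 18798

    # Dict maps class label -> desired number of samples after resampling
    smt = SMOTE(sampling_strategy={1: smote_on_1}, random_state=10)
    # fit_resample replaces the older, since-removed fit_sample method
    X_train, y_train = smt.fit_resample(X_train, y_train)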
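Hardcoding 18798 only works for this particular dataset. Below is a sketch that derives the dictionary from the labels and a desired split, assuming the majority class should keep its original size (over-sampling only adds samples). strategy_for_split is a hypothetical helper, not an imbalanced-learn function; its return value is what you would pass as SMOTE(sampling_strategy=...).

```python
from collections import Counter


def strategy_for_split(y, minority_label, majority_label,
                       minority_share, majority_share):
    """Build a sampling_strategy dict so that
    minority / majority == minority_share / majority_share.

    Hypothetical helper: the majority class is left unchanged.
    """
    counts = Counter(y)
    n_majority = counts[majority_label]
    # Integer arithmetic avoids float rounding; // floors if not divisible
    target = n_majority * minority_share // majority_share
    if target < counts[minority_label]:
        # Over-sampling can only add samples, never remove them
        raise ValueError("target is smaller than the current class size")
    return {minority_label: target}


y = [1] * 78 + [0] * 6266  # class counts from the question
print(strategy_for_split(y, 1, 0, 75, 25))  # {1: 18798}
```

Computing the target from the labels keeps the 75:25 split correct if the training set changes, for example across cross-validation folds.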