I have a binary classification problem where one class represents 99.1% of all observations (210,000). To deal with the imbalanced data, I chose sampling techniques, but I don't know which to use: undersampling the majority class or oversampling the minority class. Does anybody have any advice?
Thank you.
P.S. I use the random forest algorithm from sklearn.
The choice of sampling technique is a hyperparameter. Use cross-validation to find out which one works best, but keep separate training, validation, and test sets.
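As a rough sketch of what that could look like: the snippet below treats the resampling strategy ("none", "under", "over") as a hyperparameter and scores each option with stratified cross-validation, resampling only the training folds so the validation folds keep the original class ratio. The data is synthetic (`make_classification`), the helper names `resample_training_data` and `cv_score` are made up for illustration, the minority class is assumed to be labelled 1, and average precision is just one reasonable metric for imbalanced data, not the only choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.utils import resample


def resample_training_data(X, y, strategy, random_state=0):
    """Hypothetical helper: rebalance (X, y) by 'under' or 'over' sampling."""
    if strategy == "none":
        return X, y
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    if strategy == "under":
        # Randomly drop majority rows until both classes have the same size.
        maj_idx = resample(maj_idx, replace=False, n_samples=len(min_idx),
                           random_state=random_state)
    elif strategy == "over":
        # Randomly duplicate minority rows until both classes have the same size.
        min_idx = resample(min_idx, replace=True, n_samples=len(maj_idx),
                           random_state=random_state)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    idx = np.concatenate([maj_idx, min_idx])
    return X[idx], y[idx]


def cv_score(X, y, strategy, n_splits=3, random_state=0):
    """Mean average precision over stratified folds for one strategy."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    scores = []
    for train_idx, val_idx in skf.split(X, y):
        # Resample the training fold only; the validation fold stays untouched.
        X_tr, y_tr = resample_training_data(X[train_idx], y[train_idx],
                                            strategy, random_state)
        clf = RandomForestClassifier(n_estimators=50, random_state=random_state)
        clf.fit(X_tr, y_tr)
        proba = clf.predict_proba(X[val_idx])[:, 1]  # assumes minority label is 1
        scores.append(average_precision_score(y[val_idx], proba))
    return float(np.mean(scores))


# Synthetic stand-in for the real data: roughly 99% of rows in class 0.
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.99],
                           random_state=0)

# Hold out a test set that is never used for choosing the strategy.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2,
                                                stratify=y, random_state=0)

scores = {s: cv_score(X_dev, y_dev, s) for s in ("none", "under", "over")}
best = max(scores, key=scores.get)

# Refit on the full development set with the chosen strategy, then test once.
X_fit, y_fit = resample_training_data(X_dev, y_dev, best)
final_clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_fit, y_fit)
test_ap = average_precision_score(y_test, final_clf.predict_proba(X_test)[:, 1])
print(scores, best, test_ap)
```

The important detail is that resampling happens inside the cross-validation loop. Resampling before splitting would let duplicated minority rows leak into the validation folds and make oversampling look better than it is. The "none" option is included as a baseline; with random forests it may also be worth comparing against `class_weight="balanced"`.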