python machine-learning classification random-forest supervised-learning

Imbalanced data: undersampling or oversampling?

I have a binary classification problem where one class represents 99.1% of all observations (210,000). To deal with the imbalanced data, I chose sampling techniques, but I don't know which one to use: undersampling the majority class or oversampling the minority class. Does anybody have any advice?

Thank you.

P.S. I use the random forest algorithm from sklearn.

Solution

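Oversampling, or
undersampling, or
oversampling the minority and undersampling the majority

is a hyperparameter. Use cross-validation to find out which one works best, but keep separate training, validation and test sets: choose the strategy on the validation data, resample only the training data, and report the final score on a test set that was never resampled.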
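A minimal sketch of that procedure, assuming scikit-learn and NumPy are available. Several choices here are assumptions, not part of the answer above. The data is synthetic (`make_classification` with about 1% positives) standing in for the real 210,000 observations. Resampling is hand-rolled random duplication/dropping rather than a dedicated library. Average precision is used as the selection metric; pick whatever metric matters for your problem. A `"none"` baseline is included, and `"both"` resamples each class to the midpoint size. The `resample` and `cv_score` helpers and the strategy names are hypothetical. Resampling happens inside each fold on the training part only, so validation folds and the held-out test set keep the real class ratio.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import StratifiedKFold, train_test_split


def resample(X, y, strategy, rng):
    """Randomly resample a binary training set (label 1 = minority class).

    strategy is one of the hypothetical names "none", "over", "under", "both".
    """
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    if strategy == "none":
        return X, y
    if strategy == "over":
        # duplicate minority rows (with replacement) up to the majority size
        n_min = n_maj = len(majority)
    elif strategy == "under":
        # drop majority rows (without replacement) down to the minority size
        n_min = n_maj = len(minority)
    elif strategy == "both":
        # assumption: oversample minority and undersample majority to the midpoint
        n_min = n_maj = (len(minority) + len(majority)) // 2
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    idx_min = rng.choice(minority, size=n_min, replace=n_min > len(minority))
    idx_maj = rng.choice(majority, size=n_maj, replace=n_maj > len(majority))
    idx = np.concatenate([idx_min, idx_maj])
    return X[idx], y[idx]


def cv_score(X, y, strategy, n_splits=3, seed=0):
    """Mean average precision over stratified folds; only training folds are resampled."""
    rng = np.random.default_rng(seed)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in skf.split(X, y):
        X_tr, y_tr = resample(X[train_idx], y[train_idx], strategy, rng)
        clf = RandomForestClassifier(n_estimators=50, random_state=seed)
        clf.fit(X_tr, y_tr)
        proba = clf.predict_proba(X[val_idx])[:, 1]  # probability of the minority class
        scores.append(average_precision_score(y[val_idx], proba))
    return float(np.mean(scores))


# Synthetic stand-in for the real data: about 99% class 0, 1% class 1.
X, y = make_classification(n_samples=6000, n_features=20, weights=[0.99],
                           flip_y=0, random_state=0)

# Hold out a test set first; it is never resampled or used to pick the strategy.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Treat the sampling strategy as a hyperparameter and select it by cross-validation.
strategies = ["none", "over", "under", "both"]
cv_results = {s: cv_score(X_dev, y_dev, s) for s in strategies}
best = max(cv_results, key=cv_results.get)

# Refit on the (resampled) development data, then score once on the untouched test set.
X_fit, y_fit = resample(X_dev, y_dev, best, np.random.default_rng(0))
final = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_fit, y_fit)
test_ap = average_precision_score(y_test, final.predict_proba(X_test)[:, 1])
print(cv_results, best, round(test_ap, 3))
```

The sampler sits inside the fold loop on purpose: resampling before splitting would copy duplicated minority rows into the validation folds and inflate the scores. If you prefer a library, imbalanced-learn's `Pipeline` with `RandomOverSampler` or `RandomUnderSampler` applies resampling only during fitting, so it can be tuned the same way with `GridSearchCV`.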
