I read the scikit-learn documentation for RANSACRegressor. It says the min_samples parameter is highly dependent on the model.
So how can one calculate the min_samples parameter for a non-linear estimator? For example, I want to use SVR with an rbf kernel. What should min_samples be in this case?
There is no general rule that gives a good min_samples value for every model. However, you can use domain knowledge to get a starting point. For example, if the relationship between the features and the target variable is highly nonlinear, you can assume there may be quite a lot of noise and will want a higher value of min_samples. The higher the value of min_samples, the more data points RANSAC draws at random to fit the model in each iteration, and vice versa.
Alternatively, you can let the machine estimate it for you. Run a grid search over different values of min_samples with cross-validation and pick the one that scores best on both the training and validation sets.