How to find the best min_samples for RANSACRegressor with a non-linear estimator

I read the scikit-learn documentation about RANSACRegressor. It says

the min_samples parameter is highly dependent upon the model.

So, how can one calculate the min_samples parameter for a non-linear estimator? For example, I want to use SVR with an rbf kernel. What should min_samples be in this case?

Solution

There is no general rule that yields a good min_samples value for an arbitrary estimator. However, you can use domain knowledge to pick a starting point. For example, if the relationship between the features and the target variable is highly nonlinear, you can expect to need more points to pin the model down in the presence of noise, so a higher value of min_samples is a reasonable start. The higher min_samples is, the more data points are drawn at random to fit each candidate model; the lower it is, the fewer.

Alternatively, you can let the machine estimate it for you. Run a grid search over different values of min_samples with cross-validation and pick the one for which the score (for a regressor, something like R² or an error metric rather than accuracy) is best on both the training and validation sets.