python, machine-learning, scikit-learn

How do I set random_state correctly so that my results are always the same?


If I have, for example, this snippet of code:

knn = KNeighborsClassifier()
grid_search_knn = GridSearchCV(
    estimator=knn,
    n_jobs=-1)

Do I have to set it like this:

knn = KNeighborsClassifier(random_state=42)

grid_search_knn = GridSearchCV(
    estimator=knn,
    n_jobs=-1
)

Or do I have to set it like this?

knn = KNeighborsClassifier(random_state=42)

grid_search_knn = GridSearchCV(
    estimator=knn,
    random_state=42,
    n_jobs=-1
)

What is the correct way? And what if I use RandomizedSearchCV instead of GridSearchCV?


Solution

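  • Where random_state belongs depends on which component actually uses randomness: the estimator, the cross-validation splitter, or (for RandomizedSearchCV) the search itself. It does not depend on GridSearchCV as such; GridSearchCV exhaustively tries every combination in param_grid and has no random_state parameter of its own.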

    KNeighborsClassifier does not accept random_state at all. It is a deterministic algorithm: its predictions depend only on the training data and its hyperparameters, so there is nothing to seed. Passing random_state=42 to its constructor raises a TypeError, which means your second and third snippets fail before the search even starts. As a result:

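    1. For KNeighborsClassifier: don't set random_state on the classifier or on GridSearchCV, since neither accepts it. With the default cv (None or an integer), the folds come from an unshuffled StratifiedKFold, so the whole grid search is already reproducible. With RandomizedSearchCV, the only seed you need is the search's own random_state, which fixes which candidates are sampled.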

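    2. For randomized algorithms: if the estimator itself uses randomness, like a decision tree (random feature permutation at each split) or a random forest (bootstrapping and feature subsampling), set random_state on the estimator, e.g. RandomForestClassifier(random_state=42). There is no random_state to set on GridSearchCV. Its cross-validation splits are deterministic unless you pass a splitter with shuffle=True, such as StratifiedKFold(n_splits=5, shuffle=True, random_state=42), in which case the seed goes on that splitter.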
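    A minimal sketch of the points above, assuming the iris dataset, a RandomForestClassifier with n_estimators=20, and a max_depth grid purely as illustrative choices (none of them come from the question). It checks that KNeighborsClassifier exposes no random_state, then runs the same seeded grid search twice with n_jobs=-1 and a shuffled StratifiedKFold and compares the results.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# KNeighborsClassifier has no random_state parameter to set.
print("random_state" in KNeighborsClassifier().get_params())  # False
try:
    KNeighborsClassifier(random_state=42)
except TypeError as exc:
    print("TypeError:", exc)


def run_grid_search():
    # The forest is randomized (bootstrapping, feature subsampling), so it gets the seed.
    rf = RandomForestClassifier(n_estimators=20, random_state=42)
    # A shuffled splitter is also randomized, so it gets its own seed.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    # GridSearchCV itself takes no random_state: it tries every combination.
    search = GridSearchCV(
        estimator=rf,
        param_grid={"max_depth": [2, 4, None]},  # illustrative grid
        cv=cv,
        n_jobs=-1,
    )
    search.fit(X, y)
    return search.best_params_, search.best_score_


first = run_grid_search()
second = run_grid_search()
print(first == second)  # True: same seeds give the same results, even with n_jobs=-1
```

    Dropping random_state=42 from the forest or from the splitter can make the two runs differ.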

    In summary:

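    • For KNeighborsClassifier: no random_state is needed (the parameter does not exist).
    • For randomized algorithms: set random_state in the estimator; GridSearchCV has no random_state parameter.
    • For RandomizedSearchCV: set its random_state to make the sampled parameter candidates reproducible, since this search is always randomized. If the estimator is randomized too, seed it as well.
    • For a shuffled CV splitter (shuffle=True): set random_state on the splitter.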
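    For RandomizedSearchCV, the seed that matters is the search's own random_state, because it decides which candidates are drawn from param_distributions. A minimal sketch, again assuming iris and an illustrative search space over n_neighbors and weights (not from the question), with KNeighborsClassifier left unseeded since it is deterministic:

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)


def run_random_search(seed):
    search = RandomizedSearchCV(
        estimator=KNeighborsClassifier(),  # deterministic, so no seed here
        param_distributions={               # illustrative search space
            "n_neighbors": randint(1, 30),
            "weights": ["uniform", "distance"],
        },
        n_iter=10,
        random_state=seed,  # controls which candidates are sampled
        n_jobs=-1,
    )
    search.fit(X, y)
    # The list of sampled candidates, in evaluation order.
    return search.cv_results_["params"]


# The same seed samples the same 10 candidates on every run.
print(run_random_search(42) == run_random_search(42))  # True
```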