python-3.x scikit-learn random-forest grid-search

scikit-learn GridSearchCV n_jobs != 1 freezing


I'm running a grid search on random forests and trying to use an n_jobs value other than 1, but the kernel freezes and there is no CPU usage. With n_jobs=1 it works fine. I can't even stop the command with Ctrl-C and have to restart the kernel. I'm running on Windows 7. I saw that there is a similar problem on OS X, but the solution is not relevant for Windows 7.

from sklearn.ensemble import RandomForestClassifier
rf_tfdidf = Pipeline([('vect',tfidf),
                  ('clf', RandomForestClassifier(n_estimators=50, 
class_weight='balanced_subsample'))])

param_grid = [{'vect__ngram_range':[(1,1)],
          'vect__stop_words': [stop],
          'vect__tokenizer':[tokenizer]
          }]
if __name__ == '__main__':
gs_rf_tfidf = GridSearchCV(rf_tfdidf, param_grid, scoring='accuracy', cv=5, 
                                                           verbose=10, 
                                                           n_jobs=2)
gs_rf_tfidf.fit(X_train_part, y_train_part)

thanks.


Solution

  • The indentation after if __name__ == '__main__': is not correct. If that is only a copy-paste mistake in the question and your real code is indented properly, you can try something like:

    if __name__ == '__main__':
        # your code indented !
    

    So the if __name__ == '__main__': guard comes first in your script (right after the imports), and the rest of the code follows with the appropriate indent.

    New Code

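    from sklearn.ensemble import RandomForestClassifier
    # GridSearchCV lives in sklearn.model_selection
    # (sklearn.grid_search in releases before 0.18)
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    # tfidf, stop, tokenizer, X_train_part and y_train_part are defined
    # elsewhere, as in the question.
    if __name__ == '__main__':

        rf_tfdidf = Pipeline([('vect', tfidf),
                              ('clf', RandomForestClassifier(n_estimators=50,
                                                             class_weight='balanced_subsample'))])

        param_grid = [{'vect__ngram_range': [(1, 1)],
                       'vect__stop_words': [stop],
                       'vect__tokenizer': [tokenizer]}]

        gs_rf_tfidf = GridSearchCV(rf_tfdidf, param_grid, scoring='accuracy',
                                   cv=5, verbose=10, n_jobs=-1)

        gs_rf_tfidf.fit(X_train_part, y_train_part)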
    

    This works fine for me (Windows 8.1).

    EDIT

    The following works fine using PyCharm. I have not used Spyder, but it should work there too:

    Code

    class Test(object):
        def __init__(self):
            # code here
            # code here
            pass

    if __name__ == '__main__':
        Test()
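
For context on why the guard matters: on Windows, Python's multiprocessing starts each worker by launching a fresh interpreter that re-imports the main script, so any unguarded top-level code runs again in every worker. joblib, which scikit-learn uses for n_jobs, starts its worker processes in a comparable way as far as I know. Below is a minimal standard-library sketch of the pattern, not the question's actual pipeline; square and main are hypothetical names used only for illustration.

```python
import multiprocessing


def square(x):
    # Hypothetical worker function. It lives at module level so that
    # child processes can find it when they import this file.
    return x * x


def main():
    # Hypothetical entry point: only the parent process should create the pool.
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(square, range(5))


if __name__ == '__main__':
    # On Windows each worker re-imports this file. The guard stops the
    # workers from calling main() again and starting workers of their own.
    print(main())  # prints [0, 1, 4, 9, 16]
```

Keeping everything that starts processes inside main() (or inside the guard itself) means that importing the file has no side effects, which is what the spawned workers need.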