python, machine-learning, scikit-learn, xgbclassifier

Hyperparameter tuning using GridSearchCV


I'm new to machine learning and I'm trying to predict the topic of an article from a labeled dataset in which each sample contains all the words of one article. There are 11 different topics in total, and each article has exactly one topic. I have built a processing pipeline:

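from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier
classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(XGBClassifier(objective="multi:softmax", num_class=11), n_jobs=-1)),
])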

I'm trying to use GridSearchCV to find the best hyperparameters:

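from sklearn.model_selection import GridSearchCV
# Keys follow the pattern <pipeline step name>__<parameter name>
parameters = {'vectorizer__ngram_range': [(1, 1), (1, 2), (2, 2)],
              'tfidf__use_idf': (True, False)}
gs_clf_svm = GridSearchCV(classifier, parameters, n_jobs=-1, cv=10, scoring='f1_micro')
gs_clf_svm = gs_clf_svm.fit(X, Y)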

This works fine. However, how do I tune the hyperparameters of XGBClassifier? I have tried the notation:

parameters = {'clf__learning_rate': [0.1, 0.01, 0.001]}

This doesn't work because GridSearchCV looks for the parameter on OneVsRestClassifier. How do I actually tune the hyperparameters of XGBClassifier? Also, which hyperparameters would you suggest tuning for my problem?


Solution

  • As is, the pipeline looks for a parameter learning_rate in OneVsRestClassifier, can't find one (unsurprisingly, since the class does not have such a parameter), and raises an error. Since you actually want the learning_rate parameter of XGBClassifier, which OneVsRestClassifier holds under its estimator parameter, you should go one level deeper, i.e.:

    parameters = {'clf__estimator__learning_rate': [0.1, 0.01, 0.001]}
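
    If you are unsure which names the grid accepts, you can list them with get_params(). The snippet below is only a rough sketch: it assumes scikit-learn's GradientBoostingClassifier as a stand-in for XGBClassifier (so it runs without xgboost installed; both expose learning_rate), and the variable names are made up for illustration.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

# Assumption: GradientBoostingClassifier stands in for XGBClassifier here.
classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(GradientBoostingClassifier())),
])

# Every key returned by get_params() is a valid key for the parameter grid.
valid_names = sorted(classifier.get_params().keys())
clf_names = [name for name in valid_names if name.startswith('clf__')]
print(clf_names)

# The nested name exists; the flat name from the question does not.
has_nested = 'clf__estimator__learning_rate' in valid_names
has_flat = 'clf__learning_rate' in valid_names

# set_params accepts the same nested name that GridSearchCV would use.
classifier.set_params(clf__estimator__learning_rate=0.01)
tuned_rate = classifier.get_params()['clf__estimator__learning_rate']
```

    With the real XGBClassifier in place of the stand-in, the same listing should show its parameters under the clf__estimator__ prefix, and any of them can go into the grid in that form.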