python machine-learning scikit-learn sklearn-pandas grid-search

In GridSearchCV, how do I pass only the default parameters in param_grid?

I'm a beginner, and I have the following code:

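from sklearn.naive_bayes import GaussianNB
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit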

pca = PCA()
model = GaussianNB()
steps = [('pca', pca), ('model', model)]
pipeline = Pipeline(steps)

cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
modelwithpca = GridSearchCV(pipeline, param_grid= ,cv=cv)
modelwithpca.fit(X_train,y_train)

This is for local testing. What I'm trying to accomplish is:

i. Perform PCA on the dataset

ii. Use Gaussian Naive Bayes with only the default parameters

iii. Use StratifiedShuffleSplit

In the end, I want to carry the result of the steps above over to another function that dumps the classifier, the dataset and the feature list, so I can test for performance.

dump_classifier_and_data(modelwithpca, dataset, features)

In the param_grid part, I don't want to test any list of parameters. I just want Gaussian Naive Bayes to use its default parameters, if that makes sense. What should I change?

Also, should I change how I instantiate the classifier objects?

Solution

The purpose of GridSearchCV is to try different parameter values for at least one step in your pipeline (if you don't want to compare parameter values, you don't need GridSearchCV). Say, for example, that you want to try different values of PCA's n_components. The general format for using a pipeline with GridSearchCV is:

gscv = GridSearchCV(pipeline, param_grid={'{step_name}__{parameter_name}': [possible values]}, cv=cv)

e.g.:

# this would perform cv for the 3 different values of n_components for pca
gscv = GridSearchCV(pipeline, param_grid={'pca__n_components': [3, 6, 10]}, cv=cv)

If you use GridSearchCV to tune PCA as above, only the PCA step is searched, so your GaussianNB model keeps its default parameter values.
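To make the naming convention concrete, here is a minimal sketch of the PCA search. It uses synthetic data from make_classification as a stand-in for X_train and y_train (an assumption, since the asker's dataset isn't shown), and the names X_demo, y_demo, valid_names and tuned_model_params are only illustrative. It lists the parameter names the pipeline accepts and then checks that the GaussianNB step still has its defaults after the search:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

# Synthetic stand-in for X_train / y_train (assumption: not the asker's data)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=12, n_informative=6, random_state=0
)

pipeline = Pipeline([('pca', PCA()), ('model', GaussianNB())])
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)

# Every tunable name follows the '<step_name>__<parameter_name>' pattern
valid_names = sorted(pipeline.get_params().keys())
print(valid_names)

# Search over PCA only; GaussianNB is not mentioned in the grid
gscv = GridSearchCV(pipeline, param_grid={'pca__n_components': [3, 6, 10]}, cv=cv)
gscv.fit(X_demo, y_demo)

print(gscv.best_params_)  # whichever of the three n_components values scored best

# The GaussianNB step was never in the grid, so it still has its defaults
tuned_model_params = gscv.best_estimator_.named_steps['model'].get_params()
print(tuned_model_params == GaussianNB().get_params())
```

Printing pipeline.get_params().keys() is a quick way to confirm a key before putting it in param_grid, since a misspelled step or parameter name raises an error at fit time.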

If you don't need parameter tuning, GridSearchCV is not the way to go. Passing only your model's default parameters produces a parameter grid with a single combination, so the search amounts to plain cross-validation. If I have understood your question correctly, this is what it would look like, although it doesn't really make sense to do it this way:

from sklearn.naive_bayes import GaussianNB
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit

pca = PCA()
model = GaussianNB()
steps = [('pca', pca), ('model', model)]
pipeline = Pipeline(steps)

cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
# get the default parameters of your model and use them as a one-combination param_grid
default_grid = {'model__' + k: [v] for k, v in model.get_params().items()}
modelwithpca = GridSearchCV(pipeline, param_grid=default_grid, cv=cv)

# fits the single combination on each of the 5 cv splits, then refits on all of X_train
modelwithpca.fit(X_train, y_train)
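Since there is nothing to tune here, one alternative is to skip GridSearchCV and cross-validate the pipeline directly. The sketch below uses cross_val_score with the same StratifiedShuffleSplit, again on synthetic stand-in data (X_demo and y_demo are assumptions, not your dataset), and then fits the pipeline once on all the data:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

# Synthetic stand-in for X_train / y_train (assumption: not the asker's data)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=12, n_informative=6, random_state=0
)

# PCA followed by GaussianNB, both with their default parameters
pipeline = Pipeline([('pca', PCA()), ('model', GaussianNB())])
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)

# One score per split, using the pipeline's default scorer (accuracy for classifiers)
scores = cross_val_score(pipeline, X_demo, y_demo, cv=cv)
print(scores.mean())

# cross_val_score fits clones, so fit the pipeline itself before using it
pipeline.fit(X_demo, y_demo)
```

The fitted pipeline is an ordinary classifier with predict, so it could presumably be passed to dump_classifier_and_data in place of modelwithpca, depending on what that function expects.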

Hope this helps, good luck!