python numpy machine-learning scikit-learn grid-search

What order are grid search combinations handled in sklearn?

I have a question about the order in which sklearn's GridSearchCV object handles its hyperparameter combinations. Specifically, I performed a grid search using sklearn with the parameters:

param1 = [val1, val2, val3, val4, val5]
param2 = [num1, num2]

The mean_test_score entry of cv_results_ is an array of length 10, as expected (len(param1)*len(param2)); however, I do not know which value corresponds to which combination. That is, are the values of param1 held fixed while param2 is cycled, or vice versa?

That is, do the 10 values in mean_test_score correspond to

[ [val1, num1], [val1, num2], [val2, num1], [val2, num2], ... ]

(where param2 is cycled before param1) or

[ [val1, num1], [val2, num1], [val3, num1], [val4, num1], [val5, num1], [val1, num2], ... ]

(where param1 is cycled before param2). Does it just depend on the order in which they are specified in the grid search? Can I return the results along one specific hyperparameter value?

Thanks!

Solution

GridSearchCV uses the ParameterGrid class internally, which you can check here (lines 47, 114).

This is more or less what ParameterGrid does inside your GridSearchCV:

from itertools import product

grid_values = [{"param1": [1, 2, 3, 4, 5], "param2": [1, 2]}]

def grid(grid_values):
    for p in grid_values:
        # Always sort the keys of a dictionary, for reproducibility
        items = sorted(p.items())
        if not items:
            yield {}
        else:
            keys, values = zip(*items)
            # product() varies the last (sorted) key fastest
            for v in product(*values):
                params = dict(zip(keys, v))
                yield params

First, it wraps your dict in a list (because it can also handle other kinds of input, for example a list of dicts):
```
grid_values = [{"param1": [1, 2, 3, 4, 5], "param2": [1, 2]}]
```
After that, it sorts the keys of your dict for reproducibility. This sort order, not the order in which you wrote the parameters, determines the order of your combinations:
```
items = sorted(p.items())
```
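Then it uses the product function from itertools, which does what you thought (here details): a nested for loop over your variables, but with the parameters ordered by name. Because product cycles its last argument fastest, the parameter whose name sorts last is cycled first. In your case param2 is cycled before param1, giving [val1, num1], [val1, num2], [val2, num1], [val2, num2], ...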
```
for v in product(*values):
    params = dict(zip(keys, v))
    yield params
```
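
To make the ordering concrete, here is a minimal sketch that re-creates the steps above for the grid from the question. It uses only itertools rather than sklearn itself, and the string values "val1", "num1", etc. simply stand in for the question's placeholders. The keys are deliberately written in reverse order to show that the sort, not the dict order, decides the nesting.

```python
from itertools import product

# Keys given "backwards" on purpose: sorting puts param1 before param2 anyway.
grid = {
    "param2": ["num1", "num2"],
    "param1": ["val1", "val2", "val3", "val4", "val5"],
}

# Same steps as in the snippet above: sort by key, then take the product.
keys, values = zip(*sorted(grid.items()))
combos = [dict(zip(keys, v)) for v in product(*values)]

for i, combo in enumerate(combos):
    print(i, combo)
# 0 {'param1': 'val1', 'param2': 'num1'}
# 1 {'param1': 'val1', 'param2': 'num2'}
# 2 {'param1': 'val2', 'param2': 'num1'}
# ...
```

So index i in the flat list corresponds to param1 value i // 2 and param2 value i % 2 for this particular grid.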

See also the documentation for ParameterGrid.
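
As for returning the results along one specific hyperparameter value: one possible approach is to pair each combination with its score and filter. The sketch below is plain Python with made-up placeholder scores (the integers 0 to 9, not real results). With a fitted GridSearchCV you would instead take the combinations from cv_results_["params"] and the scores from cv_results_["mean_test_score"], which are listed in the same order.

```python
from itertools import product

param1 = ["val1", "val2", "val3", "val4", "val5"]
param2 = ["num1", "num2"]

# Combinations in the order described above (param2 cycles fastest).
combos = [{"param1": a, "param2": b} for a, b in product(param1, param2)]

# Placeholder scores, one per combination; NOT real results.
scores = list(range(len(combos)))

# All scores where param2 is fixed to "num1".
along_num1 = [s for c, s in zip(combos, scores) if c["param2"] == "num1"]

# Alternatively, reshape the flat list into a table:
# one row per param1 value, one column per param2 value.
n2 = len(param2)
table = [scores[i * n2:(i + 1) * n2] for i in range(len(param1))]

print(along_num1)  # [0, 2, 4, 6, 8]
print(table[0])    # [0, 1] -> scores for val1 with num1 and num2
```

Filtering on the parameter dicts is the safer of the two, since it does not rely on you remembering which parameter cycles fastest.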