Tags: python, scikit-learn, hyperparameters, fasttext, gridsearchcv

How to use GridSearchCV (Python) to maximize or minimize a function with parameters?


I would like to maximize a function func(minCount, wordNgrams, lr, epoch, loss) with a grid search over only these values:

`{'minCount': [2, 3],
'wordNgrams': [1, 2, 3, 4, 5],
'lr': [0.1, 0.01, 0.001, 0.0001],
'epoch': [5, 10, 15, 20, 25, 30],
'loss': ['hs', 'ns', 'softmax']}`

I have read about sklearn.model_selection.GridSearchCV(estimator, param_grid, ...), but I don't know where to put my func(minCount, wordNgrams, lr, epoch, loss).

By the way, I've read about Bayesian optimization (https://github.com/fmfn/BayesianOptimization), but I don't have any understanding of how to use it with string and int parameters.


Solution

  • According to the documentation, you have two solutions:

    • You can pass estimator = func to GridSearchCV, but you also need to pass a scoring function. The scoring callable receives the candidate estimator and the validation data, and returns a score (float) that GridSearchCV will maximize (to minimize a quantity, return its negative). Note that GridSearchCV also expects the estimator to implement fit, get_params and set_params, so in practice func needs the light wrapping described in the next bullet. Example:
    def my_scoring_function(estimator, X, y):
        """
        Evaluate the fitted estimator on (X, y) and return a score (float).

        If func already returns the value you want to maximize, this can
        simply pass that value through; GridSearchCV optimizes this score.
        """
        return score

    cv = GridSearchCV(estimator=func, param_grid=my_param_grid,
                      scoring=my_scoring_function)
    


    • More complex, but more elegant: you can rewrite your func as an object implementing scikit-learn's estimator methods (good tutorial here, with a grid search example). Your function then follows a set of conventions that make it behave like scikit-learn's own objects, and GridSearchCV knows how to deal with it. This might be overkill for your problem, though.
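
    For concreteness, here is a minimal sketch of such a wrapper (func is your black box; the dummy X and y exist only because GridSearchCV needs something to split, and the folds are meaningless here since func ignores the data):

    import numpy as np
    from sklearn.base import BaseEstimator
    from sklearn.model_selection import GridSearchCV

    class FuncEstimator(BaseEstimator):
        # BaseEstimator derives get_params/set_params from __init__'s
        # signature, which is all GridSearchCV needs to clone and
        # reconfigure the estimator for each parameter combination.
        def __init__(self, minCount=2, wordNgrams=1, lr=0.1, epoch=5, loss='hs'):
            self.minCount = minCount
            self.wordNgrams = wordNgrams
            self.lr = lr
            self.epoch = epoch
            self.loss = loss

        def fit(self, X, y=None):
            # Call the black-box function with the current parameters
            # and remember its output for scoring.
            self.result_ = func(self.minCount, self.wordNgrams,
                                self.lr, self.epoch, self.loss)
            return self

        def score(self, X, y=None):
            # GridSearchCV maximizes this value.
            return self.result_

    param_grid = {'minCount': [2, 3],
                  'wordNgrams': [1, 2, 3, 4, 5],
                  'lr': [0.1, 0.01, 0.001, 0.0001],
                  'epoch': [5, 10, 15, 20, 25, 30],
                  'loss': ['hs', 'ns', 'softmax']}

    dummy_X, dummy_y = np.zeros((2, 1)), np.zeros(2)
    search = GridSearchCV(FuncEstimator(), param_grid, cv=2)
    search.fit(dummy_X, dummy_y)
    print(search.best_params_, search.best_score_)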


    Regarding Bayesian optimization, it is interesting if your problem meets the following criteria:

    • Evaluating your function is very costly (in time, compute...) and you cannot afford to call it as many times as a grid search requires. In your case, you have 720 combinations of parameters to explore, so if one evaluation takes 10 s, the full grid search runs for 7200 s (two hours), multiplied again by the number of cross-validation folds.
    • You want to explore a broader parameter space, or search continuous ranges for some parameters, which is typically interesting for the learning rate. In that case, you can also use a random search, also implemented in scikit-learn; see the sketch just after this list.
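
    A quick sketch of the random-search option, reusing the hypothetical FuncEstimator and dummy data from the wrapper sketch above; scipy's loguniform lets lr be sampled continuously on a log scale:

    from scipy.stats import loguniform
    from sklearn.model_selection import RandomizedSearchCV

    param_distributions = {'minCount': [2, 3],
                           'wordNgrams': [1, 2, 3, 4, 5],
                           'lr': loguniform(1e-4, 1e-1),   # continuous, log-scaled
                           'epoch': [5, 10, 15, 20, 25, 30],
                           'loss': ['hs', 'ns', 'softmax']}

    # n_iter caps the number of sampled combinations, unlike a full grid.
    search = RandomizedSearchCV(FuncEstimator(), param_distributions,
                                n_iter=50, cv=2, random_state=0)
    search.fit(dummy_X, dummy_y)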

    For more details on Bayesian optimization, I would recommend this article, which I find very comprehensive.
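
    As for using BayesianOptimization with string and int parameters: the package only searches continuous boxes, so a common trick (a sketch, not an official categorical API) is to round the integer-valued parameters and map a float index onto the list of loss strings inside the objective:

    from bayes_opt import BayesianOptimization

    losses = ['hs', 'ns', 'softmax']

    def objective(minCount, wordNgrams, lr, epoch, loss_idx):
        # bayes_opt proposes floats, so cast the integer parameters
        # and use loss_idx to pick a categorical value.
        return func(int(round(minCount)), int(round(wordNgrams)),
                    lr, int(round(epoch)), losses[int(loss_idx)])

    optimizer = BayesianOptimization(
        f=objective,
        pbounds={'minCount': (2, 3),
                 'wordNgrams': (1, 5),
                 'lr': (1e-4, 1e-1),
                 'epoch': (5, 30),
                 'loss_idx': (0, 2.999)},   # int() maps this to 0, 1 or 2
        random_state=1,
    )
    optimizer.maximize(init_points=5, n_iter=25)
    print(optimizer.max)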