python machine-learning scikit-learn decision-tree grid-search

Grid search parameters for Decision Tree


I am using a decision tree classifier, and I want to use cross-validation (CV) to find the best possible parameters. For example, I might specify a grid such as:

    parameter_grid = {
        'max_depth': range(2, 10),
        'max_features': range(2, 14),
    }

Firstly, how do I decide which parameter ranges to use? Is the choice arbitrary, or are there best practices behind it? Secondly, once the search is done, is there a way to get each individual parameter and its value through code? Thanks


Solution

  • The best you can do here is to check the docs, or other reliable resources, for the usual settings and heuristics used in the parameter search of each algorithm. Knowing exactly which value to set for each parameter requires a good understanding of what that parameter does.

    Here are some thoughts on the ones you've shared:

    • max_depth: In theory it could be nearly as high as the number of training samples, which would of course result in complete overfitting. Keeping it excessively low, however, might cause your model to underfit. So you usually want to search over a fairly small range, such as the one you've used.

    • max_features: This limits the number of features considered when looking for each split of the tree. If you have a large number of features, it's a good idea to limit this value; otherwise the default is max_features=n_features. Rather than specifying a range here, though, you could search over the rule-of-thumb options proposed in the docs:

      • max_features: int, float or {“auto”, “sqrt”, “log2”}
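      • If “auto”, then max_features=sqrt(n_features). (Newer scikit-learn releases deprecate and then remove this option for decision trees, so prefer “sqrt”.)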
      • If “sqrt”, then max_features=sqrt(n_features).
      • If “log2”, then max_features=log2(n_features).
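
    As a minimal sketch of what searching over those options could look like: the wine dataset, cv=5 and random_state=0 below are stand-in assumptions, not part of the question. “auto” is left out because, per the list above, it is the same as “sqrt”; None keeps the default of using all features.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (13 features); replace with your own X and y.
X, y = load_wine(return_X_y=True)

# Small depth range plus the rule-of-thumb max_features options.
parameter_grid = {
    "max_depth": list(range(2, 10)),
    "max_features": ["sqrt", "log2", None],  # None means all features
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),  # fixed seed for repeatable splits
    param_grid=parameter_grid,
    cv=5,  # 5-fold cross-validation, an arbitrary choice here
)
search.fit(X, y)

# Best combination found on this data and its mean cross-validated accuracy.
print(search.best_params_)
print(round(search.best_score_, 3))
```

    The grid has 8 depths times 3 feature settings, so 24 candidates are each evaluated with 5-fold cross-validation.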

    So in general I'd suggest you look carefully at what each parameter does and follow suggestions from reliable resources. Note that the docs also give suggested values for several parameters.
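
    On the second part of the question (getting each parameter and its value through code): a fitted GridSearchCV exposes the winning combination as the best_params_ dict, and every tried combination with its score in cv_results_. The following is a sketch using the grid from the question; the wine dataset, cv=5 and random_state=0 are assumptions made only so the example runs.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Stand-in data with 13 features, so max_features up to 13 is valid.
X, y = load_wine(return_X_y=True)

parameter_grid = {
    "max_depth": list(range(2, 10)),
    "max_features": list(range(2, 14)),
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid=parameter_grid,
    cv=5,
)
search.fit(X, y)

# Each best parameter and its value, one by one.
for name, value in search.best_params_.items():
    print(f"{name} = {value}")

# A single value can be read by key.
best_depth = search.best_params_["max_depth"]

# The refitted best model carries the same settings.
best_tree = search.best_estimator_
print(best_tree.get_params()["max_features"])

# Every candidate that was tried, paired with its mean CV score.
results = list(
    zip(search.cv_results_["params"], search.cv_results_["mean_test_score"])
)
for params, score in results[:3]:  # show only the first few
    print(params, round(score, 3))
```

    best_estimator_ is available because refit=True is the default, so the best combination is retrained on the full data after the search.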