H2O Variable standardization

The standardize section of the documentation lists only these algorithms: Deep Learning, GLM, GAM, K-Means.

I have two questions:

  1. Does this mean that other algorithms, such as Random Forest, Gradient Boosting, etc., do not standardize (at least automatically in AutoML)?

  2. Does standardize = TRUE in Deep Learning, GLM, ..., standardize the target variable as well, or only the features?

A related question is Feature Standardize in AutoML H2O.


  • Regarding question 1: correct. For algorithms that do not have the standardize parameter, the predictors are not standardized. Tree-based algorithms rely on comparisons like val >= threshold to decide which child node to descend into. If we implemented standardization, we would have to evaluate (val - mean) / standard deviation >= threshold instead. Choosing not to standardize saves us a lot of time during tree traversal, because we don't need to standardize the predictors every time we evaluate the expression val >= threshold.

    Regarding question 2: When you set standardize=true, only the numerical features are standardized. The response column is not standardized.
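
To make the tree argument concrete, here is a minimal sketch in plain Python (not H2O code). It assumes a single numeric predictor and one split threshold; the values and the `standardize` helper are made up for illustration. Because standardization is an order-preserving rescaling, comparing raw values against the raw threshold sends every row to the same side as comparing standardized values against the correspondingly rescaled threshold, so standardizing would only add work.

```python
import statistics

def standardize(values):
    """Return (z-scores, mean, sd) for a list of numbers (hypothetical helper)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation, must be > 0
    return [(v - mean) / sd for v in values], mean, sd

# Hypothetical predictor column and split threshold, for illustration only.
values = [2.0, 5.0, 7.5, 11.0, 14.0]
threshold = 7.0

z_values, mean, sd = standardize(values)
# The same split point expressed on the standardized scale.
z_threshold = (threshold - mean) / sd

# Which rows go to the "val >= threshold" child, on each scale.
raw_split = [v >= threshold for v in values]
z_split = [z >= z_threshold for z in z_values]

print(raw_split == z_split)
```

The two lists of split decisions match, which is why a tree model gains nothing from the extra subtraction and division. This only illustrates the comparison itself; it says nothing about how H2O bins or searches for thresholds internally.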