Is the variable importance calculation in H2O based on the training set?


I'm fairly sure that h2o.varimp cannot be based on anything other than the training or validation data, since the test data is never passed to the model.

I was reading the H2O documentation on variable importance but couldn't find which data it is computed from. Is it based on the training set or a validation set? Is there a way to check the importance on the test data?


Solution

  • Yes, variable importance is calculated from the training dataset alone. For GLM, the importances are derived from the model's coefficients (their standardized magnitudes). For GBM, they are accumulated as the trees are built, from the splits each variable is used in. They therefore cannot be calculated from validation or test datasets, because those datasets are not used to fit the model's parameters.
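
On the second part of the question: the importances above are a by-product of training, but a different, model-agnostic technique, permutation importance, can be evaluated on any labelled dataset, including a held-out test set. The sketch below is plain Python and does not use H2O; the `predict` function, the synthetic data, and the choice of mean squared error are all assumptions made for illustration. It shuffles one column at a time and reports how much the error grows.

```python
import random

def mse(predict, rows, targets):
    """Mean squared error of `predict` over the given rows."""
    return sum((predict(r) - y) ** 2 for r, y in zip(rows, targets)) / len(rows)

def permutation_importance(predict, rows, targets, n_repeats=5, seed=0):
    """Per feature, the mean increase in MSE after shuffling that column."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only; every other column stays untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(predict, shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical held-out data: the target depends on feature 0 only.
data_rng = random.Random(42)
test_rows = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
test_targets = [3.0 * r[0] for r in test_rows]

def predict(row):
    # Stand-in for an already-trained model (an assumption, not an H2O model).
    return 3.0 * row[0]

importances = permutation_importance(predict, test_rows, test_targets)
print(importances)
```

Here the stand-in model ignores feature 1, so shuffling that column leaves the error unchanged and its importance is zero, while shuffling feature 0 increases the error. These numbers answer a different question than h2o.varimp does: they measure how much the fitted model relies on each column when predicting the test rows, not how the column contributed during training.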