Tags: h2o, feature-extraction, automl

How to see which features are used in the trained machine learning model?


After training a model with H2O's AutoML tool, I can see the variable importance with saved_model.varimp_plot(). I am curious about the feature engineering that H2O claims to do.

I'm trying the simple code samples from the H2O documentation.

import h2o
h2o.init()

train_data = h2o.import_file("../full_data.csv")
test_data = h2o.import_file("../201810_pca.csv")

from h2o.automl import H2OAutoML
y = "Label"
x = ['feature0','feature1','feature2','feature3','feature4','feature5','feature6','feature7','feature8','feature9','feature10',
'feature11','feature12','feature13','feature14','feature15','feature16','feature17','feature18','feature19','feature20',
'feature21','feature22','feature23','Amount','DateTime']


aml = H2OAutoML(max_models = 100, max_runtime_secs=100000, seed = 1)
aml.train(x = x, y = y, training_frame = train_data)

lb = aml.leaderboard
lb.head()
lb.head(rows=lb.nrows) # Entire leaderboard

preds = aml.predict(test_data)
h2o.save_model(aml.leader, path = "./Saved_Models")


saved_model = h2o.load_model("./Saved_Models/XGBoost_2_AutoML_20191018_174201")

training_frame = saved_model.actual_params['training_frame']  # This line raises the error
print(training_frame)

How do I see which features are being used in the trained model? I'd like to see if H2O is extracting and adding new features or not.

I've used my_training_frame = your_model.actual_params['training_frame'], as suggested in another question, but it raises an error: "TypeError: 'property' object has no attribute '__getitem__'".


Solution

  • Quick note: H2O.ai has a few products. The open source platform is called H2O-3, and it contains the AutoML algorithm. AutoML does not currently do feature engineering for you. If you want automatic feature engineering, you might be thinking of H2O's product Driverless AI.

    As for the error you are seeing, this is a bug and you can track the fix here.

    Depending on what you pass to the .train() method, you may or may not hit this bug.
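
In the meantime, one way to check which columns the model actually uses is to compare the names in its variable importance table against the columns you passed as x. The sketch below is only an illustration, not an official H2O recipe: split_used_features is a hypothetical helper, and example_rows is made-up data in the (variable, relative_importance, scaled_importance, percentage) shape that model.varimp() returns. The "DateTime.Mon" entry is invented to show how a name outside your input list would be flagged.

```python
def split_used_features(varimp_rows, input_columns):
    """Compare variable-importance names with the columns passed as x.

    varimp_rows: rows whose first element is the variable name, as
                 returned by model.varimp(); None is treated as empty.
    Returns (original, other, unused):
      original - importance names that are also input columns
      other    - importance names that are NOT input columns
      unused   - input columns with no entry in the importance table
    """
    used = [row[0] for row in (varimp_rows or [])]
    inputs = set(input_columns)
    used_set = set(used)
    original = [name for name in used if name in inputs]
    other = [name for name in used if name not in inputs]
    unused = [col for col in input_columns if col not in used_set]
    return original, other, unused


# Made-up rows for illustration only (not real model output).
example_rows = [
    ("Amount", 120.0, 1.0, 0.6),
    ("feature3", 60.0, 0.5, 0.3),
    ("DateTime.Mon", 20.0, 0.1667, 0.1),  # hypothetical name not in x
]
example_x = ["feature0", "feature3", "Amount", "DateTime"]

original, other, unused = split_used_features(example_rows, example_x)
print("inputs used by the model:", original)
print("names not in x:", other)
print("inputs without an importance entry:", unused)
```

With your own model this would be called as split_used_features(saved_model.varimp(), x). If the second list comes back empty, the model is working only with the columns you supplied. Note that some model types, such as Stacked Ensembles, may not expose variable importance at all, which is why the helper treats None as an empty table.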