Tags: python, garbage-collection, h2o, automl

How to Remove All Session Objects after H2O AutoML?


I am trying to create an ML application in which a front end takes user information and data, cleans it, and passes it to h2o AutoML for modeling, then recovers and visualizes the results. Since the back end will be a stand-alone / always-on service that gets called many times, I want to ensure that all objects created in each session are removed, so that h2o doesn't get cluttered and run out of resources. The problem is that many objects are being created, and I am unsure how to identify/track them, so that I can remove them before disconnecting each session.

Note that I would like the ability to run more than one analysis concurrently, which means I cannot just call remove_all(), since this may remove objects still needed by another session. Instead, it seems I need a list of session objects, which I can pass to the remove() method. Does anyone know how to generate this list?

Here's a simple example:

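import h2o
import pandas as pd
from h2o.automl import H2OAutoML

h2o.init()  # connect to a running local H2O cluster, or start one

df = pd.read_csv(r"C:\iris.csv")  # raw string keeps the backslash literal
my_frame = h2o.H2OFrame(df, destination_frame="my_frame")

aml = H2OAutoML(max_runtime_secs=100)
aml.train(y='class', training_frame=my_frame)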

Looking in the Flow UI shows that this simple example generated 5 new frames, and 74 models. Is there a session ID tag or something similar that I can use to identify these separately from any objects created in another session, so I can remove them?

[Screenshots: Frames Created, Models Created]


Solution

  • The recommended way to clean up only your own work is to call h2o.remove(aml). This deletes the AutoML instance on the backend and cascades to all of its submodels and attached objects, such as metrics. It does not delete the frames that you provided yourself (e.g. training_frame), so those have to be removed separately.
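As a rough illustration of the per-session cleanup described above, here is a minimal sketch. The names h2o_session, track and run_analysis are hypothetical helpers, not part of the h2o API. It assumes that h2o.remove accepts both an H2OFrame and the H2OAutoML object (as the answer indicates for aml) and that a cluster is already running. The idea is to record only the objects this session creates, then remove exactly those when the session ends, even if training fails.

```python
from contextlib import contextmanager


@contextmanager
def h2o_session(remove_fn):
    """Yield a `track` function; on exit, remove only the tracked objects.

    `remove_fn` is expected to be h2o.remove (injected so the helper can be
    exercised without a cluster). This is a hypothetical helper, not h2o API.
    """
    tracked = []

    def track(obj):
        tracked.append(obj)
        return obj

    try:
        yield track
    finally:
        # Remove in reverse creation order, so the AutoML run is removed
        # before the frames it was trained on.
        for obj in reversed(tracked):
            remove_fn(obj)


def run_analysis(df, target):
    """Hypothetical request handler; requires a running H2O cluster."""
    import h2o
    from h2o.automl import H2OAutoML

    with h2o_session(h2o.remove) as track:
        frame = track(h2o.H2OFrame(df))
        aml = track(H2OAutoML(max_runtime_secs=100))
        aml.train(y=target, training_frame=frame)
        # Copy the results out of H2O before the cleanup runs.
        return aml.leaderboard.as_data_frame()
```

Because each session removes only what it tracked, concurrent sessions should not interfere with one another the way remove_all() would. Any frames that H2O creates internally and that are not covered by the cascade would still need to be identified separately.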