I have been training an H2O autoencoder on a dataset consisting of only one-hot-encoded categorical columns. The dataset has shape (7762, 2232), and the model took about 5 hours to train. Does anyone know of any glaring problems with the way the model is being trained that could cause it to take so long, or of any way I can reduce the training time, either through the dataset or the model construction? Any help would be greatly appreciated. Thank you very much!
The code for building the model is as follows:
model = H2ODeepLearningEstimator(
    autoencoder = True,
    seed = -1,
    hidden = [2000, 1000, 500, 250, 125, 50],
    epochs = 30,
    activation = "Tanh"
)
The problem here is the number of columns. While the number of rows controls the overall training time, the number of columns controls the training time per row, and 2232 columns is quite a lot. If you can do some data munging and reduce the number of predictors you use, it will definitely speed up training.
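For example, since one-hot columns for rare category levels are mostly zeros, one cheap (and lossy) munging step is to drop columns that are almost never set. `drop_rare_columns` below is a hypothetical helper sketched with NumPy, not part of h2o; with an H2OFrame you would first pull the data into Python (e.g. via `as_data_frame()`):

```python
import numpy as np

def drop_rare_columns(X, min_frac=0.01):
    # Keep only one-hot columns that are "on" in at least min_frac of rows.
    # X: 2-D array of 0/1 indicators; returns the reduced matrix
    # and the indices of the columns that were kept.
    frac_on = X.mean(axis=0)
    keep = np.where(frac_on >= min_frac)[0]
    return X[:, keep], keep

# Toy example: 4 rows, 3 one-hot columns; the last column is rarely set.
X = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 0, 0],
              [0, 1, 1]])
X_small, kept = drop_rare_columns(X, min_frac=0.5)
```

How aggressive to be (the `min_frac` threshold) is a trade-off: dropping rare levels shrinks the input layer, but the autoencoder can no longer reconstruct those levels at all.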
You can also try the following:
1. Reduce the number of epochs (you currently train for 30).
2. Enable early stopping, so training halts once the reconstruction error stops improving.
Note that stopping the model early as in 1 and 2 may reduce the training time, but can leave you with a model that is not a good fit for your data.
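As a sketch, points 1 and 2 might look like this on your estimator; the specific values here are illustrative, not tuned for your data:

```python
from h2o.estimators import H2ODeepLearningEstimator

model = H2ODeepLearningEstimator(
    autoencoder = True,
    seed = -1,
    hidden = [2000, 1000, 500, 250, 125, 50],
    epochs = 10,                 # fewer passes over the data (point 1)
    stopping_rounds = 3,         # stop after 3 scoring rounds without improvement (point 2)
    stopping_metric = "MSE",     # reconstruction error for an autoencoder
    stopping_tolerance = 1e-3,   # minimum relative improvement to count as progress
    activation = "Tanh"
)
```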