
Word2Vec word embeddings on H2O Deep Water with GPU


We follow this text-categorization process, iterating the steps below:

  1. Create a Word2Vec word embedding model from the text documents.
  2. Run a grid search over GBM tree depth parameters.
  3. Select the best-performing final GBM model.

As we iterate through these steps, the CPU cores run at 100% load. Is there any procedure or solution for running the above process using H2O Deep Water's GPU capabilities?
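
For concreteness, the three steps can be sketched with H2O's Python API. This is only an illustrative sketch: the frame and column names (`tokenized_words`, `train_frame`, `label`) and the depth grid are assumptions, not taken from the original setup, and running it requires the `h2o` package plus a live cluster.

```python
def categorize(tokenized_words, train_frame, response="label"):
    """Word2Vec embedding -> grid search over tree depth -> best GBM."""
    # Imports kept inside the function: executing this needs the `h2o`
    # package and a running H2O cluster (h2o.init()).
    from h2o.estimators import H2OWord2vecEstimator, H2OGradientBoostingEstimator
    from h2o.grid.grid_search import H2OGridSearch

    # 1. Train a Word2Vec model on a single-column frame of tokenized words.
    w2v = H2OWord2vecEstimator(vec_size=100, epochs=10)
    w2v.train(training_frame=tokenized_words)

    # Average the word vectors per document to get fixed-length features.
    doc_vecs = w2v.transform(tokenized_words, aggregate_method="AVERAGE")
    features = doc_vecs.columns
    data = doc_vecs.cbind(train_frame[response])

    # 2. Grid-search GBM tree depth (grid values are illustrative).
    grid = H2OGridSearch(
        H2OGradientBoostingEstimator(ntrees=100),
        hyper_params={"max_depth": [3, 5, 7, 9]},
    )
    grid.train(x=features, y=response, training_frame=data)

    # 3. Keep the best-performing model (lowest logloss here).
    return grid.get_grid(sort_by="logloss", decreasing=False).models[0]
```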


Solution

  • No, no and maybe.

    The "maybe" is that you could switch from GBM to XGBoost, which does have a GPU option (I believe only single-node is supported, and currently only on Linux). XGBoost is apparently slightly quicker on small data sets, and h2o.gbm slightly quicker on large data sets. If you have a GPU sitting idle, and are using the latest version of H2O, it should be easy to swap h2o.gbm for h2o.xgboost (H2OXGBoostEstimator if using the Python API) and see for yourself.
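
    A minimal sketch of that swap, assuming a training frame and feature/response column names are already in hand (those names are illustrative); `backend` and `gpu_id` are the GPU-related H2OXGBoostEstimator parameters:

    ```python
    def train_gpu_xgboost(train_frame, x_cols, y_col):
        """Train an H2O XGBoost model on the GPU in place of h2o.gbm."""
        # Import inside the function: needs the `h2o` package and a
        # running cluster, plus a CUDA-capable GPU for backend="gpu".
        from h2o.estimators import H2OXGBoostEstimator

        model = H2OXGBoostEstimator(
            ntrees=100,
            max_depth=5,
            backend="gpu",  # GPU support is single-node, Linux only
            gpu_id=0,       # which GPU to use, if there are several
        )
        model.train(x=x_cols, y=y_col, training_frame=train_frame)
        return model
    ```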

    I'd be interested to hear the relative timings!

    (BTW, the second "no" is for GPU use specifically for grids; but all the effort is in the models, not the grid itself, so the second "no" could just as well be "N/A".)