h2o

Will running H2O on a local desktop speed up calculations?


I have just started learning H2O, and I am confused about what happens if I run H2O at home just for learning purposes. When I simply run "h2o.init()" and then start cleaning data or modeling with H2O, will it speed up calculations on big data? Does it automatically connect to some H2O cluster online? Where is the H2O cluster located?


Solution

  • When you run h2o.init() (i.e. with no arguments) it starts a "cluster" on that same machine; nothing online is involved. By default the cluster is given about a quarter of your machine's memory, and uses either all threads or two threads (the latter if you are using R and installed H2O from CRAN). You will find Flow, H2O's web interface, listening on http://127.0.0.1:54321/

    If you already have an H2O cluster running on another machine (whether on your LAN or a distant cloud server), give the address to h2o.init() to have it connect to that instead of starting anything locally.

    Run help(h2o.init) (in Python) or ?h2o.init (in R) to see all the available options.

    NOTE: H2O has a client/server architecture. The server (also called the "cluster", even if you only have one machine) is where all the action takes place and where the data and models are kept; the client is relatively thin. Responding to one of the comments: if you are comparing H2O running on localhost to a library like scikit-learn, there is not much difference in available compute power. The advantages of H2O are that you can easily and transparently add more machines over a LAN to increase available memory and (to some extent) compute power, and that clients are available in languages other than R. The disadvantages are mainly around having to remember that the server is where your data is kept; e.g. with large data sets, use the functions that load data directly into the server, because keeping a copy in the client just wastes memory.
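
As a rough illustration of the points above, here is a minimal Python sketch, not a definitive recipe. The helper names (h2o_init_args, start_or_connect, load_on_server) and the example address and path are hypothetical and only for illustration; the parameters passed to h2o.init() (url, start_h2o, nthreads, max_mem_size) and the h2o.import_file() call are taken from the Python client, but check help(h2o.init) for the version you have installed.

```python
def h2o_init_args(remote_url=None, max_mem_size=None, nthreads=-1):
    """Build keyword arguments for h2o.init() (hypothetical helper).

    remote_url   -- address of an already-running cluster; None starts a local one
    max_mem_size -- heap size for a local cluster, e.g. "4G"; None keeps the default
    nthreads     -- threads for a local cluster; -1 means use all available cores
    """
    if remote_url is not None:
        # Connect only: start_h2o=False stops the client from launching
        # a local server if the remote one cannot be reached.
        return {"url": remote_url, "start_h2o": False}
    args = {"nthreads": nthreads}
    if max_mem_size is not None:
        args["max_mem_size"] = max_mem_size
    return args


def start_or_connect(remote_url=None, max_mem_size=None, nthreads=-1):
    """Start a local cluster, or attach to an existing one, and return the module."""
    import h2o  # imported lazily so h2o_init_args() works without h2o installed

    h2o.init(**h2o_init_args(remote_url, max_mem_size, nthreads))
    return h2o


def load_on_server(path, remote_url=None):
    """Have the server read `path` itself, so the client holds no copy of the data.

    The path must be readable from the machine running the H2O server,
    which matters once the server is not on localhost.
    """
    h2o = start_or_connect(remote_url)
    return h2o.import_file(path)
```

Calling start_or_connect() with no arguments behaves like a plain h2o.init() on your own machine. Calling start_or_connect(remote_url="http://192.168.1.20:54321") would attach to a cluster at that (made-up) LAN address instead. load_on_server("/data/big.csv") returns a handle to a frame held by the server; converting a large frame back to a local data frame would copy it into the client, which is the memory waste the note above warns about.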