Is the H2O R package safe to use for secured (patient) data?

The H2O R package is a great resource for building predictive models, but I am concerned about its security.

Is it safe to use patient data with H2O, in terms of security vulnerabilities?

Solution

After you ingest data into H2O-3, the data lives in memory inside the Java server process. Once the H2O process is stopped, the in-memory data vanishes.

Probably the main thing to be aware of is that your data is not sent to a SaaS cloud service or anything like that. The H2O-3 Java instance itself handles your data, so you can build models in a totally air-gapped environment with no internet access.
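To make that lifecycle concrete, here is a minimal R sketch of a purely local session. It assumes the h2o package and a Java runtime are installed; run_local_h2o_session is a hypothetical helper name, not part of the package, and the port is just the usual default.

```r
# Sketch only: run_local_h2o_session() is a hypothetical helper, not part of h2o.
# Assumes the h2o package and a Java runtime are installed on this machine.
run_local_h2o_session <- function(df) {
  # Start an H2O instance on this machine, reachable only via localhost.
  # Note: if an instance is already running on this port, h2o.init() attaches
  # to it instead, and the shutdown below would stop that instance too.
  h2o::h2o.init(ip = "localhost", port = 54321, bind_to_localhost = TRUE)

  # Stop the H2O process when the function exits, even if a later step fails.
  on.exit(h2o::h2o.shutdown(prompt = FALSE), add = TRUE)

  hf <- h2o::as.h2o(df)  # copy the data frame into the H2O JVM's memory
  dims <- dim(hf)        # placeholder for the actual model building
  h2o::h2o.removeAll()   # drop all frames and models held by the cluster
  dims
}
```

Calling run_local_h2o_session(iris) would load the data, report its dimensions, and then shut the instance down, at which point nothing remains in H2O's memory.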

So the short answer is: it can be safe, provided you know which threats you are trying to secure against and take the right steps to avoid the relevant vulnerabilities. That includes data vulnerabilities, such as leaking PII, and configuration weaknesses, such as not enabling passwords or SSL.

You can read about how to secure H2O instances and the corresponding R client here:

http://docs.h2o.ai/h2o/latest-stable/h2o-docs/security.html
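As a rough illustration of the R client side, the sketch below connects to an H2O instance that has already been started with HTTPS and password authentication enabled, as described in the linked documentation. The helper name connect_secure_h2o, the host, and the H2O_PASSWORD environment variable are assumptions made for the example; the arguments passed to h2o.init (https, insecure, username, password, startH2O) are the package's own.

```r
# Sketch only: connect_secure_h2o() is a hypothetical helper, not part of h2o.
# Assumes the server was started separately with HTTPS and authentication on.
connect_secure_h2o <- function(host, port = 54321, user,
                               password_env = "H2O_PASSWORD") {
  # Read the password from the environment rather than hard-coding it.
  password <- Sys.getenv(password_env)
  if (!nzchar(password)) {
    stop("Set the ", password_env, " environment variable first.")
  }

  h2o::h2o.init(
    ip = host,
    port = port,
    startH2O = FALSE,   # only connect; never launch a new local instance
    https = TRUE,       # encrypt traffic between R and the H2O server
    insecure = FALSE,   # keep certificate verification enabled
    username = user,
    password = password
  )
}
```

A call might look like connect_secure_h2o("h2o.example.org", user = "analyst"), where both values are placeholders. Keeping the password in an environment variable is one option among several; the point is only that credentials should stay out of scripts that sit next to patient data.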

(Note: if you have a high-value use case and want detailed, personal help with this kind of thing, H2O.ai, the company, offers paid enterprise support.)