Tags: scala, apache-spark, apache-spark-mllib, h2o, sparkling-water

Sparkling Water: Can't use the Spark ML pipelines support


According to this blog post by the Sparkling Water team, the latest versions let you use Spark ML pipeline components to build a deep learning (DL) model. I tried adding them to my build.sbt:

"org.apache.spark" % "spark-mllib_2.10" % "2.0.0" % "provided",
"ai.h2o" % "sparkling-water-core_2.10" % "1.6.5" % "provided"

But no luck: importing org.apache.spark.ml.h2o.H2OPipeline fails. The h2o package inside spark.ml doesn't seem to exist in the Spark jars, even though it appears to work in the blog post above as well as here. I really want to reuse my spark-mllib feature transformers to create a DL model using H2O, as shown in the blog.

Any help appreciated!

Thanks.


Solution

  • 1) Please don't use Spark 2 with Sparkling Water (SW) 1.6.5; it won't work. We released SW 2.0 for Scala 2.11: https://mvnrepository.com/artifact/ai.h2o/sparkling-water-core_2.11

    2) You're only adding SW core to your build; the classes you are looking for are in sparkling-water-ml: https://mvnrepository.com/artifact/ai.h2o/sparkling-water-ml_2.11
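Putting both points together, a corrected build.sbt might look roughly like the sketch below. It assumes Spark 2.0.0 on Scala 2.11.8 and Sparkling Water 2.0.0; those exact version numbers are assumptions, so check the mvnrepository pages linked above for the current 2.0.x release. Using `%%` instead of a hard-coded `_2.10` suffix lets sbt append the Scala binary version from `scalaVersion`, so all three artifacts stay on the same Scala line.

```scala
// build.sbt: a sketch, not a verified configuration.
// Assumed versions: Scala 2.11.8, Spark 2.0.0, Sparkling Water 2.0.0.
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // Spark 2.0 line; %% resolves to spark-mllib_2.11 for the scalaVersion above
  "org.apache.spark" %% "spark-mllib" % "2.0.0" % "provided",
  // Sparkling Water 2.0.x is the line that targets Spark 2.0.x
  "ai.h2o" %% "sparkling-water-core" % "2.0.0" % "provided",
  // Separate module holding the Spark ML pipeline integration
  "ai.h2o" %% "sparkling-water-ml" % "2.0.0" % "provided"
)
```

The `"provided"` scope mirrors the question's build; it only makes sense if Sparkling Water is supplied at runtime (for example by the cluster or a `--packages` flag). Otherwise drop it from the two `ai.h2o` lines so they end up in the application jar.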