python pyspark h2o sparkling-water

Sparkling Water - run python script as a Spark Application


I am having trouble running a Python script as a Spark application with Sparkling Water. I use this command to submit my script to Spark:

./bin/spark-submit \
  --packages ai.h2o:sparkling-water-core_2.10:1.5.12 \
  --py-files $SPARKLING_HOME/py/dist/pySparkling-1.5.12-py2.7.egg $SPARKLING_HOME/Python/test.py

and I get the following error:

py4j.protocol.Py4JError: Trying to call a package.

Logs:

Traceback (most recent call last):
  File "/Users/Documents/sparkling-water-1.5.12/Python/test.py", line 5, in <module>
    hc= H2OContext(sc).start()
  File "/Users/Documents/sparkling-water-1.5.12/py/dist/pySparkling-1.5.12-py2.7.egg/pysparkling/context.py", line 72, in __init__
  File "/Users/Documents/sparkling-water-1.5.12/py/dist/pySparkling-1.5.12-py2.7.egg/pysparkling/context.py", line 96, in _do_init
  File "/Users/Documents/spark-1.5.2-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 726, in __getattr__
py4j.protocol.Py4JError: Trying to call a package.
16/04/11 16:58:39 INFO SparkContext: Invoking stop() from shutdown hook
16/04/11 16:58:39 INFO SparkUI: Stopped Spark web UI at http://192.168.181.84:4042
16/04/11 16:58:39 INFO DAGScheduler: Stopping DAGScheduler
16/04/11 16:58:39 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/04/11 16:58:39 INFO MemoryStore: MemoryStore cleared
16/04/11 16:58:39 INFO BlockManager: BlockManager stopped
16/04/11 16:58:39 INFO BlockManagerMaster: BlockManagerMaster stopped
16/04/11 16:58:39 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/04/11 16:58:39 INFO SparkContext: Successfully stopped SparkContext
16/04/11 16:58:39 INFO ShutdownHookManager: Shutdown hook called
16/04/11 16:58:39 INFO ShutdownHookManager: Deleting directory /private/var/fold

How can I resolve this issue? I am following the command from the booklet exactly: https://h2o-release.s3.amazonaws.com/h2o/rel-turan/3/docs-website/h2o-docs/booklets/SparklingWaterVignette.pdf


Solution

  • This is a critical bug that we on the Sparkling Water team know about. It has already been fixed (https://0xdata.atlassian.net/browse/SW-107), and a new release containing the fix, along with other hotfixes, should be out very soon.

    I'll keep you updated and let you know when the new release is out.

    EDITED 29 April 2016

    The new release with the fix is out.

    For Spark 1.6: http://h2o-release.s3.amazonaws.com/sparkling-water/rel-1.6/3/index.html

    For Spark 1.5: http://h2o-release.s3.amazonaws.com/sparkling-water/rel-1.5/14/index.html

    You no longer need to pass --packages to add sparkling-water-core. The pySparkling egg file already contains all the Java/Scala classes it needs, so all you need to do is pass the egg file with the --py-files option.
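As a rough illustration of the simplified invocation described above, the following sketch assembles the spark-submit arguments with only --py-files and no --packages entry. The helper name `build_submit_command` and all three paths are hypothetical placeholders, not values from the release; in particular, the real egg file name carries version and Python tags, so substitute the egg shipped with the release you downloaded.

```python
import os

# Hypothetical locations, used only for illustration.
SPARK_HOME = "/opt/spark"
EGG_PATH = "/opt/sparkling-water/py/dist/pySparkling.egg"  # placeholder file name
SCRIPT_PATH = "/opt/sparkling-water/Python/test.py"


def build_submit_command(spark_home, egg_path, script_path):
    """Return the spark-submit argument list that ships only the pySparkling egg."""
    return [
        os.path.join(spark_home, "bin", "spark-submit"),
        # No --packages argument here: per the answer, the egg already
        # bundles the Java/Scala classes that pySparkling needs.
        "--py-files", egg_path,
        script_path,
    ]


cmd = build_submit_command(SPARK_HOME, EGG_PATH, SCRIPT_PATH)
# Print the command so it can be inspected or pasted into a shell.
print(" ".join(cmd))
```

The list form can be handed to `subprocess.call` if you prefer to launch the job from Python rather than from a shell.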