
Why does Spark Notebook in Hue report "Gateway timeout Error 504"?


I'm using a Hadoop 2.2.6 mini cluster (1 master and 3 slaves) with Ambari 2.1.0 and Hue 3.8.1 on Ubuntu 12.04. Spark 1.2.1 (Scala 2.10.3) was installed as part of the Ambari setup. I've installed Livy-Server so that I can use the Spark Notebook in Hue. The configuration in my hue.ini file is as follows:

[spark]
# Host address of the Livy Server.
livy_server_host=host1.com
# Port of the Livy Server.
livy_server_port=8998
# Configure livy to start with 'process', 'thread', or 'yarn' workers.
livy_server_session_kind=process
# List of available types of snippets
languages='[{"name": "Scala Shell", "type": "spark"},{"name": "PySpark Shell", "type": "pyspark"},{"name": "R Shell", "type": "r"},{"name": "Jar", "type": "Jar"},{"name": "Python", "type": "py"},{"name": "Impala SQL", "type": "impala"},{"name": "Hive SQL", "type": "hive"},{"name": "Text", "type": "text"}]'
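Before looking at the server side, it may help to confirm that the [spark] values above parse as intended, in particular that the single-quoted `languages` string is valid JSON. The sketch below uses Python's standard `configparser` on a shortened copy of the section. Hue loads hue.ini with its own configuration code, so this is only an approximate sanity check; it assumes the keys are not indented (as in the excerpt), and `read_spark_section` is a hypothetical helper name.

```python
import configparser
import json

# Shortened copy of the [spark] section from the question (comments omitted).
HUE_INI_SPARK = """
[spark]
livy_server_host=host1.com
livy_server_port=8998
livy_server_session_kind=process
languages='[{"name": "Scala Shell", "type": "spark"},{"name": "PySpark Shell", "type": "pyspark"}]'
"""


def read_spark_section(text):
    """Return (livy_url, session_kind, snippet_types) from a hue.ini-style string."""
    # interpolation=None so '%' characters in values are not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    spark = parser["spark"]

    # Build the base URL Hue would use to reach Livy.
    livy_url = "http://{}:{}".format(spark["livy_server_host"], spark["livy_server_port"])
    kind = spark.get("livy_server_session_kind", "process")

    # configparser keeps the surrounding single quotes; strip them before decoding JSON.
    raw_languages = spark.get("languages", "[]").strip().strip("'")
    snippet_types = [entry["type"] for entry in json.loads(raw_languages)]
    return livy_url, kind, snippet_types


if __name__ == "__main__":
    url, kind, types = read_spark_section(HUE_INI_SPARK)
    print(url)    # http://host1.com:8998
    print(kind)   # process
    print(types)  # ['spark', 'pyspark']
```

If `json.loads` raises here on the full `languages` value, the problem is in hue.ini itself rather than in Livy.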

But whenever I try to run a simple command in the Notebook, I get the same error: Gateway timeout Error 504. When I start the Livy server manually from the shell with:

./build/env/bin/hue livy_server

I get the same output that was reported in a comment on another article, which never received a solution: http://gethue.com/new-notebook-application-for-spark-sql/#comment-56901

Any ideas on how to fix this, or even where to look? All other apps work fine; only the Spark Notebook fails. I'm new to big data and Hadoop and have been reading the forums for a solution, but I haven't found anything related to this problem. Is this a misconfiguration, or did I miss something during installation? Any help is highly appreciated. Thanks!
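A 504 means a gateway (here Hue) did not get a timely answer from the upstream service it called (here Livy), so a first check is whether anything answers at `livy_server_host:livy_server_port` at all. The sketch below is a hypothetical helper, `probe_livy`, that sends a plain HTTP GET using only Python's standard library. It assumes the Livy build in use exposes a `GET /sessions` endpoint; if yours does not, substitute another path.

```python
import socket
import urllib.error
import urllib.request


def probe_livy(host, port, timeout=5.0):
    """Try GET /sessions on a Livy server and return (ok, detail).

    Assumption: the Livy build exposes GET /sessions; adjust the path if not.
    """
    url = "http://{}:{}/sessions".format(host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return True, "HTTP {}: {}".format(resp.status, body[:200])
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (checked before URLError,
        # because HTTPError is a subclass of it).
        return False, "HTTP {} from {}".format(exc.code, url)
    except urllib.error.URLError as exc:
        # Connection refused, DNS failure, connect timeout, etc.:
        # nothing usable answered at host:port.
        return False, "cannot reach {}: {}".format(url, exc.reason)
    except (socket.timeout, TimeoutError):
        # Connected, but the response did not arrive in time (similar to Hue's 504).
        return False, "timed out after {}s waiting for {}".format(timeout, url)
```

For example, calling `probe_livy("host1.com", 8998)` from the Hue machine and getting `(False, "cannot reach ...")` would suggest that Livy is not listening where hue.ini says it is, while a timeout suggests Livy accepts connections but cannot start or answer for its sessions.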


Solution

  • Livy was never tested with Spark 1.2. It was initially created for Spark 1.3, currently works with 1.4, and support for 1.5 is nearly complete: https://github.com/cloudera/hue/tree/master/apps/spark/java#prerequisites
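Given that compatibility note, checking the cluster's Spark version first can rule Livy in or out. The following sketch parses the version from text such as the banner printed by `spark-submit --version` (assuming it contains `version X.Y`) and compares it with the 1.3 minimum stated above; `parse_spark_version`, `livy_supports`, and `MIN_SPARK_FOR_LIVY` are hypothetical names.

```python
import re

# Minimum Spark version per the answer: Livy was created for Spark 1.3.
MIN_SPARK_FOR_LIVY = (1, 3)


def parse_spark_version(banner):
    """Extract (major, minor) from text such as the `spark-submit --version` banner."""
    match = re.search(r"version\s+(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("no Spark version found in: {!r}".format(banner[:80]))
    return int(match.group(1)), int(match.group(2))


def livy_supports(banner):
    """True if the reported Spark version is at least MIN_SPARK_FOR_LIVY."""
    # Tuples compare element by element, so (1, 2) < (1, 3) < (1, 4).
    return parse_spark_version(banner) >= MIN_SPARK_FOR_LIVY


if __name__ == "__main__":
    print(livy_supports("Spark version 1.2.1"))  # False: the setup in the question
    print(livy_supports("version 1.4.1"))        # True
```

With Spark 1.2.1, as in the question, the check fails, which is consistent with the answer: Spark would likely need to be upgraded to 1.3 or later before the Notebook can work through Livy.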