Tags: linux, docker, scala, apache-spark, apache-zeppelin

Connecting a locally installed Zeppelin to a Spark cluster running in Docker


I am trying to configure the Spark interpreter in Zeppelin 0.10.0, installed on my local machine, so that I can run scripts on a Spark cluster that also runs locally in Docker. I am using the docker-compose.yml from https://github.com/big-data-europe/docker-spark and Spark version 3.1.2.
After running docker-compose up, I can see the Spark master UI in the browser at localhost:8080 and the History Server at localhost:18081. After reading the ID of the spark-master container, I can also open a shell and spark-shell inside it (docker exec -it xxxxxxxxxxxx /bin/bash). The host OS is Ubuntu 20.04; spark.master in Zeppelin is currently set to spark://localhost:7077, and zeppelin.server.port in zeppelin-site.xml is set to 8070.
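To give an idea of what works there, a trivial job of this kind runs fine in that spark-shell (just a rough sketch; spark and sc are the session and context that spark-shell creates automatically):

// trivial smoke test run inside the container's spark-shell
spark.range(0, 1000).count()    // should return 1000
sc.parallelize(1 to 100).sum()  // should return 5050.0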
There is a lot of information about connecting from a container running Zeppelin, or about running Spark and Zeppelin in the same container. Unfortunately, I also use this Zeppelin instance to connect to Hive via JDBC on a Hortonworks cluster in VirtualBox, as in one of my previous posts, and I don't want to change that configuration now because of limited hardware resources. In one of the posts (Running zeppelin on spark cluster mode) I saw that such a connection is possible, but all my attempts end with the "Fail to open SparkInterpreter" message.
I would be grateful for any tips.


Solution

  • You need to change spark.master in Zeppelin to point to the Spark master inside the Docker container, not to the local machine, so spark://localhost:7077 won't work.

    Port 7077 is fine because that is the port specified in the docker-compose file you are using. To get the IP address of the Docker container, you can follow this answer. Assuming your container is named spark-master, you can try the following:

    docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' spark-master
    

    Then set spark.master in Zeppelin to spark://docker-ip:7077 and restart the Spark interpreter; a quick way to confirm the connection is sketched below.
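
    To confirm that Zeppelin can actually reach the master, run a trivial job from a notebook paragraph. This is only a minimal sketch: %spark is the default Spark interpreter group and sc is the SparkContext it creates for you.

    %spark
    // Trivial job: if this prints 5050, the driver reached the master in Docker
    // and the application should also appear on the master UI at localhost:8080.
    println(sc.parallelize(1 to 100).reduce(_ + _))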