I have a Docker image called ubuntu_mesos_spark, and I installed ZooKeeper in it. I changed the “zoo.cfg” file on each node as follows. This is “zoo.cfg” on node 1 (150.20.11.157):
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2187
dataDir=/var/lib/zookeeper
server.1=0.0.0.0:2888:3888
server.2=150.20.11.134:2888:3888
server.3=150.20.11.137:2888:3888
This is “zoo.cfg” on node 2 (150.20.11.134):
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2187
dataDir=/var/lib/zookeeper
server.1=150.20.11.157:2888:3888
server.2=0.0.0.0:2888:3888
server.3=150.20.11.137:2888:3888
This is “zoo.cfg” on node 3 (150.20.11.137):
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2187
dataDir=/var/lib/zookeeper
server.1=150.20.11.157:2888:3888
server.2=150.20.11.134:2888:3888
server.3=0.0.0.0:2888:3888
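The three files above appear to follow one pattern: every node lists all ensemble members, but replaces its own address with 0.0.0.0. A minimal Python sketch of that pattern is shown here, assuming the intended mapping is server.1 = 150.20.11.157, server.2 = 150.20.11.134 and server.3 = 150.20.11.137. The helper name render_zoo_cfg is hypothetical and not part of ZooKeeper.

```python
# Hypothetical helper: render the zoo.cfg text for one ensemble member.
# Assumption: server.1..server.3 map to the three node IPs in this order.
NODES = ["150.20.11.157", "150.20.11.134", "150.20.11.137"]


def render_zoo_cfg(my_id, nodes=NODES, client_port=2187):
    """Return zoo.cfg content for the node whose myid is `my_id` (1-based)."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"clientPort={client_port}",
        "dataDir=/var/lib/zookeeper",
    ]
    for server_id, ip in enumerate(nodes, start=1):
        # A node refers to itself as 0.0.0.0 and to its peers by their real IP.
        host = "0.0.0.0" if server_id == my_id else ip
        lines.append(f"server.{server_id}={host}:2888:3888")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for my_id in range(1, len(NODES) + 1):
        print(f"# node {my_id} ({NODES[my_id - 1]}), myid file contains: {my_id}")
        print(render_zoo_cfg(my_id))
```

Generating the files from one list makes it harder for a peer address to end up in the wrong server.N slot.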
I also created a “myid” file in “/var/lib/zookeeper” on each node. For example, on “150.20.11.157” the myid file contains “1”. I installed Mesos and Spark in the Docker image as well, and the same three nodes form a Mesos cluster. I listed the IP addresses of the slave nodes in “spark/conf/slaves”:
150.20.11.134
150.20.11.137
I added these lines in “spark/conf/spark-env.sh”:
export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=/home/spark/program_file/spark-2.3.2-bin-hadoop2.7.tgz
Moreover, I added these lines in my “~/.bashrc” file:
export SPARK_HOME="/home/spark"
PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHO$
export PYSPARK_HOME=/usr/bin/python3.6
export PYSPARK_DRIVER_PYTHON=python3.6
export ZOO_LOG_DIR=/var/log/zookeeper
I want to run the master on “150.20.11.157”. My docker-compose file is:
version: '3.7'
services:
  zookeeper:
    image: ubuntu_mesos_spark
    command: /zookeeper-3.4.12/bin/zkServer.sh start
    environment:
      ZOOKEEPER_SERVER_ID: 1
      ZOOKEEPER_CLIENT_PORT: 2187
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 10
      ZOOKEEPER_SYNC_LIMIT: 5
      ZOOKEEPER_SERVERS: 0.0.0.0:2888:3888;150.20.11.134:2888:3888;150.20.11.137:2888:3888
    network_mode: host
    expose:
      - 2187
      - 2888
      - 3888
    ports:
      - 2187:2187
      - 2888:2888
      - 3888:3888
  master:
    image: ubuntu_mesos_spark
    command: bash -c "sleep 20; /home/mesos-1.7.0/build/bin/mesos-master.sh --ip=150.20.11.157 --work_dir=/var/run/mesos"
    restart: always
    depends_on:
      - zookeeper
    environment:
      - MESOS_HOSTNAME="150.20.11.157,150.20.11.134,150.20.11.137"
      - MESOS_QUORUM=1
      - MESOS_LOG_DIR=/var/log/mesos
    expose:
      - 5050
      - 4040
      - 7077
      - 8080
    ports:
      - 5050:5050
      - 4040:4040
      - 7077:7077
      - 8080:8080
I also run this compose file on the slave nodes (150.20.11.134 and 150.20.11.137):
version: '3.7'
services:
  zookeeper:
    image: ubuntu_mesos_spark
    command: /zookeeper-3.4.12/bin/zkServer.sh start
    environment:
      ZOOKEEPER_SERVER_ID: 2
      ZOOKEEPER_CLIENT_PORT: 2187
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 10
      ZOOKEEPER_SYNC_LIMIT: 5
      ZOOKEEPER_SERVERS: 0.0.0.0:2888:3888;150.20.11.134:2888:3888;150.20.11.137:2888:3888
    network_mode: host
    expose:
      - 2187
      - 2888
      - 3888
    ports:
      - 2187:2187
      - 2888:2888
      - 3888:3888
  slave:
    image: ubuntu_mesos_spark
    command: bash -c "/home/mesos-1.7.0/build/bin/mesos-slave.sh --master=150.20.11.157:5050 --work_dir=/var/run/mesos --systemd_enable_support=false"
    restart: always
    privileged: true
    network_mode: host
    depends_on:
      - zookeeper
    environment:
      - MESOS_HOSTNAME="150.20.11.157,150.20.11.134,150.20.11.137"
      - MESOS_MASTER=150.20.11.157
      - MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins #also in Dockerfile
      - MESOS_CONTAINERIZERS=docker,mesos
      - MESOS_LOG_DIR=/var/log/mesos
      - MESOS_LOGGING_LEVEL=INFO
    expose:
      - 5051
    ports:
      - 5051:5051
First I run "sudo docker-compose up" on the master node, then on the slave nodes, but I get errors. On the master node, the error is:
Starting marzieh-compose_zookeeper_1 ... done
Recreating marzieh-compose_master_1 ... done
Attaching to marzieh-compose_zookeeper_1, marzieh-compose_master_1
zookeeper_1 | ZooKeeper JMX enabled by default
zookeeper_1 | Using config: /zookeeper-3.4.12/bin/../conf/zoo.cfg
zookeeper_1 | Starting zookeeper ... STARTED
marzieh-compose_zookeeper_1 exited with code 0
master_1 | I0123 11:46:59.585522 7 logging.cpp:201] INFO level logging started!
master_1 | I0123 11:46:59.586066 7 main.cpp:242] Build: 2019-01-21 05:16:39 by
master_1 | I0123 11:46:59.586097 7 main.cpp:243] Version: 1.7.0
master_1 | F0123 11:46:59.587368 7 process.cpp:1115] Failed to initialize: Failed to bind on 150.20.11.157:5050: Cannot assign requested address
master_1 | * Check failure stack trace: *
master_1 | @ 0x7f505ce54b9c google::LogMessage::Fail()
master_1 | @ 0x7f505ce54ae0 google::LogMessage::SendToLog()
master_1 | @ 0x7f505ce544b2 google::LogMessage::Flush()
master_1 | @ 0x7f505ce57770 google::LogMessageFatal::~LogMessageFatal()
master_1 | @ 0x7f505cd19ed1 process::initialize()
master_1 | @ 0x55fb7b12981a main
master_1 | @ 0x7f504f0d0830 (unknown)
master_1 | @ 0x55fb7b1288b9 _start
master_1 | bash: line 1: 7 Aborted (core dumped) /home/mesos-1.7.0/build/bin/mesos-master.sh --ip=150.20.11.157 --work_dir=/var/run/mesos
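For context on the master error: "Cannot assign requested address" (EADDRNOTAVAIL) generally means the requested IP is not assigned to any network interface that the process can see. The master service above does not set network_mode: host, so it presumably runs on Docker's default bridge network, where the host address 150.20.11.157 does not exist inside the container. A minimal Python sketch for checking this from inside the container follows. The helper name try_bind is hypothetical and not part of Mesos.

```python
import errno
import socket


def try_bind(ip, port=0):
    """Try to bind a TCP socket to (ip, port).

    Returns None on success, otherwise the symbolic errno name,
    e.g. 'EADDRNOTAVAIL' or 'EADDRINUSE'. Port 0 asks the OS for any free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((ip, port))
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


if __name__ == "__main__":
    # 150.20.11.157 is the address passed to --ip in the compose file above.
    for ip in ("127.0.0.1", "150.20.11.157"):
        result = try_bind(ip)
        status = "bindable" if result is None else f"not bindable ({result})"
        print(f"{ip}: {status}")
```

If the second address is reported as not bindable inside the container but bindable on the host, the container's network mode is the likely cause.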
Moreover, when I run "sudo docker-compose up" on the slave nodes, I get this error:
slave_1 | F0123 11:40:06.878793 1 process.cpp:1115] Failed to initialize: Failed to bind on 0.0.0.0:5051: Address already in use
slave_1 | * Check failure stack trace: *
slave_1 | @ 0x7fee9d319b9c google::LogMessage::Fail()
slave_1 | @ 0x7fee9d319ae0 google::LogMessage::SendToLog()
slave_1 | @ 0x7fee9d3194b2 google::LogMessage::Flush()
slave_1 | @ 0x7fee9d31c770 google::LogMessageFatal::~LogMessageFatal()
slave_1 | @ 0x7fee9d1deed1 process::initialize()
slave_1 | @ 0x55e99f661784 main
slave_1 | @ 0x7fee8f595830 (unknown)
slave_1 | @ 0x55e99f65f139 _start
slave_1 | * Aborted at 1548243606 (unix time) try "date -d @1548243606" if you are using GNU date *
slave_1 | PC: @ 0x7fee8f5ac196 (unknown)
slave_1 | * SIGSEGV (@0x0) received by PID 1 (TID 0x7fee9f9f38c0) from PID 0; stack trace: *
slave_1 | @ 0x7fee8fee8390 (unknown)
slave_1 | @ 0x7fee8f5ac196 (unknown)
slave_1 | @ 0x7fee9d32055b google::DumpStackTraceAndExit()
slave_1 | @ 0x7fee9d319b9c google::LogMessage::Fail()
slave_1 | @ 0x7fee9d319ae0 google::LogMessage::SendToLog()
slave_1 | @ 0x7fee9d3194b2 google::LogMessage::Flush()
slave_1 | @ 0x7fee9d31c770 google::LogMessageFatal::~LogMessageFatal()
slave_1 | @ 0x7fee9d1deed1 process::initialize()
slave_1 | @ 0x55e99f661784 main
slave_1 | @ 0x7fee8f595830 (unknown)
slave_1 | @ 0x55e99f65f139 _start
slave_1 | I0123 11:41:07.818897 1 logging.cpp:201] INFO level logging started!
slave_1 | I0123 11:41:07.819437 1 main.cpp:349] Build: 2019-01-21 05:16:39 by
slave_1 | I0123 11:41:07.819470 1 main.cpp:350] Version: 1.7.0
slave_1 | I0123 11:41:07.823354 1 resolver.cpp:69] Creating default secret resolver
slave_1 | E0123 11:41:07.927773 1 main.cpp:483] EXIT with status 1: Failed to create a containerizer: Could not create DockerContainerizer: Failed to create docker: Failed to get docker version: Failed to execute 'docker -H unix:///var/run/docker.sock -- version': exited with status 127
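The slave log above shows two separate failures. "Address already in use" on 0.0.0.0:5051 suggests that some other process already holds port 5051. Since the slave uses network_mode: host, that could be an agent already running on the host or a previous container, although the log alone does not say which. Exit status 127 usually means "command not found", so the docker CLI is probably not installed in the image even though MESOS_CONTAINERIZERS includes docker. A small Python sketch that checks both conditions from inside the container follows. The function names are hypothetical.

```python
import shutil
import socket


def port_is_free(port, ip="0.0.0.0"):
    """Return True if a TCP socket can be bound to (ip, port) right now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((ip, port))
        return True
    except OSError:
        # Typically EADDRINUSE when another process already holds the port.
        return False
    finally:
        sock.close()


def docker_cli_available():
    """Return True if a `docker` executable is found on PATH."""
    return shutil.which("docker") is not None


if __name__ == "__main__":
    print("port 5051 free:", port_is_free(5051))
    print("docker CLI on PATH:", docker_cli_available())
```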
I searched a lot but could not figure this out. Could you please tell me the right way to write a docker-compose file for running a Mesos and Spark cluster on Docker?
Any help would be appreciated.
Thanks in advance.
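Problem solved. I changed the docker-compose files as shown below, and the master and slaves now run without problems. The main differences are that the master service now also uses network_mode: host, so it can bind to the host's IP address, and that MESOS_CONTAINERIZERS=docker,mesos is commented out on the slaves, so the agent no longer looks for a docker binary inside the container.
The "docker-compose.yaml" on the master node is as follows: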
version: '3.7'
services:
  zookeeper:
    image: ubuntu_mesos_spark_python3.6_client
    command: /home/zookeeper-3.4.12/bin/zkServer.sh start
    environment:
      ZOOKEEPER_SERVER_ID: 1
      ZOOKEEPER_CLIENT_PORT: 2188
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 10
      ZOOKEEPER_SYNC_LIMIT: 5
      ZOOKEEPER_SERVERS: 0.0.0.0:2888:3888;150.20.11.157:2888:3888
    network_mode: host
    expose:
      - 2188
      - 2888
      - 3888
    ports:
      - 2188:2188
      - 2888:2888
      - 3888:3888
  master:
    image: ubuntu_mesos_spark_python3.6_client
    # hostname: IP of the master node
    command: bash -c "sleep 30; /home/mesos-1.7.0/build/bin/mesos-master.sh --ip=150.20.10.136 --work_dir=/var/run/mesos --hostname=x.x.x.x"
    restart: always
    network_mode: host
    depends_on:
      - zookeeper
    environment:
      - MESOS_HOSTNAME="150.20.11.136"
      - MESOS_QUORUM=1
      - MESOS_LOG_DIR=/var/log/mesos
    expose:
      - 5050
      - 4040
      - 7077
      - 8080
    ports:
      - 5050:5050
      - 4040:4040
      - 7077:7077
      - 8080:8080
The "docker-compose.yaml" file on the slave node looks like this:
version: '3.7'
services:
  zookeeper:
    image: ubuntu_mesos_spark_python3.6_client
    command: /home/zookeeper-3.4.12/bin/zkServer.sh start
    environment:
      ZOOKEEPER_SERVER_ID: 2
      ZOOKEEPER_CLIENT_PORT: 2188
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 10
      ZOOKEEPER_SYNC_LIMIT: 5
      ZOOKEEPER_SERVERS: 150.20.11.136:2888:3888;0.0.0.0:2888:3888
    network_mode: host
    expose:
      - 2188
      - 2888
      - 3888
    ports:
      - 2188:2188
      - 2888:2888
      - 3888:3888
  slave:
    image: ubuntu_mesos_spark_python3.6_client
    command: bash -c "sleep 30; /home/mesos-1.7.0/build/bin/mesos-slave.sh --master=150.20.11.136:5050 --work_dir=/var/run/mesos --systemd_enable_support=false"
    restart: always
    privileged: true
    network_mode: host
    depends_on:
      - zookeeper
    environment:
      - MESOS_HOSTNAME="150.20.11.157"
      #- MESOS_MASTER=172.28.10.136
      #- MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins #also in Dockerfile
      #- MESOS_CONTAINERIZERS=docker,mesos
      - MESOS_LOG_DIR=/var/log/mesos
      - MESOS_LOGGING_LEVEL=INFO
    expose:
      - 5051
    ports:
      - 5051:5051
Then I ran "docker-compose up" on each node, and they started without any problems.
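To confirm that the services are actually accepting connections after "docker-compose up", a quick TCP reachability check can help. The sketch below is only an illustration. It assumes the master listens on 150.20.11.136:5050 (the address from the --master flag above) and that ZooKeeper uses client port 2188. The name is_listening and the ENDPOINTS table are hypothetical. It only tests whether a port accepts connections, not whether the cluster is healthy.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Assumed endpoints, taken from the compose files above; adjust to your nodes.
ENDPOINTS = {
    "mesos master": ("150.20.11.136", 5050),
    "zookeeper (master node)": ("150.20.11.136", 2188),
}

if __name__ == "__main__":
    for name, (host, port) in ENDPOINTS.items():
        state = "up" if is_listening(host, port) else "DOWN"
        print(f"{name:25s} {host}:{port} {state}")
```

The agent port 5051 on each slave node can be added to the table in the same way.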