Core dumped when running a Mesos cluster on Docker


I have a Docker image called ubuntu_mesos_spark, on which I installed ZooKeeper. I changed the “zoo.cfg” file on each node as follows. This is “zoo.cfg” on node1 (150.20.11.157):

tickTime=2000
initLimit=10
syncLimit=5
clientPort=2187
dataDir=/var/lib/zookeeper
server.1=0.0.0.0:2888:3888
server.2=150.20.11.157:2888:3888
server.3=150.20.11.137:2888:3888

This is “zoo.cfg” on node2 (150.20.11.134):

tickTime=2000
initLimit=10
syncLimit=5
clientPort=2187
dataDir=/var/lib/zookeeper
server.1=150.20.11.157:2888:3888
server.2=0.0.0.0:2888:3888
server.3=150.20.11.137:2888:3888

This is “zoo.cfg” on node3 (150.20.11.137):

 tickTime=2000
 initLimit=10
 syncLimit=5
 clientPort=2187
 dataDir=/var/lib/zookeeper
 server.1=150.20.11.157:2888:3888
 server.2=150.20.11.134:2888:3888
 server.3=0.0.0.0:2888:3888

I also created a “myid” file in “/var/lib/zookeeper” on each node; for example, on “150.20.11.157” the myid file contains “1”.

I installed Mesos and Spark in the same Docker image, and the three nodes form a Mesos cluster. I listed the IP addresses of the slave nodes in “spark/conf/slaves”:

150.20.11.134
150.20.11.137
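
For reference, the myid step mentioned above can be sketched roughly as follows. The helper name write_myid is hypothetical and not part of ZooKeeper; the only assumption taken from ZooKeeper itself is that the number in myid must match the N of that host's server.N line in zoo.cfg.

```shell
#!/bin/sh
# Hypothetical helper: write a ZooKeeper server id into <data_dir>/myid.
# The id must match the N of this host's server.N line in zoo.cfg.
write_myid() {
    data_dir=$1
    server_id=$2
    mkdir -p "$data_dir" || return 1
    printf '%s\n' "$server_id" > "$data_dir/myid"
}

# Example (on 150.20.11.157, where dataDir=/var/lib/zookeeper):
#   write_myid /var/lib/zookeeper 1
```

The same call with 2 and 3 would be used on the other two nodes.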

I added these lines in “spark/conf/spark-env.sh”:

export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=/home/spark/program_file/spark-2.3.2-bin-hadoop2.7.tgz

Moreover, I added these lines in my “~/.bashrc” file:

export SPARK_HOME="/home/spark"
PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHO$
export PYSPARK_HOME=/usr/bin/python3.6
export PYSPARK_DRIVER_PYTHON=python3.6
export ZOO_LOG_DIR=/var/log/zookeeper

I want to run the Mesos master on “150.20.11.157”. My docker-compose file on that node is:

version: '3.7'
services:
  zookeeper:
    image: ubuntu_mesos_spark
    command: /zookeeper-3.4.12/bin/zkServer.sh start
    environment:
      ZOOKEEPER_SERVER_ID: 1
      ZOOKEEPER_CLIENT_PORT: 2187
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 10
      ZOOKEEPER_SYNC_LIMIT: 5
      ZOOKEEPER_SERVERS: 0.0.0.0:2888:3888;150.20.11.134:2888:3888;150.20.11.137:2888:3888
    network_mode: host
    expose:
      - 2187
      - 2888
      - 3888
    ports:
      - 2187:2187
      - 2888:2888
      - 3888:3888

  master:
    image: ubuntu_mesos_spark
    command: bash -c "sleep 20; /home/mesos-1.7.0/build/bin/mesos-master.sh --ip=150.20.11.157 --work_dir=/var/run/mesos"
    restart: always
    depends_on:
      - zookeeper
    environment:
      - MESOS_HOSTNAME="150.20.11.157,150.20.11.134,150.20.11.137"
      - MESOS_QUORUM=1
      - MESOS_LOG_DIR=/var/log/mesos
    expose:
      - 5050
      - 4040
      - 7077
      - 8080
    ports:
      - 5050:5050
      - 4040:4040
      - 7077:7077
      - 8080:8080

I run this compose file on the slave nodes (“150.20.11.134” and “150.20.11.137”):

version: '3.7'
services:
  zookeeper:
    image: ubuntu_mesos_spark
    command: /zookeeper-3.4.12/bin/zkServer.sh start
    environment:
      ZOOKEEPER_SERVER_ID: 2
      ZOOKEEPER_CLIENT_PORT: 2187
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 10
      ZOOKEEPER_SYNC_LIMIT: 5
      ZOOKEEPER_SERVERS: 0.0.0.0:2888:3888;150.20.11.134:2888:3888;150.20.11.137:2888:3888
    network_mode: host
    expose:
      - 2187
      - 2888
      - 3888
    ports:
      - 2187:2187
      - 2888:2888
      - 3888:3888

  slave:
    image: ubuntu_mesos_spark
    command: bash -c "/home/mesos-1.7.0/build/bin/mesos-slave.sh --master=150.20.11.157:5050 --work_dir=/var/run/mesos --systemd_enable_support=false"
    restart: always
    privileged: true
    network_mode: host
    depends_on:
      - zookeeper
    environment:
      - MESOS_HOSTNAME="150.20.11.157,150.20.11.134,150.20.11.137"
      - MESOS_MASTER=150.20.11.157
      - MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins #also in Dockerfile
      - MESOS_CONTAINERIZERS=docker,mesos
      - MESOS_LOG_DIR=/var/log/mesos
      - MESOS_LOGGING_LEVEL=INFO
    expose:
      - 5051
    ports:
      - 5051:5051

First I run "sudo docker-compose up" on the master node, then on the slave nodes, but I get the following errors.

On Master node, the error is:

Starting marzieh-compose_zookeeper_1 ... done
Recreating marzieh-compose_master_1 ... done
Attaching to marzieh-compose_zookeeper_1, marzieh-compose_master_1
zookeeper_1 | ZooKeeper JMX enabled by default
zookeeper_1 | Using config: /zookeeper-3.4.12/bin/../conf/zoo.cfg
zookeeper_1 | Starting zookeeper ... STARTED
marzieh-compose_zookeeper_1 exited with code 0
master_1 | I0123 11:46:59.585522 7 logging.cpp:201] INFO level logging started!
master_1 | I0123 11:46:59.586066 7 main.cpp:242] Build: 2019-01-21 05:16:39 by
master_1 | I0123 11:46:59.586097 7 main.cpp:243] Version: 1.7.0
master_1 | F0123 11:46:59.587368 7 process.cpp:1115] Failed to initialize: Failed to bind on 150.20.11.157:5050: Cannot assign requested address
master_1 | * Check failure stack trace: *
master_1 | @ 0x7f505ce54b9c google::LogMessage::Fail()
master_1 | @ 0x7f505ce54ae0 google::LogMessage::SendToLog()
master_1 | @ 0x7f505ce544b2 google::LogMessage::Flush()
master_1 | @ 0x7f505ce57770 google::LogMessageFatal::~LogMessageFatal()
master_1 | @ 0x7f505cd19ed1 process::initialize()
master_1 | @ 0x55fb7b12981a main
master_1 | @ 0x7f504f0d0830 (unknown)
master_1 | @ 0x55fb7b1288b9 _start
master_1 | bash: line 1: 7 Aborted (core dumped) /home/mesos-1.7.0/build/bin/mesos-master.sh --ip=150.20.11.157 --work_dir=/var/run/mesos

When I run "sudo docker-compose up" on the slave nodes, I get this error:

slave_1 | F0123 11:40:06.878793 1 process.cpp:1115] Failed to initialize: Failed to bind on 0.0.0.0:5051: Address already in use
slave_1 | * Check failure stack trace: *
slave_1 | @ 0x7fee9d319b9c google::LogMessage::Fail()
slave_1 | @ 0x7fee9d319ae0 google::LogMessage::SendToLog()
slave_1 | @ 0x7fee9d3194b2 google::LogMessage::Flush()
slave_1 | @ 0x7fee9d31c770 google::LogMessageFatal::~LogMessageFatal()
slave_1 | @ 0x7fee9d1deed1 process::initialize()
slave_1 | @ 0x55e99f661784 main
slave_1 | @ 0x7fee8f595830 (unknown)
slave_1 | @ 0x55e99f65f139 _start
slave_1 | * Aborted at 1548243606 (unix time) try "date -d @1548243606" if you are using GNU date *
slave_1 | PC: @ 0x7fee8f5ac196 (unknown)
slave_1 | * SIGSEGV (@0x0) received by PID 1 (TID 0x7fee9f9f38c0) from PID 0; stack trace: *
slave_1 | @ 0x7fee8fee8390 (unknown)
slave_1 | @ 0x7fee8f5ac196 (unknown)
slave_1 | @ 0x7fee9d32055b google::DumpStackTraceAndExit()
slave_1 | @ 0x7fee9d319b9c google::LogMessage::Fail()
slave_1 | @ 0x7fee9d319ae0 google::LogMessage::SendToLog()
slave_1 | @ 0x7fee9d3194b2 google::LogMessage::Flush()
slave_1 | @ 0x7fee9d31c770 google::LogMessageFatal::~LogMessageFatal()
slave_1 | @ 0x7fee9d1deed1 process::initialize()
slave_1 | @ 0x55e99f661784 main
slave_1 | @ 0x7fee8f595830 (unknown)
slave_1 | @ 0x55e99f65f139 _start
slave_1 | I0123 11:41:07.818897 1 logging.cpp:201] INFO level logging started!
slave_1 | I0123 11:41:07.819437 1 main.cpp:349] Build: 2019-01-21 05:16:39 by
slave_1 | I0123 11:41:07.819470 1 main.cpp:350] Version: 1.7.0
slave_1 | I0123 11:41:07.823354 1 resolver.cpp:69] Creating default secret resolver
slave_1 | E0123 11:41:07.927773 1 main.cpp:483] EXIT with status 1: Failed to create a containerizer: Could not create DockerContainerizer: Failed to create docker: Failed to get docker version: Failed to execute 'docker -H unix:///var/run/docker.sock -- version': exited with status 127

I have searched a lot but could not figure this out. Could you please tell me the right way to write a docker-compose file for running a Mesos and Spark cluster on Docker?

Any help would be appreciated.

Thanks in advance.


Solution

  • Problem solved. I changed the docker-compose files as follows, and the master and slaves now run without problems.

    The "docker-compose.yaml" on the master node is as follows:

    version: '3.7'
    services:
      zookeeper:
        image: ubuntu_mesos_spark_python3.6_client
        command: /home/zookeeper-3.4.12/bin/zkServer.sh start
        environment:
          ZOOKEEPER_SERVER_ID: 1
          ZOOKEEPER_CLIENT_PORT: 2188
          ZOOKEEPER_TICK_TIME: 2000
          ZOOKEEPER_INIT_LIMIT: 10
          ZOOKEEPER_SYNC_LIMIT: 5
          ZOOKEEPER_SERVERS: 0.0.0.0:2888:3888;150.20.11.157:2888:3888
        network_mode: host
        expose:
          - 2188
          - 2888
          - 3888
        ports:
          - 2188:2188
          - 2888:2888
          - 3888:3888

      master:
        image: ubuntu_mesos_spark_python3.6_client
        # --hostname: IP of the master node
        command: bash -c "sleep 30; /home/mesos-1.7.0/build/bin/mesos-master.sh --ip=150.20.10.136 --work_dir=/var/run/mesos --hostname=x.x.x.x"
        restart: always
        network_mode: host
        depends_on:
          - zookeeper
        environment:
          - MESOS_HOSTNAME="150.20.11.136"
          - MESOS_QUORUM=1
          - MESOS_LOG_DIR=/var/log/mesos
        expose:
          - 5050
          - 4040
          - 7077
          - 8080
        ports:
          - 5050:5050
          - 4040:4040
          - 7077:7077
          - 8080:8080
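
A note on the likely reason this version binds: the master service in the question had no network_mode: host, so mesos-master ran in the container's own network namespace, where the host address 150.20.11.157 is not assigned to any interface. That is consistent with the "Cannot assign requested address" error. The sketch below checks for this before the master starts. has_local_ip is a hypothetical helper, and it assumes the iproute2 `ip` tool is installed in the image.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the IPv4 address $1 appears in
# `ip -o -4 addr show` output supplied on stdin.
has_local_ip() {
    awk -v ip="$1" '
        { split($4, parts, "/"); if (parts[1] == ip) found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Example, run inside the container before mesos-master.sh:
#   ip -o -4 addr show | has_local_ip 150.20.11.157 || echo "address not visible here"
```

Reading the interface list from stdin keeps the helper independent of how the list is obtained, so it can also be fed saved output from another node.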
    

    The "docker-compose.yaml" file on a slave node is as follows:

    version: '3.7'
    services:
      zookeeper:
        image: ubuntu_mesos_spark_python3.6_client
        command: /home/zookeeper-3.4.12/bin/zkServer.sh start
        environment:
          ZOOKEEPER_SERVER_ID: 2
          ZOOKEEPER_CLIENT_PORT: 2188
          ZOOKEEPER_TICK_TIME: 2000
          ZOOKEEPER_INIT_LIMIT: 10
          ZOOKEEPER_SYNC_LIMIT: 5
          ZOOKEEPER_SERVERS: 150.20.11.136:2888:3888;0.0.0.0:2888:3888
        network_mode: host
        expose:
          - 2188
          - 2888
          - 3888
        ports:
          - 2188:2188
          - 2888:2888
          - 3888:3888

      slave:
        image: ubuntu_mesos_spark_python3.6_client
        command: bash -c "sleep 30; /home/mesos-1.7.0/build/bin/mesos-slave.sh --master=150.20.11.136:5050 --work_dir=/var/run/mesos --systemd_enable_support=false"
        restart: always
        privileged: true
        network_mode: host
        depends_on:
          - zookeeper
        environment:
          - MESOS_HOSTNAME="150.20.11.157"
          #- MESOS_MASTER=172.28.10.136
          #- MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins #also in Dockerfile
          #- MESOS_CONTAINERIZERS=docker,mesos
          - MESOS_LOG_DIR=/var/log/mesos
          - MESOS_LOGGING_LEVEL=INFO
        expose:
          - 5051
        ports:
          - 5051:5051
    

    Then I ran "docker-compose up" on each node, and everything started without any problems.
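
On the last error in the question ("Failed to execute 'docker -H unix:///var/run/docker.sock -- version': exited with status 127"): 127 is the status a shell returns when a command is not found, so the docker CLI was probably missing from the image, and commenting out MESOS_CONTAINERIZERS=docker,mesos in the slave file avoids that lookup. This is an interpretation of the log, not something verified against the image. A small guard could choose the value at startup; pick_containerizers is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print "docker,mesos" only when the given CLI
# (default: docker) is found on PATH; otherwise fall back to the
# Mesos containerizer alone.
pick_containerizers() {
    if command -v "${1:-docker}" >/dev/null 2>&1; then
        echo "docker,mesos"
    else
        echo "mesos"
    fi
}

# Example, before starting mesos-slave.sh:
#   export MESOS_CONTAINERIZERS=$(pick_containerizers)
```

Using the docker containerizer from inside a container would also need the host's Docker socket and a docker client available in the image, which the compose files above do not set up.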