Tags: docker, mesos, mesosphere, marathon

Docker app deployment hangs on Marathon, fails on Mesos


I'm attempting the (potentially foolish) task of Dockerizing Zookeeper/Marathon/Mesos and deploying Docker containers from the Dockerized Mesos cluster.

So far, I have a working Mesos cluster on two physically separate nodes: one node is running both a Mesos master and a slave (container Dockerfiles linked), and the second node is running just a slave. They seem to be functioning just fine; I am able to submit very simple jobs through Marathon (also its own container, running on the node with the master and slave) and they complete successfully.

However, when I attempt to submit Docker containers through the Marathon API, it seems to hang. The Marathon interface hangs at "Deploying" and never changes, even after letting it sit for 15 minutes, stopping, resubmitting, and letting it sit for another 15 minutes.

[Screenshot: Marathon UI, depicting seemingly frozen deployment of Docker task]

At the same time, tasks are nonetheless being submitted to the Mesos slaves; the Mesos UI is reporting FAILED tasks left and right.

[Screenshot: Mesos UI, depicting failed tasks]

EDIT 1

The resulting Sandbox logs for each of the executors are also completely empty.

[Screenshot: empty sandbox]

EDIT 2

Found something interesting buried in the slave logs:

[Screenshot: slave logs]

Line of interest:

None of the enabled containerizers (mesos) could create a container for the provided TaskInfo/ExecutorInfo message.

The containerizer is failing to launch the task, and the message lists only mesos among the enabled containerizers, so docker is apparently not being considered at all. I followed the configuration here to deploy Docker jobs; does this change if the Mesos slaves are themselves Docker containers?

I'm somewhat out of my element and can't find any references along these lines. Any idea what's happening?


Solution

  • What's your docker run command for the slave? Here are a few parameters others have found useful:

    --net host \
    --pid host \
    --privileged \
    --env MESOS_CONTAINERIZERS=docker,mesos \
    --env MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /sys:/sys:ro \
    -v /usr/bin/docker:/usr/bin/docker:ro \
    -v /lib64/libdevmapper.so.1.02:/lib/libdevmapper.so.1.02:ro \
    -v /home/core/.dockercfg:/root/.dockercfg:ro \
    

    Also note that you shouldn't give the slave container a name starting with mesos- (such as mesos-slave), as the slave will try to remove any containers prefixed with mesos- upon recovery.

    FYI, Mesos uses the docker --version command to see if the docker containerizer can be used. Try launching a Marathon task that just runs docker --version to see if that would work inside your dockerized slave's environment.
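
To make the two suggestions above concrete, here is a minimal dry-run sketch. It only prints a slave docker run command and a Marathon app definition for the docker --version check; it does not contact Docker or Marathon. The image name, the container name, and the app id are hypothetical placeholders, and the trailing sleep in the app's cmd is an assumption added so the task's stdout can be read in the sandbox before the task exits. The libdevmapper and .dockercfg mounts from the list above are host-specific and left out here.

```shell
#!/bin/sh
# Dry-run sketch: prints a slave `docker run` command and a Marathon app
# definition. Nothing here contacts Docker or Marathon.

# Hypothetical values, not from the original post; override via environment.
SLAVE_IMAGE="${SLAVE_IMAGE:-example/mesos-slave}"
SLAVE_NAME="${SLAVE_NAME:-dockerized-slave}"

# Print the docker run command for the slave container.
slave_run_command() {
    # The slave removes containers whose names start with "mesos-" on
    # recovery, so reject such a name up front.
    case "$SLAVE_NAME" in
        mesos-*)
            echo "container name must not start with mesos-: $SLAVE_NAME" >&2
            return 1
            ;;
    esac
    cat <<EOF
docker run -d --name $SLAVE_NAME \\
  --net host \\
  --pid host \\
  --privileged \\
  --env MESOS_CONTAINERIZERS=docker,mesos \\
  --env MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins \\
  -v /var/run/docker.sock:/var/run/docker.sock \\
  -v /sys:/sys:ro \\
  -v /usr/bin/docker:/usr/bin/docker:ro \\
  $SLAVE_IMAGE
EOF
}

# Print a Marathon app definition whose task only runs `docker --version`.
# The sleep is an assumption: it keeps the task alive briefly so its stdout
# can be inspected in the sandbox.
version_check_app() {
    cat <<'EOF'
{
  "id": "/docker-version-check",
  "cmd": "docker --version && sleep 60",
  "cpus": 0.1,
  "mem": 32,
  "instances": 1
}
EOF
}

slave_run_command
version_check_app
```

If the printed command looks right for your host, run it by hand. The app definition can then be submitted by piping version_check_app into curl -X POST -H 'Content-Type: application/json' -d @- against Marathon's /v2/apps endpoint (http://localhost:8080/v2/apps if Marathon listens on its default port on the same machine, which is an assumption about your setup). If the task's stdout shows a Docker version, the docker binary is usable from inside the slave container.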