I have a 1/1 master/slave setup; the slave has 8 GB of RAM and 8 CPUs. I am trying to use Marathon to deploy a Docker container with 1 GB of memory and 1 CPU, but the deployment just hangs on "Waiting".
I believe this is usually caused by Marathon not getting the resources it wants for the task. When I look at my logs I see:
Sending 1 offers to framework 8bb1a298-cc23-426e-ad43-d440a2a560c4-0000 (marathon) at scheduler-d4a993b4-69ea-4ac3-9e98-b54afe1e790b@127.0.0.1:52016
I0127 23:07:37.396546 2471 master.cpp:3297] Processing DECLINE call for offers: [ 5271fcb3-4d77-4b12-af85-d94fd9172514-O127 ] for framework 8bb1a298-cc23-426e-ad43-d440a2a560c4-0000 (marathon) at scheduler-d4a993b4-69ea-4ac3-9e98-b54afe1e790b@127.0.0.1:52016
I0127 23:07:37.396917 2466 hierarchical.cpp:744] Recovered cpus():6; mem():5968; disk():156020; ports():[31000-31056, 31058-32000] (total: cpus():8; mem():6992; disk():156020; ports():[31000-32000], allocated: cpus():2; mem():1024; ports(*):[31057-31057]) on slave 8bb1a298-cc23-426e-ad43-d440a2a560c4-S0 from framework 8bb1a298-cc23-426e-ad43-d440a2a560c4-0000
So it looks like Marathon is declining the offer it gets? The next line in the logs says that Mesos is recovering the offered resources, and what it recovers looks like plenty for my task.
Any ideas on how to troubleshoot this further?
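To make the "plenty for my task" observation concrete, here is a minimal Python sketch that pulls the scalar resources out of the "Recovered" log line above and compares them with the 1 CPU / 1024 MB the task asks for. The helper names (`parse_resources`, `fits`) are made up for illustration and are not part of Mesos or Marathon. The role inside the parentheses may show up as `*` in real logs; the pattern accepts both forms.

```python
import re

def parse_resources(text):
    """Parse 'name(role):value' pairs such as 'cpus():6; mem():5968' into floats.

    Range resources like ports():[31000-32000] are skipped on purpose.
    """
    resources = {}
    for name, value in re.findall(r"(\w+)\([^)]*\):([\d.]+)", text):
        resources[name] = float(value)
    return resources

def fits(offer, needed):
    """Return True if every needed scalar resource is covered by the offer."""
    return all(offer.get(name, 0.0) >= amount for name, amount in needed.items())

# The part of the log line before "(total: ...)" describes the recovered offer.
line = ("Recovered cpus():6; mem():5968; disk():156020; "
        "ports():[31000-31056, 31058-32000] (total: cpus():8; mem():6992)")
offered = parse_resources(line.split("(total:")[0])

print(offered)
print(fits(offered, {"cpus": 1, "mem": 1024}))
```

If this prints `True`, CPU and memory are probably not the reason for the decline, and it is worth looking at the other parts of the request, such as ports.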
Edit: I dug into this a bit further and found the Marathon logs.
Basically, the deployment works if we do not enter any port mapping information in the Marathon Docker section. The Docker container deploys successfully and I can ping it from its host, but I cannot access it from elsewhere.
If we set the container port to 8081 (which is what the Docker container exposes, since its application listens on it), we get further in the deployment process, but the app within the container fails to start with this error:
Error: listen EADDRINUSE :::8081
    at Object.exports._errnoException (util.js:856:11)
    at exports._exceptionWithHostPort (util.js:879:20)
    at Server._listen2 (net.js:1234:14)
    at listen (net.js:1270:10)
    at Server.listen (net.js:1366:5)
    at EventEmitter.listen (/usr/src/app/node_modules/express/lib/application.js:617:24)
    at Object.<anonymous> (/usr/src/app/index.js:16:18)
    at Module._compile (module.js:425:26)
    at Object.Module._extensions..js (module.js:432:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:313:12)
    at Function.Module.runMain (module.js:457:10)
    at startup (node.js:138:18)
    at node.js:974:3
So I think we are further along than we were, but we are still having some port issues. I don't know why the container would run successfully on its own, and with Marathon with no port settings, but not with Marathon with port settings.
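EADDRINUSE means something is already bound to port 8081 in the network namespace where the app starts. A rough way to check this, sketched in Python: try to bind the port yourself and see whether the OS refuses. `port_in_use` is a hypothetical helper name, and the result is only meaningful when run in the same place as the app (on the host for host networking, inside the container otherwise). Leftover connections in TIME_WAIT can also make the bind fail, so treat a `True` as a hint rather than proof.

```python
import errno
import socket

def port_in_use(port, host="0.0.0.0"):
    """Return True if binding (host, port) fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. permission denied: not the situation we are testing for
        return False

# 8081 is the port the app in the question listens on.
print(port_in_use(8081))
```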
There are a few things to check:
On your slave: ps aux | grep sbin/mesos-slave
should contain something like:
--containerizers=docker,mesos --executor_registration_timeout=5mins
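If you would rather check this programmatically than by eye, here is a small Python sketch that splits a mesos-slave command line into its `--key=value` flags and confirms that `docker` is among the containerizers. The sample command line and the `slave_flags` helper are illustrative assumptions; paste in whatever `ps` actually prints on your slave.

```python
import shlex

def slave_flags(cmdline):
    """Collect '--key=value' flags from a command line into a dict."""
    flags = {}
    for token in shlex.split(cmdline):
        if token.startswith("--") and "=" in token:
            key, value = token[2:].split("=", 1)
            flags[key] = value
    return flags

# Sample command line (assumed); replace with the real output of ps.
cmd = ("/usr/sbin/mesos-slave --containerizers=docker,mesos "
       "--executor_registration_timeout=5mins")
flags = slave_flags(cmd)

docker_enabled = "docker" in flags.get("containerizers", "").split(",")
print(docker_enabled, flags.get("executor_registration_timeout"))
```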
Again on the slave, check that a Docker daemon is running:
ps aux | grep "docker daemon"
Make sure you've configured the Docker network (in Marathon) as BRIDGE. With HOST mode you might collide with ports that are already in use on the host. BRIDGE mode allows a mapping such as slave:32001 -> docker:8080.
...
"network": "BRIDGE",
"portMappings": [
{
"containerPort": 8080,
"hostPort": 0,
"protocol": "tcp"
}
],
...
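For context, here is a sketch of where that fragment sits in a full app definition, built as a Python dict and printed as JSON. It assumes the older Marathon layout in which `network` and `portMappings` live under `container.docker` (newer Marathon versions moved these fields). The app ID and image name are placeholders. `hostPort` is set to 0, which asks Marathon to pick a free port from the offer; that port is then available to the task as `$PORT0`. The container port is 8081 here because that is what the app in the question listens on.

```python
import json

def docker_app(app_id, image, container_port, cpus=1, mem=1024):
    """Build a Marathon app definition for a bridged Docker container."""
    return {
        "id": app_id,
        "cpus": cpus,
        "mem": mem,
        "instances": 1,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                "portMappings": [
                    # hostPort 0: let Marathon assign a port from the offer
                    {"containerPort": container_port, "hostPort": 0, "protocol": "tcp"}
                ],
            },
        },
    }

# "/myapp" and "example/myapp:latest" are placeholder values.
app = docker_app("/myapp", "example/myapp:latest", 8081)
print(json.dumps(app, indent=2))
```

The printed JSON can be saved to a file and submitted to Marathon the same way as any other app definition.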
When the task starts in Marathon you'll see a task ID like myapp.a72db5b0-ca16-11e5-ba5f-fea9945fabaf. Use the Mesos CLI (pip install mesos.cli mesos.interface) to fetch the logs. There's a command similar to Unix's tail for fetching the stdout logs (-f follows the logs):
mesos tail -f -i myapp.a72db5b0-ca16-11e5-ba5f-fea9945fabaf
and stderr:
mesos tail -f -i myapp.a72db5b0-ca16-11e5-ba5f-fea9945fabaf stderr
-i allows you to get logs from inactive tasks (in case the task is crashing quickly). If you don't catch the ID in Marathon, use mesos ps -i.
If the task is not starting at all, either there are not enough resources or there is some problem with Marathon. Point your browser at http://{marathon host}:8080/logging and increase the verbosity for task allocation. Then check the Marathon logs.