I'm playing on localhost with a DC/OS installation. While everything works fine, I can't seem to run a docker image located inside a private repo. I'm using python to communicate with chronos:
@celery.task(name='add-job', soft_time_limit=5)
def add_job(job_id):
job_document = mongo.jobs.find_one({
'_id': job_id
})
if job_document:
worker_document = mongo.workers.find_one({
'_id': job_document['workerId']
})
if worker_document:
job = {
'async': True,
'name': job_document['_id'],
'owner': 'owner@gmail.com',
'command': "python /code/run.py",
"disabled": False,
"shell": True,
"cpus": worker_document['cpus'],
"disk": worker_document['disk'],
"mem": worker_document['memory'],
'schedule': 'R1//PT300S',# start now,
"epsilon": "PT60M",
"container": {
"type": "DOCKER",
"forcePullImage": True,
"image": "quay.io/username/container",
"network": "HOST",
"volumes": [{
"containerPath": "/images/",
"hostPath": "/images/",
"mode": "RW"
}]
},
"uris": [
"file:///images/docker.tar.gz"
]
}
return chronos_client.add(job)
else:
return 'worker not found'
else:
return 'job not found'
The job runs fine with a public image (alpine:latest
) but it fails without any error inside the dcos installation.
The job gets executed but it fails immediately. The error log of the job inside chronos looks like this:
I1212 12:39:11.141639 25058 fetcher.cpp:498] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/61d6d037-c9f5-482b-a441-11d85554461b-S1\/root","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"executable":false,"extract":false,"value":"file:\/\/\/images\/docker.tar.gz"}}],"sandbox_directory":"\/var\/lib\/mesos\/slave\/slaves\/61d6d037-c9f5-482b-a441-11d85554461b-S1\/docker\/links\/7029bbea-4c3d-439a-8720-411f6fe40eb9","user":"root"}
I1212 12:39:11.143575 25058 fetcher.cpp:409] Fetching URI 'file:///images/docker.tar.gz'
I1212 12:39:11.143587 25058 fetcher.cpp:250] Fetching directly into the sandbox directory
I1212 12:39:11.143602 25058 fetcher.cpp:187] Fetching URI 'file:///images/docker.tar.gz'
I1212 12:39:11.143612 25058 fetcher.cpp:167] Copying resource with command:cp '/images/docker.tar.gz' '/var/lib/mesos/slave/slaves/61d6d037-c9f5-482b-a441-11d85554461b-S1/docker/links/7029bbea-4c3d-439a-8720-411f6fe40eb9/docker.tar.gz'
I1212 12:39:11.146726 25058 fetcher.cpp:547] Fetched 'file:///images/docker.tar.gz' to '/var/lib/mesos/slave/slaves/61d6d037-c9f5-482b-a441-11d85554461b-S1/docker/links/7029bbea-4c3d-439a-8720-411f6fe40eb9/docker.tar.gz'
Stdout is empty. Executed directly inside marathon as an application with the same settings the authentication works and my image is downloaded & executed. Is this something that chronos does not support? It should...I mean, it has commands for docker...
Update: digging deeper into the agent logs I found this:
Failed to run 'docker -H unix:///var/run/docker.sock pull quay.io/username/container': exited with status 1; stderr='Error: Status 403 trying to pull repository username/container: "{\"error\": \"Permission Denied\"}"
I tried the archive with it's config.json file on the agent itself and it can download when triggered from the command line. I just can't seem to understand why chronos is not using it properly. I can't find any other reference on how to put my credentials other than this.
As it turns out...the uris param is deprecated in favor of fetch. I started from scratch with a marathon config applied to chronos and watched the logs carefully when I saw this: {'message': 'Tried to add both uri (deprecated) and fetch parameters on aBPepwhG5z33e4teG', 'status': 'Bad Request'}
. Then I changed my uris parameter into:
"fetch": [{
"uri": "/images/docker.tar.gz",
"extract": true,
"executable": false,
"cache": false
}]
...and it worked.