I am trying to run a service using DC/OS and Docker. I created my Stack using the template for my region from here. I also created the following Dockerfile:
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y expect openssh-client
WORKDIR "/root"
ENTRYPOINT eval "$(ssh-agent -s)" && \
mkdir -p .ssh && \
echo $PRIVATE_KEY > .ssh/id_rsa && \
chmod 600 /root/.ssh/id_rsa && \
expect -c "spawn ssh-add /root/.ssh/id_rsa; expect \"Enter passphrase for /root/.ssh/id_rsa:\" send \"\"; interact " && \
while true; do ssh-add -l; sleep 2; done
I have a private repository that I would like to clone/pull from when the docker container starts. This is why I am trying to add the private key to the ssh-agent.
If I run this image as a docker container locally and supply the private key using the PRIVATE_KEY environment variable, everything works fine. I see that the identity is added.
The problem is that when I try to run a service on DC/OS using the docker image, the ssh-agent does not seem to remember the identity that was added using the private key.
I have checked the error log from DC/OS. There are no errors.
Does anyone know why running the docker container on DC/OS is any different compared to running it locally?
EDIT: I have added details of the description of the DC/OS service in case it helps:
{
  "id": "/SOME-ID",
  "instances": 1,
  "cpus": 1,
  "mem": 128,
  "disk": 0,
  "gpus": 0,
  "constraints": [],
  "fetch": [],
  "storeUrls": [],
  "backoffSeconds": 1,
  "backoffFactor": 1.15,
  "maxLaunchDelaySeconds": 3600,
  "container": {
    "type": "DOCKER",
    "volumes": [],
    "docker": {
      "image": "IMAGE NAME FROM DOCKERHUB",
      "network": "BRIDGE",
      "portMappings": [{
        "containerPort": SOME PORT NUMBER,
        "hostPort": SOME PORT NUMBER,
        "servicePort": SERVICE PORT NUMBER,
        "protocol": "tcp",
        "name": "default"
      }],
      "privileged": false,
      "parameters": [],
      "forcePullImage": true
    }
  },
  "healthChecks": [],
  "readinessChecks": [],
  "dependencies": [],
  "upgradeStrategy": {
    "minimumHealthCapacity": 1,
    "maximumOverCapacity": 1
  },
  "unreachableStrategy": {
    "inactiveAfterSeconds": 300,
    "expungeAfterSeconds": 600
  },
  "killSelection": "YOUNGEST_FIRST",
  "requirePorts": true,
  "env": {
    "PRIVATE_KEY": "ID_RSA PRIVATE_KEY WITH \n LINE BREAKS"
  }
}
Check that your local version of Docker matches the version installed on the DC/OS agents. By default, the DC/OS 1.9.3 AWS CloudFormation templates use CoreOS 1235.12.0, which comes with Docker 1.12.6. It's possible that the entrypoint behavior has changed since then.
Check the Mesos task logs for the Marathon app in question and see what docker run command was executed. You might be passing it slightly different arguments when testing locally.
As mentioned in another answer, the script you provided has several errors that may or may not be related to the failure.
echo $PRIVATE_KEY should be echo "$PRIVATE_KEY" to preserve line breaks. Unquoted, the shell splits the value on whitespace and echo rejoins the pieces with spaces, so the key ends up on a single line. Key decryption then fails with "Bad passphrase, try again for /root/.ssh/id_rsa:".
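To see the difference the quotes make, here is a minimal sketch using a stand-in three-line value rather than a real key. The value and the variable names unquoted and quoted are illustrative assumptions, not part of the original script.

```shell
# Stand-in for a real key: any multi-line value behaves the same way.
PRIVATE_KEY='HEADER
body
FOOTER'

# Unquoted: the shell splits the value on whitespace, so echo receives
# three arguments and joins them with single spaces. Newlines are lost.
unquoted=$(echo $PRIVATE_KEY)

# Quoted: the value reaches echo as one argument, newlines intact.
quoted=$(echo "$PRIVATE_KEY")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

The unquoted form collapses onto one line, which is why ssh-add no longer recognizes the file as a valid key.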
expect -c "spawn ssh-add /root/.ssh/id_rsa; expect \"Enter passphrase for /root/.ssh/id_rsa:\" send \"\"; interact "
should be
expect -c "spawn ssh-add /root/.ssh/id_rsa; expect \"Enter passphrase for /root/.ssh/id_rsa:\"; send \"\n\"; interact "
It's missing a semicolon and a line break. Otherwise the expect command fails without executing.
Enterprise DC/OS 1.10 (1.10.0-rc1 out now) has a new feature named File Based Secrets, which allows injecting files (like id_rsa files) without including their contents in the Marathon app definition, storing them securely in Vault using DC/OS Secrets.
File Based Secrets won't do the ssh-add for you, but it should make it easier and more secure to get the file into the container.
Mesos 1.2.0 switched to using Docker --env-file instead of -e to pass in environment variables. This triggers a Docker --env-file bug: values containing line breaks are not supported. A workaround was put into Mesos and DC/OS, but the fix may not be in the minor version you are using.
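The following is a simplified illustration of why a line-oriented KEY=VALUE file cannot carry a multi-line value. It is not Docker's actual parser, just a stand-in reader built from grep and cut, and the three-line value is a placeholder for a real key.

```shell
# Simplified illustration only: this is NOT Docker's env-file parser.
envfile=$(mktemp)
key='HEADER
body
FOOTER'

# Writing the value as-is spreads it over three physical lines.
printf 'PRIVATE_KEY=%s\n' "$key" > "$envfile"

# A reader that treats each line as one KEY=VALUE entry only finds the
# first line; the remaining two lines no longer belong to any variable.
parsed=$(grep '^PRIVATE_KEY=' "$envfile" | cut -d= -f2-)
line_count=$(wc -l < "$envfile" | tr -d ' ')
rm -f "$envfile"

printf 'lines in file: %s\n' "$line_count"
printf 'value recovered: %s\n' "$parsed"
```

Only the first line of the value survives, which matches the symptom of a key that works with -e locally but arrives truncated on the cluster.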
A manual workaround is to convert the id_rsa file to base64 for the Marathon definition and decode it back in your entrypoint script.
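A rough sketch of that round trip, assuming a base64 tool that accepts -d for decoding (as GNU coreutils does in the ubuntu:16.04 image). The helper names encode_key and decode_key and the variable name PRIVATE_KEY_B64 are hypothetical, chosen only for this example.

```shell
# Hypothetical helpers; the names are illustrative only.

# Print a file as a single base64 line (any wrapping newlines removed),
# so the result is safe to place in a one-line environment variable.
encode_key() {
  base64 < "$1" | tr -d '\n'
}

# Decode a single-line base64 value back into the original bytes.
decode_key() {
  printf '%s' "$1" | base64 -d
}

# On your workstation (output goes into the Marathon "env" section,
# here under the assumed name PRIVATE_KEY_B64):
#   encode_key ~/.ssh/id_rsa
#
# In the container entrypoint:
#   mkdir -p /root/.ssh
#   decode_key "$PRIVATE_KEY_B64" > /root/.ssh/id_rsa
#   chmod 600 /root/.ssh/id_rsa
```

Because the encoded value contains no line breaks, it should pass through both -e and --env-file unchanged.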