
CannotStartContainerError while submitting an AWS Batch Job


In AWS Batch I have a job definition, a job queue, and a compute environment in which to run my AWS Batch jobs. After submitting a job, I find it in the list of failed jobs with this error:

Status reason
Essential container in task exited
Container message
CannotStartContainerError: API error (404): oci runtime error: container_linux.go:247: starting container process caused "exec: \"/var/application/script.sh --file= --key=. 

and in the CloudWatch logs I have:

container_linux.go:247: starting container process caused "exec: \"/var/application/script.sh --file=Toulouse.json --key=out\": stat /var/application/script.sh --file=Toulouse.json --key=out: no such file or directory"
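The stat line is the telling part: Docker is looking up the entire string, arguments included, as a single executable path. The same failure mode can be reproduced locally with Python's subprocess module (a hypothetical analogy, not part of the original setup):

import subprocess

# Works: the executable and each argument are separate list elements.
subprocess.check_call(["/bin/echo", "--file=Toulouse.json", "--key=out"])

# Fails with "No such file or directory", exactly like the Docker error:
# the whole string is treated as one executable path.
subprocess.check_call(["/bin/echo --file=Toulouse.json --key=out"])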

I have specified a correct Docker image that contains all the scripts (we already use it elsewhere and it works), and I don't know where the error is coming from. Any suggestions are much appreciated.

The Dockerfile is something like this:

# Pull base image.
FROM account-id.dkr.ecr.region.amazonaws.com/application-image.base-php7-image:latest

VOLUME /tmp
VOLUME /mount-point

RUN chown -R ubuntu:ubuntu /var/application

# Create the source directories
USER ubuntu
COPY application/ /var/application

# Register aws profile
COPY data/aws /home/ubuntu/.aws

WORKDIR /var/application/
ENV COMPOSER_CACHE_DIR /tmp
RUN composer update -o && \
    rm -Rf /tmp/*
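Before going through Batch at all, it can help to confirm that the script path actually exists inside the built image. A minimal sketch using the Docker SDK for Python (pip install docker; assumes the image can be pulled locally):

import docker

client = docker.from_env()

# List the script inside the image; note the command is an argv list,
# not one space-joined string.
output = client.containers.run(
    "account-id.dkr.ecr.region.amazonaws.com/application-dev:latest",
    ["ls", "-l", "/var/application/script.sh"],
    remove=True,
)
print(output.decode())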

Here is the Job Definition:

{
    "jobDefinitionName": "JobDefinition",
    "jobDefinitionArn": "arn:aws:batch:region:accountid:job-definition/JobDefinition:25",
    "revision": 21,
    "status": "ACTIVE",
    "type": "container",
    "parameters": {},
    "retryStrategy": {
        "attempts": 1
    },
    "containerProperties": {
        "image": "account-id.dkr.ecr.region.amazonaws.com/application-dev:latest",
        "vcpus": 1,
        "memory": 512,
        "command": [
            "/var/application/script.sh",
            "--file=",
            "Ref::file",
            "--key=",
            "Ref::key"
        ],
        "volumes": [
            {
                "host": {
                    "sourcePath": "/mount-point"
                },
                "name": "logs"
            },
            {
                "host": {
                    "sourcePath": "/var/log/php/errors.log"
                },
                "name": "php-errors-log"
            },
            {
                "host": {
                    "sourcePath": "/tmp/"
                },
                "name": "tmp"
            }
        ],
        "environment": [
            {
                "name": "APP_ENV",
                "value": "dev"
            }
        ],
        "mountPoints": [
            {
                "containerPath": "/tmp/",
                "readOnly": false,
                "sourceVolume": "tmp"
            },
            {
                "containerPath": "/var/log/php/errors.log",
                "readOnly": false,
                "sourceVolume": "php-errors-log"
            },
            {
                "containerPath": "/mount-point",
                "readOnly": false,
                "sourceVolume": "logs"
            }
        ],
        "ulimits": []
    }
}
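For reference, the Ref::file and Ref::key placeholders in the command above are substituted from the parameters map at submission time, each as its own argv element. A hedged sketch of supplying them with boto3 (the job name and queue are hypothetical):

import boto3

batch = boto3.client("batch")

# Ref::file becomes "Toulouse.json" and Ref::key becomes "out" in the
# container command when the job runs.
batch.submit_job(
    jobName="application-job",       # hypothetical
    jobQueue="application-queue",    # hypothetical
    jobDefinition="JobDefinition:25",
    parameters={"file": "Toulouse.json", "key": "out"},
)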

In the CloudWatch log stream /var/log/docker:

time="2017-06-09T12:23:21.014547063Z" level=error msg="Handler for GET /v1.17/containers/4150933a38d4f162ba402a3edd8b7763c6bbbd417fcce232964e4a79c2286f67/json returned error: No such container: 4150933a38d4f162ba402a3edd8b7763c6bbbd417fcce232964e4a79c2286f67" 

Solution

  • This error was caused by a malformed command. I was submitting the job from a Lambda function (Python 2.7) using boto3, and the command must be passed as a list with the executable and each argument as separate elements, something like this:

    'command' : ['sudo','mkdir','directory']
    

    Hope this helps somebody. A fuller sketch of the corrected submission follows below.
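For completeness, a minimal sketch of the corrected submission (the job name and queue are hypothetical; only the shape of the command list matters):

import boto3

batch = boto3.client("batch")

# Broken: one space-joined string, which Docker tries to exec as a
# single path, producing CannotStartContainerError.
bad_command = ["/var/application/script.sh --file=Toulouse.json --key=out"]

# Fixed: the executable and each argument are separate list elements.
good_command = ["/var/application/script.sh", "--file=Toulouse.json", "--key=out"]

batch.submit_job(
    jobName="application-job",       # hypothetical
    jobQueue="application-queue",    # hypothetical
    jobDefinition="JobDefinition:25",
    containerOverrides={"command": good_command},
)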