Tags: amazon-web-services, docker, amazon-ecs, aws-ecs

ECS container gets killed every ~1 hour


UPDATE: It turned out the container was running out of disk space.


My ECS container keeps getting killed roughly an hour after it launches, usually between 55 and 65 minutes in. A new container is then created, and it too is killed after about an hour. I've checked the logs on the EC2 host and inside the container, and nothing explains what is going on.

Any idea what I can do?

# docker ps -a
CONTAINER ID        IMAGE                                                         COMMAND              CREATED             STATUS                         PORTS                                                                        NAMES
90d9xyze57fb        xyz123.dkr.ecr.us-east-2.amazonaws.com/geth:latest   "/usr/bin/rungeth"   21 minutes ago      Up 21 minutes                  0.0.0.0:8545->8545/tcp, 0.0.0.0:30303->30303/tcp, 0.0.0.0:30303->30303/udp   ecs-geth-task-1-geth-container-f29d85fxyze7c9a5d201
4603xyz723d3        xyz123.dkr.ecr.us-east-2.amazonaws.com/geth:latest   "/usr/bin/rungeth"   About an hour ago   Exited (1) 22 minutes ago                                                                                   ecs-geth-task-1-geth-container-cec7cd8xyze3f88fe901
9f38xyzc032a        xyz123.dkr.ecr.us-east-2.amazonaws.com/geth:latest   "/usr/bin/rungeth"   2 hours ago         Exited (1) About an hour ago                                                                                ecs-geth-task-1-geth-container-eecfe8cxyz88f8b0ff01
3c33xyza6054        xyz123.dkr.ecr.us-east-2.amazonaws.com/geth:latest   "/usr/bin/rungeth"   2 hours ago         Exited (1) 2 hours ago                                                                                      ecs-geth-task-1-geth-container-ccc08ddxyzb495d9e001
7a20xyzff29e        xyz123.dkr.ecr.us-east-2.amazonaws.com/geth:latest   "/usr/bin/rungeth"   3 hours ago         Exited (1) 2 hours ago                                                                                      ecs-geth-task-1-geth-container-8c96e1exyz8aff821d00
75bdxyzc00e7        xyz123.dkr.ecr.us-east-2.amazonaws.com/geth:latest   "/usr/bin/rungeth"   4 hours ago         Exited (1) 3 hours ago                                                                                      ecs-geth-task-1-geth-container-e0aec48xyzf58bfcf101
1b3bxyz1961f        amazon/amazon-ecs-agent:latest                                "/agent"             4 hours ago         Up 4 hours                                                                                                  ecs-agent


# docker logs 4603xyz723d3
#

Solution

  • It turned out the container was running out of disk space.

    I attached a larger volume and set the Launch Configuration's User Data to:

    #cloud-boothook 
    cloud-init-per once ecs_config echo 'ECS_CLUSTER=my-cluster' >> /etc/ecs/ecs.config
    cloud-init-per once docker_options echo 'OPTIONS="${OPTIONS} --storage-opt dm.basesize=200G"' >> /etc/sysconfig/docker
    

    This fixed the issue.
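To confirm that disk space is the cause before resizing anything, a quick check along these lines may help. This is only a sketch: the `disk_used_pct` helper and the 90% threshold are made up for illustration, and the Docker part assumes the devicemapper storage driver, which is what the `dm.basesize` option above applies to.

```shell
#!/bin/sh
# Hypothetical helper: print the percentage of used space on the
# filesystem that holds the given path (defaults to /).
# `df -P` forces POSIX single-line output, so column 5 is the usage.
disk_used_pct() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Illustrative threshold, not taken from the original post.
THRESHOLD=90

used=$(disk_used_pct /)
if [ "$used" -ge "$THRESHOLD" ]; then
    echo "WARNING: / is ${used}% full"
else
    echo "OK: / is ${used}% full"
fi

# With devicemapper, `docker info` reports the per-container base device
# size. Skipped quietly when Docker is missing or uses another driver.
if command -v docker >/dev/null 2>&1; then
    docker info 2>/dev/null | grep -i 'Base Device Size' || true
fi
```

Run it on the EC2 host for the host's volume. To see the container's own filesystem, the same `df` check can be run inside a running container, for example with `docker exec <container-id> df -h /`.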