
Start EC2 with Docker, run script and shut down


Hi Stack Overflow community, I have a question about using Docker with AWS EC2. I am comfortable with EC2 but very new to Docker. I code in Python 3.6 and would like to automate the following process:

1: start an EC2 instance with Docker (Docker image stored in ECR)

2: run a one-off process that returns results (let's call them "T") in CSV format

3: store "T" in AWS S3

4: shut down the EC2 instance

The reason for using an EC2 instance is that the process is quite computationally intensive and is not feasible on my local computer. The reason for Docker is to ensure the development environment is the same across the team and our CI (currently CircleCI). I understand that interactions with AWS can mostly be done using Boto3.
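To make my intent concrete, here is roughly the flow I'm imagining with boto3 (the AMI, the ECR repository, the instance profile, and the region are all placeholders, and I'm not sure this is the right approach):

```python
import boto3

ec2 = boto3.client('ec2')

# User-data script: log in to ECR, run the one-off container (which would
# write "T" to S3 itself), then power off. With the shutdown behaviour set
# to 'terminate' below, powering off also terminates the instance.
user_data = """#!/bin/bash
$(aws ecr get-login --no-include-email --region us-east-1)
docker pull 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-task:latest
docker run --rm 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-task:latest
shutdown -h now
"""

ec2.run_instances(
    ImageId='ami-xxxxxxxx',            # a Docker-ready AMI, e.g. ECS-optimized
    InstanceType='c4.2xlarge',
    MinCount=1,
    MaxCount=1,
    UserData=user_data,
    IamInstanceProfile={'Name': 'my-task-profile'},  # needs ECR pull + S3 write
    InstanceInitiatedShutdownBehavior='terminate',
)
```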

I have been reading about AWS's own ECS, and I have a feeling it's geared more towards deploying a web app with Docker than towards running a one-off process. However, when I searched for EC2 + Docker, nothing but ECS came up. I have also done the AWS tutorial, but it doesn't help much.

I have also considered running EC2 with a shell script (i.e. downloading Docker, pulling the image, building the container, etc.), but it feels a bit hacky. Therefore my questions here are:

1: Is ECS really the most appropriate solution in this scenario? (Or, in other words, is ECS designed for such operations?)

2: If so, are there any examples of people setting up and running a one-off process using ECS? (I find the setup really confusing, especially the terminology used.)

3: What are the other alternatives (if any)?

Thank you so much for the help!


Solution

  • Without knowing more about your process, I'd like to pose two alternatives for you.

    1. Use Lambda

    Depending on just how compute-intensive your process is, this may not be a viable option. However, if it is something that can be distributed, Lambda is awesome. You can find more information about the resource limitations in the AWS documentation. With this route, you would simply write Python 3.6 code to perform your task and write "T" to S3.
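    If Lambda does fit, the handler is about as minimal as the sketch below; run_process, the bucket, and the key are all placeholders for your own task and locations.

```python
import csv
import io

import boto3

s3 = boto3.client('s3')

def run_process(event):
    # Stand-in for the real computation; replace with your own task,
    # returning "T" as a list of rows.
    return [['col_a', 'col_b'], [1, 2]]

def handler(event, context):
    rows = run_process(event)

    # Serialize "T" to CSV in memory and upload it to S3.
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    s3.put_object(Bucket='my-results-bucket',   # placeholder bucket
                  Key='results/T.csv',          # placeholder key
                  Body=buf.getvalue().encode('utf-8'))
```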

    2. Use Data Pipeline

    With Data Pipeline, you can build a custom AMI and use that as your EC2 image. You can then specify the size of the EC2 resource you need to run this process. It sounds like your process would be pretty simple. You would need to define the following (a boto3 sketch follows the list):

    • Ec2Resource
      • Specify the AMI, role, security group, instance type, etc.
    • ShellCommandActivity
      • Bootstrap the EC2 instance as needed
      • Grab your code from S3, GitHub, etc.
      • Execute your code (include writing "T" to S3 in your code)
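
    As a sketch of what that definition might look like from boto3 (the names, instance type, AMI, and command are placeholders, and I'm assuming the default Data Pipeline IAM roles exist in your account):

```python
import boto3

dp = boto3.client('datapipeline')

# Create an empty pipeline, then attach a definition to it.
pipeline_id = dp.create_pipeline(name='one-off-process',
                                 uniqueId='one-off-process-v1')['pipelineId']

dp.put_pipeline_definition(
    pipelineId=pipeline_id,
    pipelineObjects=[
        # Default: settings inherited by every other object.
        {'id': 'Default', 'name': 'Default', 'fields': [
            {'key': 'scheduleType', 'stringValue': 'ondemand'},
            {'key': 'role', 'stringValue': 'DataPipelineDefaultRole'},
            {'key': 'resourceRole', 'stringValue': 'DataPipelineDefaultResourceRole'},
        ]},
        # Ec2Resource: the instance (your custom AMI) the activity runs on.
        {'id': 'MyEc2', 'name': 'MyEc2', 'fields': [
            {'key': 'type', 'stringValue': 'Ec2Resource'},
            {'key': 'imageId', 'stringValue': 'ami-xxxxxxxx'},  # your custom AMI
            {'key': 'instanceType', 'stringValue': 'c4.xlarge'},
            {'key': 'terminateAfter', 'stringValue': '2 Hours'},
        ]},
        # ShellCommandActivity: grab the code, run it, let it write "T" to S3.
        {'id': 'RunTask', 'name': 'RunTask', 'fields': [
            {'key': 'type', 'stringValue': 'ShellCommandActivity'},
            {'key': 'runsOn', 'refValue': 'MyEc2'},
            {'key': 'command', 'stringValue':
                'aws s3 cp s3://my-bucket/task.py . && python3 task.py'},
        ]},
    ],
)
```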

    You can also schedule the pipeline to run on an interval/schedule, or call it directly from boto3.
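
    For example, with the on-demand schedule type above, kicking it off from boto3 is just:

```python
# pipeline_id as returned by create_pipeline above.
dp.activate_pipeline(pipelineId=pipeline_id)
```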