amazon-web-services, amazon-ec2, docker, salt-project

Combining Salt, Docker, and Amazon EC2 for hosting Python applications


The situation we currently have in our company is:

  • 3 Python applications that can be spawned as many times as needed
  • a single Amazon EC2 server that hosts all of the mentioned apps (1 instance of each)
  • CPU utilization of ~30%
  • periodic work that we want done within 1 hour takes 2 hours (a single instance of our app can't work faster - the reasons are not important here - but spawning a second instance does the trick)

We want to build an auto-scaling solution using Docker, Salt, and Amazon EC2. Since I don't have an admin background, it's hard for me to evaluate which of the possible solutions we've come up with are good and which are bad. So I decided to ask for your experience with the mentioned technologies; maybe you will be able to point out possible problems with the following solutions:

  1. Salt manages a single EC2 server: it installs all app dependencies and creates an AMI image containing the newest app version. We then use Amazon's auto-scaling services to spawn new instances from that AMI when needed (see the first sketch after this list).
    • pros:
      • It's simple
      • It's flexible
      • Handles hardware failures pretty well
    • cons:
      • It's not cost effective
      • We are not using all resources
  2. A fixed number of apps (each wrapped in a Docker container) is deployed on each EC2 server instance, e.g. we always run 3x application A on an L4.medium server. When we need more app instances, Amazon auto-scaling spawns a new EC2 server, and Salt makes sure that 3 Docker containers with app A are running on it.
    • pros:
      • We can use any EC2 server we want
      • We can use all available resources on a particular server
    • cons:
      • Granularity of scaling: if four app A instances finish the job in 1 h 20 min and our target is 1 h, we spawn the next 4 instances and the job is then done in 40 min, which is unnecessarily fast (the work totals 4 × 80 = 320 instance-minutes, so 6 instances would already meet the 1-hour target, but we can only add whole servers at a time).
  3. We use any server we want, and scaling means adding either a new EC2 instance or a new Docker container to an existing EC2 instance. In other words, we keep adding new Docker containers to existing machines, and only when they are full does Amazon auto-scaling add a new EC2 instance (see the second sketch after this list). This is theoretically the best solution we've found, but the problem is that I don't know whether it is even possible to achieve with Salt.
    • pros:
      • Flexible
      • Cost effective
      • pretty cool:)
    • cons:
      • The most complex solution
      • Problems with scaling down (say we had 6 app A instances on 3 servers and now we need just 2, so we remove 4 instances from 2 servers; but a different app could be running on those servers, preventing us from stopping the EC2 instances, so we end up with unused resources again)
      • I don't even know where to start with it
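
To make scenario 1 concrete, here is a minimal boto3 (Python) sketch, assuming Salt has already provisioned the source server: bake an AMI from it, then hand that AMI to an Auto Scaling group. The instance ID, names, instance type, availability zone, and scaling limits are all hypothetical placeholders.

    import boto3

    ec2 = boto3.client("ec2")
    autoscaling = boto3.client("autoscaling")

    # Bake an AMI from the Salt-provisioned server carrying the newest app version.
    image = ec2.create_image(
        InstanceId="i-0123456789abcdef0",  # hypothetical: the Salt-managed EC2 server
        Name="app-a-v42",                  # hypothetical AMI name
    )

    # Register a launch configuration for that AMI and let Auto Scaling spawn copies.
    autoscaling.create_launch_configuration(
        LaunchConfigurationName="app-a-v42-lc",
        ImageId=image["ImageId"],
        InstanceType="m4.large",           # hypothetical instance type
    )
    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName="app-a-asg",
        LaunchConfigurationName="app-a-v42-lc",
        MinSize=1,
        MaxSize=4,                         # hypothetical scaling limits
        AvailabilityZones=["eu-west-1a"],  # hypothetical availability zone
    )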
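
For scenario 3, the core decision logic might look like the sketch below. The Server model and every helper (start_container, stop_container, launch_server, terminate_server) are hypothetical stubs - in practice they would wrap the Docker API, salt-cloud, or Salt states - so this only illustrates the control flow, including a scale-down heuristic that prefers servers that can be emptied and terminated.

    from dataclasses import dataclass, field

    CONTAINERS_PER_SERVER = 3              # hypothetical capacity of one EC2 instance

    @dataclass
    class Server:                          # hypothetical in-memory model of one server
        name: str
        containers: list = field(default_factory=list)

    # Hypothetical helpers; real versions would call the Docker API or salt-cloud.
    def start_container(server, app):
        server.containers.append(app)

    def stop_container(server, app):
        server.containers.remove(app)

    def launch_server():
        return Server(name="new-ec2-instance")

    def terminate_server(server):
        print("terminating", server.name)

    def scale_up(app, servers):
        """Place one more container of `app`, reusing spare capacity first."""
        for server in servers:
            if len(server.containers) < CONTAINERS_PER_SERVER:
                start_container(server, app)
                return server
        new_server = launch_server()       # only now do we pay for a new instance
        servers.append(new_server)
        start_container(new_server, app)
        return new_server

    def scale_down(app, servers):
        """Stop one container of `app`, preferring servers that can be emptied."""
        candidates = [s for s in servers if app in s.containers]
        if not candidates:
            return
        server = min(candidates, key=lambda s: len(s.containers))
        stop_container(server, app)
        if not server.containers:          # nothing else runs here, release the box
            servers.remove(server)
            terminate_server(server)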

That's all we have :) Any suggestions will be appreciated. Solutions other than these three are very welcome (especially ones already running in production somewhere).


Solution

  • Is there a reason you have not considered using the Amazon EC2 Container Service (ECS, http://aws.amazon.com/ecs/)? I imagine it would cover the Docker scenarios you describe, and perhaps you would not even need Salt then (maybe this is just my ignorance, not having worked with it). You could containerize your app any way you want, create a Docker cluster based on the ECS AMI, and either have Amazon do the scheduling in your Docker cluster or monitor the resources yourself via the API, adding new cluster nodes when needed. From the ECS FAQ (http://aws.amazon.com/ecs/faqs/):

    Is it possible for me to schedule container launches and manage placement across a cluster? Yes. You can do this in two ways. You can choose to let EC2 Container Service randomly place you across a cluster to try and maximize the availability of your tasks using the RunTask API, or you can use the DescribeCluster API to get information about the complete state of your cluster. The API returns data on all the container instances in a cluster, what tasks they're running, and what resources are still available. With this information you can use the StartTask API to target specific Container Instances in your cluster or use a custom scheduler to manage placement based on your requirements.

    I think that way you could first utilize all of your available cluster nodes to a high percentage, and then trigger the creation of new cluster nodes that can be removed when you're done with the calculation. This should address the granularity problem from scenario 2 and the scaling-down problem from scenario 3. In terms of complexity it is still pretty high, at least compared to scenario 1, since you need to Dockerize everything and learn ECS.
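
    For illustration, here is a minimal boto3 sketch of the placement flow the FAQ describes: inspect what resources each container instance still has free, then either target a specific instance with StartTask or let ECS place the task with RunTask. The cluster name, task definition, and CPU reservation are hypothetical placeholders.

        import boto3

        ecs = boto3.client("ecs")

        CLUSTER = "app-cluster"            # hypothetical cluster name
        TASK_DEF = "app-a:1"               # hypothetical task definition (family:revision)
        CPU_PER_TASK = 256                 # hypothetical CPU units one task reserves

        # What is every container instance in the cluster still able to run?
        arns = ecs.list_container_instances(cluster=CLUSTER)["containerInstanceArns"]
        instances = ecs.describe_container_instances(
            cluster=CLUSTER, containerInstances=arns)["containerInstances"]

        def remaining_cpu(instance):
            # remainingResources reports the CPU units still free on an instance
            return next(r["integerValue"] for r in instance["remainingResources"]
                        if r["name"] == "CPU")

        spare = [i for i in instances if remaining_cpu(i) >= CPU_PER_TASK]
        if spare:
            # Target a specific instance that still has room (StartTask).
            ecs.start_task(cluster=CLUSTER, taskDefinition=TASK_DEF,
                           containerInstances=[spare[0]["containerInstanceArn"]])
        else:
            # No room left: let ECS place it (RunTask), or grow the cluster
            # first via Auto Scaling (not shown here).
            ecs.run_task(cluster=CLUSTER, taskDefinition=TASK_DEF, count=1)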