amazon-web-services, architecture, amazon-ec2, fault-tolerance

Single fault-tolerant machine with Amazon AWS


For a particular service, I need to run a single EC2 instance in a fault-tolerant way.

Only in case of errors should the "primary" machine be terminated, and the traffic must then be redirected to the "secondary" machine automatically, within a few seconds. This is the classic primary/secondary setup, with the constraint that the secondary server must not serve traffic unless the primary has crashed.

I'm quite new to this world, but as far as I understand, with an Elastic IP I need to manually change the binding if the primary machine hangs. With Auto Scaling, ELB and CloudWatch, on the other hand, I can:

  • Set up an Auto Scaling group with 2 machines, but then the traffic will be load balanced (sticky sessions are not what I want, because I need all the traffic on the primary machine as long as it works)
  • Set up an Auto Scaling group with just 1 machine, so that if the primary machine hangs a new one comes online automatically. However, as far as I know, the boot process takes several minutes.

Any advice on how I can combine AWS services to achieve this goal?


Solution

  • You could build the automation yourself with the EC2 API, but you would need an always-online machine to run it.

    The preferred scenario within EC2 is to have a load balancer distribute traffic between two machines using a shared-nothing architecture (meaning persistent data lives on S3, or in a database that is not on the instance).

    If your application does not allow for this, you can set up a backup instance that health-checks your primary instance. Using a custom script, if the health check fails, you would remap an Elastic IP address to the backup instance and then terminate and relaunch your primary instance. Once the health check passes again, you could automatically return the IP to the primary instance. This would probably be easier to set up in VPC than in EC2-Classic, as you would have control of private IP addresses.
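
The custom script mentioned above might look roughly like the sketch below. It is only an illustration under several assumptions: it runs on the backup instance, uses the third-party boto3 library, and the health URL, the allocation ID `eipalloc-EXAMPLE`, the instance ID `i-EXAMPLE`, the failure threshold and the polling interval are all placeholders rather than values from the question. It covers only the failover step (remapping the Elastic IP); terminating and relaunching the primary, and moving the IP back afterwards, are left out.

```python
import http.client
import time
import urllib.request


def is_healthy(url, timeout=2.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Connection errors, timeouts, HTTP errors and malformed URLs all count as unhealthy.
        return False


def watch_and_fail_over(ec2, check, allocation_id, backup_instance_id,
                        max_failures=3, interval=5.0, sleep=time.sleep):
    """Poll check(); after max_failures consecutive failures, remap the Elastic IP.

    `ec2` is expected to behave like a boto3 EC2 client.
    """
    failures = 0
    while failures < max_failures:
        # Any successful check resets the counter, so only consecutive failures count.
        failures = 0 if check() else failures + 1
        if failures < max_failures:
            sleep(interval)
    # Move the Elastic IP (VPC-style allocation ID) to the backup instance.
    ec2.associate_address(
        AllocationId=allocation_id,
        InstanceId=backup_instance_id,
        AllowReassociation=True,
    )


def main():
    # boto3 is assumed to be installed and configured with credentials
    # that are allowed to call ec2:AssociateAddress.
    import boto3

    ec2 = boto3.client("ec2")
    watch_and_fail_over(
        ec2,
        check=lambda: is_healthy("http://10.0.0.10/health"),  # hypothetical private IP and path
        allocation_id="eipalloc-EXAMPLE",      # placeholder
        backup_instance_id="i-EXAMPLE",        # placeholder: this backup instance
    )
```

Calling `main()` on the backup instance would start the watch loop. With the defaults shown, a dead primary is detected after three failed checks, so failover happens within roughly 10 to 20 seconds depending on how quickly each check fails or times out. Lowering `interval` or `max_failures` makes failover faster at the cost of more false positives.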